About this role
Data Pipeline Development & Operations
• Design, build, and operate scalable and reliable data pipelines on the Databricks platform
• Develop end-to-end data workflows from ingestion through transformation to consumption
• Implement robust error handling, monitoring, and alerting mechanisms
• Ensure data pipeline reliability, performance, and maintainability
• Optimize pipeline performance through efficient Spark job design and cluster configuration
• Manage and orchestrate complex data workflows using Databricks Jobs and Workflows

Legacy Code Modernization
• Refactor legacy code and data pipelines to PySpark for improved performance and scalability
• Migrate traditional ETL processes to modern ELT patterns on Databricks
• Assess existing codebases and identify opportunities for optimization and modernization
• Ensure backward compatibility and data integrity during migration processes
• Document refactoring approaches and create migration playbooks
• Collaborate with stakeholders to minimize disruption during code transitions

Data Engineering Excellence
• Implement data quality checks and validation frameworks
• Design and maintain Delta Lake tables with appropriate optimization strategies
• Develop reusable code libraries and frameworks for common data engineering tasks
• Follow software engineering best practices, including version control, testing, and CI/CD
• Participate in code reviews and provide constructive feedback to team members
• Troubleshoot and resolve data pipeline issues in production environments

Collaboration & Knowledge Sharing
• Work closely with data architects, analysts, and business stakeholders
• Collaborate with Infrastructure (Infra), Applications (Apps), and Cyber teams
• Share knowledge and best practices with Team NCS
• Mentor junior data engineers on PySpark and Databricks technologies
• Document technical solutions and maintain comprehensive documentation

Essential Technical Skills
• Data Engineering: Strong foundation in data engineering principles, ETL/ELT processes, and data pipeline design patterns
• PySpark: Proven hands-on experience developing data pipelines using PySpark, including the DataFrames API, Spark SQL, and performance optimization
• Databricks Platform: Practical experience with the Databricks workspace, cluster management, notebooks, and job orchestration
• Workspace AI Agent: Knowledge of Databricks Workspace AI Agent capabilities and integration
• Data Modelling: Experience implementing data models, including dimensional modelling, data vault, or lakehouse architectures
• Delta Lake: Understanding of Delta Lake features, including ACID transactions, schema evolution, and optimization techniques
• Python: Strong Python programming skills for data processing and automation

Additional Technical Skills
• SQL proficiency for data querying and transformation
• Experience with cloud platforms (Azure, AWS, or GCP)
• Understanding of data governance and security best practices
• Knowledge of streaming data processing (Structured Streaming)
• Familiarity with DevOps practices and CI/CD pipelines
• Experience with version control systems (Git)
• Understanding of data quality frameworks and testing methodologies

Professional Experience
• Minimum of 8 years in data engineering or related roles
• At least 2-3 years of hands-on experience with the Databricks platform
• Proven track record of refactoring legacy code to modern frameworks
• Experience building and maintaining production data pipelines at scale
• Background working across multiple data sources and formats
• Experience in agile development environments

Required Certifications (at least one is mandatory)
• Databricks Certified Data Engineer Associate OR Databricks Certified Data Engineer Professional

Additional Certifications (Preferred)
• Databricks Certified Associate Developer for Apache Spark
• Cloud platform certifications (Azure Data Engineer Associate, AWS Certified Data Analytics, or Google Cloud Professional Data Engineer)
• Relevant data engineering or big data certifications

Soft Skills
• Strong problem-solving and analytical thinking abilities
• Excellent communication skills to explain technical concepts clearly
• Ability to work collaboratively in cross-functional teams
• Self-motivated with strong attention to detail
• Adaptable to changing priorities and technologies
• Client-focused mindset with commitment to quality delivery