About this role
Job Summary

Design, build, and maintain scalable, fault-tolerant data pipelines using Python, SQL, and orchestration tools. Manage package dependencies, ensure security compliance, and optimize code for production data workflows.

Responsibilities

• Develop modular Python code with robust error handling and logging for data pipelines (see the first sketch below)
• Write advanced SQL queries using joins, window functions, and optimization techniques to support data processing (see the windowed-query sketch below)
• Process and transform data using Pandas and integrate streaming data with Kafka producers and consumers (see the Kafka-to-Pandas sketch below)
• Design, build, and maintain Directed Acyclic Graphs (DAGs) and flows using orchestration tools such as Apache Airflow and Prefect (see the DAG sketch below)
• Ensure pipelines are idempotent, scalable, and fault-tolerant to support reliable data workflows
• Implement logging, monitoring, and alerting mechanisms to maintain pipeline observability and operational health
• Manage Python package installations, upgrades, and dependency resolution across development, UAT, and production environments
• Maintain dependency manifests (e.g., requirements.txt) with version pinning to ensure environment consistency
• Support deployments in restricted or air-gapped environments by managing package and dependency constraints
• Analyze vulnerability reports from security scanning tools and remediate issues by upgrading or replacing vulnerable libraries
• Fix broken imports, deprecated APIs, and compatibility issues arising from library updates while maintaining pipeline stability
• Collaborate with security teams to ensure compliance with organizational security standards and secure coding practices
• Refactor legacy code in data ingestion APIs, data transformation (Pandas/SQL), model training/inference pipelines, and orchestration workflows to improve modularity, readability, and performance
• Ensure backward compatibility and minimize disruption to production systems during code changes
• Perform data validation and ensure schema consistency and data quality across pipeline stages
• Implement unit and integration tests for data pipelines to ensure reliability before deployment
• Troubleshoot pipeline failures, perform root-cause analysis, and provide production support for continuous workflow improvement
• Handle Kafka schema evolution and message serialization/deserialization to maintain streaming data integrity
• Work effectively in regulated or high-security environments, applying security and reliability best practices

Preferred competencies and qualifications

• 2-3 or more years of experience in data engineering
• Prior experience working with production data pipelines
• Experience handling dependency conflicts, library upgrades, and refactoring in live systems
• Ability to work across multiple layers, including API, data processing, orchestration, and machine learning pipelines
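To give candidates a concrete picture, here is a minimal, generic sketch of the "modular Python with error handling and logging" responsibility. All names (run_step, RETRIES) are illustrative, not from any specific codebase.

```python
# A minimal sketch of a reusable pipeline-step runner with structured
# logging and retry-based error handling for transient failures.
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

RETRIES = 3  # hypothetical retry budget for transient failures


def run_step(step_fn, *args, **kwargs):
    """Run a pipeline step, logging outcomes and retrying on failure."""
    for attempt in range(1, RETRIES + 1):
        try:
            result = step_fn(*args, **kwargs)
            log.info("step %s succeeded on attempt %d",
                     step_fn.__name__, attempt)
            return result
        except Exception:
            log.exception("step %s failed on attempt %d",
                          step_fn.__name__, attempt)
            if attempt == RETRIES:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying
```

Wrapping each step this way keeps the pipeline modular: individual steps stay small and testable, while retry and logging policy lives in one place.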
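For the advanced-SQL responsibility, here is a self-contained illustration of a window function, using an in-memory SQLite database (SQLite 3.25+ is needed for window functions) so it runs without external infrastructure. The events table and its columns are invented for the example.

```python
# A windowed SQL query computing a per-user running total.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, ts TEXT, amount REAL);
    INSERT INTO events VALUES
        (1, '2024-01-01', 10.0),
        (1, '2024-01-02', 20.0),
        (2, '2024-01-01', 5.0);
""")

# SUM() OVER a partition yields a running total per user without
# collapsing rows the way GROUP BY would.
rows = conn.execute("""
    SELECT user_id,
           ts,
           amount,
           SUM(amount) OVER (
               PARTITION BY user_id ORDER BY ts
           ) AS running_total
    FROM events
    ORDER BY user_id, ts
""").fetchall()

for row in rows:
    print(row)
```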
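For the Kafka/Pandas responsibility, a hedged sketch of batching JSON messages from a topic into a DataFrame. It assumes the confluent-kafka package, a broker at localhost:9092, and a topic named "orders"; none of these are specified in the posting, and a production consumer would also handle commits, schemas, and shutdown more carefully.

```python
# Consume a small batch of JSON messages from Kafka, then transform
# them with Pandas before loading downstream.
import json

import pandas as pd
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "pipeline-demo",        # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

records = []
try:
    while len(records) < 100:           # small batch for illustration
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue                    # skip empty polls and errors
        records.append(json.loads(msg.value()))
finally:
    consumer.close()

df = pd.DataFrame(records)              # batch transform with Pandas
print(df.head())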
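For the orchestration responsibility, a minimal Airflow 2.x DAG using the TaskFlow API. The dag_id and task bodies are placeholders standing in for real extract/transform/load logic; in a real pipeline each task would be idempotent so reruns and backfills are safe.

```python
# A minimal daily ETL DAG: extract -> transform -> load.
from datetime import datetime

from airflow.decorators import dag, task


@dag(dag_id="example_etl",
     schedule="@daily",
     start_date=datetime(2024, 1, 1),
     catchup=False)
def example_etl():
    @task
    def extract() -> list[dict]:
        return [{"id": 1, "value": 42}]   # stand-in for a source query

    @task
    def transform(rows: list[dict]) -> list[dict]:
        return [{**r, "value": r["value"] * 2} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        print(f"loading {len(rows)} rows")  # stand-in for a warehouse write

    load(transform(extract()))


example_etl()
```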