About this role
Job Summary

Design, build, and maintain scalable, fault-tolerant data pipelines using Python, SQL, and orchestration tools. Manage package dependencies, ensure security compliance, and optimize code for production data workflows.

Responsibilities

• Develop modular Python code with robust error handling and logging for data pipelines (see the first sketch below)
• Write advanced SQL queries using joins, window functions, and optimization techniques to support data processing (see the windowed-query sketch below)
• Process and transform data using Pandas and integrate streaming data with Kafka producers and consumers (see the Kafka-to-Pandas sketch below)
• Design, build, and maintain Directed Acyclic Graphs (DAGs) and flows using orchestration tools such as Apache Airflow and Prefect (see the DAG sketch below)
• Ensure pipelines are idempotent, scalable, and fault-tolerant to support reliable data workflows
• Implement logging, monitoring, and alerting mechanisms to maintain pipeline observability and operational health
• Manage Python package installations, upgrades, and dependency resolution across development, UAT, and production environments
• Maintain dependency manifests (e.g., requirements.txt) with version pinning to ensure environment consistency
• Support deployments in restricted or air-gapped environments by managing package and dependency constraints
• Analyze vulnerability reports from security scanning tools and remediate issues by upgrading or replacing vulnerable libraries
• Fix broken imports, deprecated APIs, and compatibility issues arising from library updates while maintaining pipeline stability
• Collaborate with security teams to ensure compliance with organizational security standards and secure coding practices
• Refactor legacy code in data ingestion APIs, data transformation (Pandas/SQL), model training/inference pipelines, and orchestration workflows to improve modularity, readability, and performance
• Ensure backward compatibility and minimize disruption to production systems during code changes
• Perform data validation and ensure schema consistency and data quality across pipeline stages
• Implement unit and integration tests for data pipelines to ensure reliability before deployment
• Troubleshoot pipeline failures, perform root-cause analysis, and provide production support for continuous workflow improvement
• Handle Kafka schema evolution and message serialization/deserialization to maintain streaming data integrity
• Work effectively in regulated or high-security environments, applying security and reliability best practices

Preferred competencies and qualifications

• 2-3 or more years of experience in data engineering
• Prior experience working with production data pipelines
• Experience handling dependency conflicts, library upgrades, and refactoring in live systems
• Ability to work across multiple layers, including API, data processing, orchestration, and machine learning pipelines
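To give candidates a concrete picture, here is a minimal, generic sketch of the "modular Python with error handling and logging" responsibility. All names (run_step, RETRIES) are illustrative, not from any specific codebase.

```python
# A minimal sketch of a reusable pipeline-step runner with structured
# logging and retry-based error handling for transient failures.
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

RETRIES = 3  # hypothetical retry budget for transient failures


def run_step(step_fn, *args, **kwargs):
    """Run a pipeline step, logging outcomes and retrying on failure."""
    for attempt in range(1, RETRIES + 1):
        try:
            result = step_fn(*args, **kwargs)
            log.info("step %s succeeded on attempt %d",
                     step_fn.__name__, attempt)
            return result
        except Exception:
            log.exception("step %s failed on attempt %d",
                          step_fn.__name__, attempt)
            if attempt == RETRIES:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying
```

Wrapping each step this way keeps the pipeline modular: individual steps stay small and testable, while retry and logging policy lives in one place.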
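For the advanced-SQL responsibility, here is a self-contained illustration of a window function, using an in-memory SQLite database (SQLite 3.25+ is needed for window functions) so it runs without external infrastructure. The events table and its columns are invented for the example.

```python
# A windowed SQL query computing a per-user running total.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, ts TEXT, amount REAL);
    INSERT INTO events VALUES
        (1, '2024-01-01', 10.0),
        (1, '2024-01-02', 20.0),
        (2, '2024-01-01', 5.0);
""")

# SUM() OVER a partition yields a running total per user without
# collapsing rows the way GROUP BY would.
rows = conn.execute("""
    SELECT user_id,
           ts,
           amount,
           SUM(amount) OVER (
               PARTITION BY user_id ORDER BY ts
           ) AS running_total
    FROM events
    ORDER BY user_id, ts
""").fetchall()

for row in rows:
    print(row)
```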
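For the Kafka/Pandas responsibility, a hedged sketch of batching JSON messages from a topic into a DataFrame. It assumes the confluent-kafka package, a broker at localhost:9092, and a topic named "orders"; none of these are specified in the posting, and a production consumer would also handle commits, schemas, and shutdown more carefully.

```python
# Consume a small batch of JSON messages from Kafka, then transform
# them with Pandas before loading downstream.
import json

import pandas as pd
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "pipeline-demo",        # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

records = []
try:
    while len(records) < 100:           # small batch for illustration
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue                    # skip empty polls and errors
        records.append(json.loads(msg.value()))
finally:
    consumer.close()

df = pd.DataFrame(records)              # batch transform with Pandas
print(df.head())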
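For the orchestration responsibility, a minimal Airflow 2.x DAG using the TaskFlow API. The dag_id and task bodies are placeholders standing in for real extract/transform/load logic; in a real pipeline each task would be idempotent so reruns and backfills are safe.

```python
# A minimal daily ETL DAG: extract -> transform -> load.
from datetime import datetime

from airflow.decorators import dag, task


@dag(dag_id="example_etl",
     schedule="@daily",
     start_date=datetime(2024, 1, 1),
     catchup=False)
def example_etl():
    @task
    def extract() -> list[dict]:
        return [{"id": 1, "value": 42}]   # stand-in for a source query

    @task
    def transform(rows: list[dict]) -> list[dict]:
        return [{**r, "value": r["value"] * 2} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        print(f"loading {len(rows)} rows")  # stand-in for a warehouse write

    load(transform(extract()))


example_etl()
```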