About this role
JOB DESCRIPTION

Key Responsibilities & Scope of Work

A. Architecture Assessment & Strategic Roadmap
● Evaluate the current data engineering framework end-to-end: medallion architecture layering, naming conventions, ingestion patterns, processing logic, security controls, and data quality mechanisms.
● Benchmark the current state against industry best practices and produce a prioritized improvement roadmap with clear effort-vs-impact trade-offs.

B. Data Estate Governance
● Build and maintain a comprehensive inventory of the data estate — cataloging all source systems (onboarded and prospective) and the subject areas each covers (ingested and not yet ingested).
● Establish this inventory as a living artifact that informs onboarding decisions, coverage analysis, and platform planning.

C. Standards Definition & Enforcement
● Design, integrate, or refactor naming conventions for schemas, tables, views, orchestration jobs, and pipelines — along with the migration approach for transitioning to new standards where needed.
● Define standardized ingestion and processing patterns spanning the full medallion architecture, including sub-layering strategy, format standardization (Parquet, Avro, Delta), secure PII ingestion, data normalization, technical data quality tracking, row- and column-level access controls, late-arriving dimension management, and data export workflows.
● Establish clear pattern selection criteria so engineers know which approach to apply for a given source type or use case.
● Define and operationalize the exception management process for handling justified deviations from established standards.

D. Hands-On Implementation
● Build production-grade boilerplate code for each standardized pattern using the existing GCP toolchain (BigQuery, Cloud SQL, Cloud Composer, Dataflow, Dataproc, Cloud Storage, Pub/Sub, and related services).
● Ensure templates are modular, well-documented, and immediately adoptable by the engineering team.

E. CI/CD & Developer Experience
● Support the integration of data engineering pipelines with the CI/CD solution, aligning with the broader CI/CD modernization initiative's timeline and tooling decisions.
● Contribute to developer experience improvements that reduce friction in pipeline development, testing, and deployment.

F. Knowledge Transfer & Enablement
● Author the "Source Onboarding Playbook" — a repeatable, step-by-step guide for bringing new data sources into the platform, covering initial assessment, pattern selection, naming convention application, quality gates, access control setup, and production release.
● Mentor and upskill data engineers on the new standards, patterns, and tooling through documentation, walkthroughs, and hands-on pairing.

Resource Requirements (What We're Looking For)

Must-Have
● Substantial progressive experience in data engineering, data architecture, or analytics platform development, with a significant portion spent in hands-on, code-level roles — not purely advisory or managerial positions.
● Deep, demonstrable expertise in designing and operating large-scale analytical solutions (data warehouses, data lakes, lakehouses) serving enterprise-grade workloads.
● Strong hands-on proficiency with GCP data services — BigQuery, Cloud SQL (Federated Query), Cloud Composer (Airflow), Dataflow (Apache Beam), Dataproc (Spark), Cloud Storage, and Pub/Sub.
● Proven track record of implementing medallion architecture (Bronze/Silver/Gold) or equivalent layered data platform patterns at scale.
● Experience defining and enforcing data engineering standards, naming conventions, and governance frameworks across multiple teams and workstreams.
● Experience with dbt, Apache Iceberg, Delta Lake, or similar transformation and open table format technologies.
● Practical experience with PII handling, data masking, tokenization, and implementing row- and column-level security in cloud data platforms.
● Strong background in CI/CD for data pipelines (Terraform, Cloud Build, GitHub Actions, dbt, or equivalent).
● A track record of building reusable templates, frameworks, and boilerplate code that engineering teams actually adopt and rely on.
● Solid understanding of data quality frameworks, data contracts, and pipeline observability.

Nice-to-Have
● Experience in the logistics industry or adjacent supply chain-intensive sectors, with exposure to high-volume transactional data, shipment tracking, fleet management, or warehouse and distribution analytics.
● Familiarity with data cataloging and metadata management tools (Dataplex, Purview, Alation, or equivalent).
● GCP Professional Data Engineer certification or equivalent.

# | Deliverable | Description
1 | Current State Assessment & Gap Analysis | A comprehensive evaluation of the existing data engineering framework, medallion architecture layering, and naming conventions — benchmarked against industry best practices with a prioritized improvement roadmap.
2 | Data Estate Inventory | A complete catalog of source systems (onboarded and not) and subject areas (ingested and not), serving as the single source of truth for coverage and onboarding decisions.
3 | Naming Convention Standards & Migration Plan | Integrated and standardized naming conventions for schemas, tables, views, jobs, and pipelines — with a defined migration approach for transitioning existing assets where applicable.
4 | Standardized Ingestion & Processing Patterns | Documented and codified patterns covering medallion sub-layering, format standards, secure PII ingestion, normalization, data quality tracking, access controls, late-arriving dimensions, and data export — each with clear application criteria.
5 | Exception Management Process | A formal, operationalized process for requesting, reviewing, approving, and documenting deviations from data engineering standards.
6 | GCP Boilerplate Implementation | Production-ready, modular boilerplate code for each standardized pattern, built on the existing GCP toolchain and ready for team adoption.
7 | CI/CD Integration Support | Active contribution to integrating data engineering pipelines with the CI/CD solution, aligned with the modernization initiative's timeline.
8 | Source Onboarding Playbook | A step-by-step, repeatable playbook for onboarding new data sources — from initial assessment through production deployment, including pattern selection, quality gates, and access control setup.