About this role
Responsibilities
• Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance
• Implement comprehensive logging, alerting, and monitoring systems using application monitoring tools
• Perform regular health checks on performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively
• Manage incident response procedures for pipeline failures, including root cause analysis, resolution, and post-incident reviews
• Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment
• Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency
• Maintain comprehensive documentation for operational procedures, runbooks, and troubleshooting guides
• Coordinate scheduled maintenance windows and system upgrades with minimal business impact
• Manage user access controls, workspace configurations, and security policies within Application environments
Requirements
• Degree in Computer Science or Computer Engineering
• Minimum 5 years of working experience in system operations, compliance, and management
• Hands-on project experience with the AWS platform (primary requirement), cloud operations, or cloud architecture
• AWS cloud certification (mandatory)
• Proficiency in the Databricks platform, including workspace management, cluster configuration, and job orchestration
• Strong expertise in Apache Spark within the Databricks environment, including Spark SQL, DataFrames, and RDDs
• In-depth understanding of data warehouse concepts, data profiling, data verification, and advanced analytics techniques
• Strong knowledge of monitoring, incident management, and cloud cost control
• Technology stack experience:
  • Databricks
  • AWS cloud services and architecture
  • IDMC (Informatica Data Management Cloud)
  • Tableau for data visualization
  • Oracle Database management
  • MLOps practices within the Databricks environment
  • STATA for statistical analysis (an advantage)
  • Amazon SageMaker integration with Databricks
  • DataRobot platform integration
• Good interpersonal skills and the ability to work with different groups of stakeholders
• Strong problem-solving skills and the ability to work independently in a fast-paced environment with minimal supervision
• Excellent communication skills for technical documentation and cross-team collaboration
Licence no: 12C6060