About this role
Role Summary

We are seeking skilled Machine Learning Engineers to design, develop, and optimize real-time, scalable ML and data ingestion systems using modern ML frameworks and the Hadoop ecosystem. You will collaborate closely with data scientists and engineering teams to operationalize machine learning models and build robust data and ML pipelines for projects across our varied clientele.

Key Responsibilities

ML System & Framework Development:
• Design and develop highly scalable, real-time data systems using Hadoop ecosystem components (Iceberg, Spark, Ozone, Trino, Hive, Ranger, Kafka, Flink, NiFi).
• Build robust batch and real-time data ingestion and transformation frameworks for multi-modal data (image, audio, video, unstructured documents) using Java, Spark, Python, and shell scripting.
• Develop full-stack applications and internal engineering tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).

ML Model Operationalization:
• Collaborate with data scientists to operationalize machine learning models using Cloudera Machine Learning (CML).
• Deploy, monitor, and maintain ML models using Spark MLlib, XGBoost, scikit-learn, TensorFlow/Keras, and Hugging Face (NLP/NLQ/Gen AI use cases).

Performance Optimization:
• Optimize and tune data and ML applications on Hadoop for efficient resource utilization and high performance.

Collaboration & Agile Delivery:
• Work closely with cross-functional teams (data science, engineering, product) to translate business requirements into technical solutions.
• Participate in Agile sprint activities, including development, testing, and deployment.
• Document technical designs, deployment procedures, and operational playbooks.

Mandatory Skills & Experience

• Degree in Computer Science, Engineering, or related discipline.
• Minimum 3 years of experience in ML engineering, data engineering, or full-stack development.
• Strong hands-on experience with:
    • Programming: Python, Java, Scala, or C++
    • ML frameworks/libraries: XGBoost, scikit-learn, TensorFlow/Keras, Hugging Face
    • Hadoop ecosystem: Iceberg, Spark, Ozone, Trino, Hive, Ranger, Kafka, Flink, NiFi
    • Data ingestion and transformation (batch and real-time)
    • Full-stack development (Python, shell scripting, Flask, React)
• Experience deploying and operationalizing ML models (preferably with CML, Spark MLlib).
• Experience with performance tuning and optimization on distributed data systems.
• Strong analytical, problem-solving, and collaboration skills.

Preferred/Advantageous Skills & Experience

• Experience with multi-modal data processing (image, audio, video, unstructured documents).
• Exposure to enterprise-scale ML or data platforms (e.g., Cloudera).
• Familiarity with Agile delivery methodologies.
• Experience with version control and CI/CD pipelines.
• Certification in relevant ML or data engineering technologies.
• Strong sense of ownership and continuous learning mindset.

Technical Stack / Domain Knowledge

• Programming: Python, Java, Scala, C++
• ML Frameworks & Libraries: XGBoost, scikit-learn, TensorFlow/Keras, Hugging Face, Spark MLlib
• Hadoop Ecosystem: Iceberg, Spark, Ozone, Trino, Hive, Ranger, Kafka, Flink, NiFi
• Data Engineering: Data ingestion/transformation frameworks, shell scripting
• Full-Stack Development: Python, Flask, React
• ML Platforms: Cloudera Machine Learning (CML)
• Performance Optimization: Distributed systems tuning, resource optimization
• Collaboration: Agile, cross-functional teamwork, technical documentation