RECRUIT EXPRESS PTE LTD is hiring for a Network Automation & Reliability Engineer internship: a 12-month, hybrid Design role based in Orchard Road, Singapore. It is an unpaid internship. It is open to university students, typically in Year 2–4. Applicants with experience in Network Performance, Network Management, Workflow Automation, Pipelines, and Root Cause Analysis are a strong fit.
About this role
Network Automation and Engineering Excellence:
• Design, develop, and maintain production-grade automation frameworks using Ansible, Python, and CI/CD pipelines.
• Build reusable Ansible collections, Python libraries, and REST API integrations for network and security platforms.
• Implement GitOps practices for network configuration management, including version control, automated testing, and continuous deployment.
• Integrate automation workflows with enterprise DevOps toolchains (e.g., GitHub, Ansible, Terraform, ITSM tools).
• Develop automated validation and rollback mechanisms for network changes.
• Perform hands-on development, testing, and deployment of automation workflows in production environments.
• Act as the escalation point for complex network automation and reliability issues, performing deep-dive troubleshooting and root cause analysis.
• Evaluate and implement emerging automation technologies to enhance scalability, reliability, and efficiency.

Foundational Network Architecture and Operations:
• Serve as the SME for core network domains: routing, switching, firewalls, load balancing, WAN optimization, and hybrid cloud connectivity.
• Provide architectural oversight for critical network platforms including Cisco, Checkpoint, Palo Alto, F5, Zscaler, Symantec, and AWS.
• Lead modernization initiatives to ensure high availability, security, and performance across the global network.
• Drive capacity planning, performance optimization, and lifecycle management to maintain operational excellence.
• Establish and enforce network reliability standards, ensuring minimal downtime and rapid recovery from incidents.

Firewall and Security Automation:
• Develop and maintain automation for firewall policy management, rule lifecycle, and compliance validation across Checkpoint, Palo Alto, Cisco, and cloud-native firewalls.
• Integrate firewall automation workflows with change management and compliance systems (e.g., AlgoSec, ITSM tools).
• Collaborate with Cybersecurity teams to embed security controls and compliance into network automation and operational processes.

Observability, Tooling, and Reliability Engineering:
• Design and implement observability pipelines for network telemetry (e.g., streaming telemetry, SNMP, NetFlow, sFlow) integrated with Grafana, Prometheus, and ELK.
• Automate health checks, anomaly detection, and self-healing workflows using observability data.
• Collaborate with SRE and platform teams to define and implement network SLOs, SLIs, and error budgets.
• Develop automated testing frameworks for network configuration validation, compliance checks, and pre-deployment simulation (e.g., Batfish, NAPALM, pyATS).
• Implement chaos and resilience testing for network automation workflows to ensure fault tolerance and recovery.

Cross-Functional Collaboration and Stakeholder Management:
• Collaborate with Cloud, Security, DevOps, and Application teams to ensure seamless integration of network services into enterprise platforms and workflows.
• Partner with Cybersecurity and Compliance teams to ensure automation adheres to enterprise security and regulatory standards.
• Manage vendor and service provider relationships, influencing technology roadmaps and ensuring alignment with client direction.
• Communicate complex technical concepts to non-technical stakeholders, providing clear insights into network performance, risks, and strategic initiatives.
• Mentor and upskill network engineers in automation best practices, coding standards, and tool usage.

Governance, Compliance, and Continuous Improvement:
• Establish governance frameworks for network automation and reliability, ensuring compliance with internal policies, regulatory standards, and industry best practices.
• Define and track key performance indicators (KPIs) such as automation coverage, change success rate, and mean time to recovery (MTTR).
• Lead post-incident reviews, root cause analyses, and continuous improvement initiatives to enhance operational resilience.
• Ensure comprehensive documentation of automation workflows, network configurations, and operational procedures.

Interested applicants, please send your resume to venessagoh@recruitexpress.com.sg

Venessa Goh Wee Ni
R24124686
Recruit Express Pte Ltd
EA License No: 99C4599

We regret that only shortlisted candidates will be contacted.