About this role
Responsibilities:
• Design, implement, and operate scalable Elastic Stack (ELK) solutions for logs, metrics, traces, and events.
• Own end-to-end log ingestion pipelines using Beats, Logstash, Elastic Agent, and custom integrations.
• Perform log parsing, filtering, cleanup, normalization, and enrichment using Grok, conditionals, processors, ingest pipelines, and ECS standards.
• Define and implement ingestion best practices for performance and reliability.
• Configure and maintain Kibana dashboards, visualizations, Lens, and Canvas for operational and business observability use cases.
• Use Elastic Observability/SIEM and Elastic APM to instrument applications, collect and correlate logs, metrics, and traces, perform performance analysis, and visualize service dependencies.
• Create and manage Elastic Machine Learning jobs (multi-metric anomaly detection, forecasting) and interpret outcomes to generate insights and alerts.
• Integrate Elasticsearch with other observability tools such as:
  • Prometheus and Grafana (metrics collection and visualization).
  • SolarWinds and Dynatrace (infrastructure monitoring and APM).
• Correlate logs, metrics, traces, and events across platforms to enable unified observability.
• Design observability solutions that support operations, infrastructure, and application teams.
• Set up Kibana alert rules and write advanced Watcher scripts.
• Leverage Elastic AI Assistant, including LLM integrations in cloud environments (especially AWS), to enhance investigation, analysis, and insights.
• Manage Elasticsearch clusters, including:
  • Installation, patching, and upgrades of the ELK stack.
  • Node roles, index lifecycle management (ILM), shard strategies, and data tiers.
  • Security (users, roles, API keys, TLS).
  • Performance tuning, scaling, and troubleshooting.
• Apply ELK cluster management best practices for stability, availability, and resiliency.
• Monitor cluster health and proactively address capacity and performance issues.
• Instrument and observe AWS workloads, including EC2, Lambda, ECS/EKS, API Gateway, RDS, S3, and other supporting services.
• Integrate observability deployment (e.g., Logstash deployment) into DevSecOps practices.
• Automate operational tasks (e.g., data extraction, cleaning, transformation, and reconciliation) using scripting or programming languages (Python) where applicable.

Requirements:
• Strong understanding of observability concepts, e.g., what counts as important telemetry, the golden signals, how to monitor, and how to derive insights.
• Able to propose solutions that uplift observability maturity in the organization.
• Strong hands-on experience with Elasticsearch, Logstash, Kibana, and Elastic ML.
• Strong know-how in log ingestion, parsing, Grok patterns, filtering, and enrichment.
• Experience managing and operating production enterprise ELK clusters.
• Experience with monitoring tools such as SolarWinds, Prometheus, Slack, Grafana, Dynatrace, or similar tools.
• Good understanding of AWS services (EC2, S3, Lambda, VPC, CloudWatch) relevant to observability.
• Familiarity with REST APIs, AI/ML, LLMs, RAG, graph databases, OTel, and emerging observability intelligence concepts.
• Experience with topology mapping or service dependency visualization.
• Strong scripting and automation skills.
• Experience with CI/CD pipelines and deployment automation for Logstash pipeline deployment or dashboard/Canvas deployment.
• Good understanding of infrastructure (servers, network, storage) and application tech stack monitoring.
• Ensure observability configurations meet security and compliance requirements.
• Familiarity with Erlang, Java, and MQ application architecture, for understanding application behavior and identifying useful observability telemetry, would be an advantage.
• Strong communication and stakeholder engagement skills, with the ability to translate complex telemetry data into clear, actionable insights.
• Strong sense of ownership and accountability, with a high level of commitment to delivery, quality, and outcomes.
• Certifications would be a plus:
  • Elastic Certified Observability Engineer / Analyst
  • AWS Certified Solution Associate / Architect
  • Red Hat Ansible Automation