Senior Observability Engineer

  • IT
  • Pune
  • 7 hours ago
  • Full Time

About the job

Position Summary:

As a Senior Engineer in the Monitoring and Observability team, you will be responsible for designing, implementing, and optimizing monitoring solutions that ensure the reliability and performance of Ensono's distributed and application services. This role requires deep expertise in real-time monitoring, alerting, anomaly detection, and automation to proactively identify and rapidly resolve incidents. You will also be responsible for designing and implementing the client solutions that Ensono manages.

What You Will Do: 

Engineer and operate a scalable monitoring and observability platform for Ensono’s Hybrid Cloud clients, using the tools currently in Ensono’s fleet, such as BMC TrueSight, BMC Helix, Entuity, and VMware Aria

Plan and execute the strategic roadmap for observability and monitoring tools, ensuring alignment with business and client requirements

Define monitoring best practices, including proactive alerting, anomaly detection and performance analytics

Operate and optimize end-to-end monitoring solutions that provide real-time visibility into networks, distributed systems, and applications

Establish automated alerting thresholds based on Service Level Objectives (SLOs) and Service Level Agreements (SLAs)

Establish monitoring audit standards for conformance and compliance purposes, covering both standard and custom monitors

Serve as the point of escalation for day-to-day monitoring-related incidents

Automate monitoring configurations and telemetry collection using scripting and Infrastructure as Code (IaC) tools such as Ansible and Terraform

We want all new Associates to succeed in their roles at Ensono. That's why we've outlined the job requirements below. To be considered for this role, it's important that you meet all Required Qualifications. If you do not meet all of the Preferred Qualifications, we still encourage you to apply.

Required Qualifications: 

7+ years of experience in operational observability or monitoring engineering roles

7+ years of hands-on experience with ITSM platforms such as ServiceNow and monitoring tools such as BMC, Datadog, Entuity, or others

Strong proficiency in Python, Bash, and JavaScript for automation and scripting

Experience with Infrastructure as Code (Ansible, Terraform, etc.) for deploying observability tools

Strong analytical and problem-solving skills for diagnosing complex issues

Effective communication and leadership, especially in training and cross-functional team collaboration

Ability to think creatively and holistically about processes that affect business engagement, and to refine those processes continually

Ability to thrive in a fast-paced environment, working both independently and collaboratively while managing priorities effectively

Bachelor’s degree in a related field

Preferred Qualifications:

Master’s degree in an information technology-related field

Proficiency in cloud platforms (AWS, Azure, GCP) and Kubernetes deployment & monitoring

Advanced ITIL certification or training, including ITIL v3 and v4

Experience integrating AI/ML into ITSM practices is a plus

Flexible work schedule