This job has expired
C2C
mohd.s@twsol.com
Must-Have Technical/Functional Skills: Incident management, SRE and operations engineering, reliability architecture, automation and observability, executive communication
Roles & Responsibilities
- Incident Manager to provide technical leadership for enterprise‑wide, high‑severity incidents, problem investigations, and high‑risk changes, while shaping reliability strategy, governance, and operational standards across complex, distributed platforms.
- Drive incident resolution management by directing cross‑functional teams through high‑impact outages, systemic problem resolution, and large‑scale change events.
- Creating scripts in ELK, Grafana, AppDynamics, COP
- Auto-executing predefined queries in ELK, Grafana, AppDynamics, COP for real-time issues
- Attaching live query outputs (metrics, logs, traces) directly to alerts/incidents
- Eliminating manual tool navigation for IM and Alert teams
- Enhancing alert systems with contextual intelligence: metric deviations and anomaly trends, relevant log snippets and patterns, and affected CIs with their downstream impacts
