Reports to : Project Lead
Experience : 5+ years
Start date : 1st August 2022
Responsibilities
- Reduce toil, implement identified improvement opportunities, and handle minor enhancements and non-ticketed activities
- Define and monitor service level metrics, including reliability metrics such as MTTD, MTTR, MTBF, MTTF, unavailability rate, and incident count
- Create rules to optimize incident response using metrics, streamline alert flows, and improve collaboration and communication across squads
- Proactively identify issues that might disrupt the service in production
- Route incoming service requests to the appropriate support groups / Jira tool
- Create and maintain alerts
- Handle change validation and change planning requests
- Assist business stakeholders in determining SLOs or adjusting threshold limits
- Manage demand and capacity, and correct SLI / SLO threshold limits
- Gather and analyze metrics from both infrastructure and applications to assist in bug fixing
- Engage in capacity planning and performance tuning exercises
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives and indicators (SLOs, SLIs)
- Debug production issues across services and levels of the stack.
Required skills and qualifications
- Bachelor’s degree in computer science or another highly technical or scientific discipline
- Experience in AEM, web services / APIs
- Experience working with public clouds (minimum 3 years of experience is a must)
- Experience with Git or other source control systems
- Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines
- Working knowledge of service level definitions and identifying KPIs
- Working knowledge of the TCP / IP stack, internet routing, and load balancing
- Experience with distributed storage technologies like NFS, HDFS, Ceph
- Experience in observability strategy
Job Category : Onsite
Shiroyama Trust Tower, 4-3-1 Toranomon, Minato-ku, Japan