Senior Site Reliability Consultant

EIL Global Limited • Auckland, Auckland, New Zealand

Job description

Reports to: Project Lead

Experience: 5+ years

Start date: 1 August 2022

Responsibilities

Reduce toil, implement identified improvement opportunities, and handle minor enhancements and non-ticketed activities.
Define and monitor service level metrics, including reliability metrics such as MTTD, MTTR, MTBF, MTTF, unavailability rate, and incident count.
Create rules that optimize incident response through metrics, streamlined alert flows, and better collaboration and communication across squads.
Proactively identify issues that might disrupt service in production
Direct incoming service requests to the relevant support groups or to Jira
Create and maintain alerts
Handle change validation and change planning requests
Assist business stakeholders in determining SLOs or adjusting threshold limits
Manage demand and capacity, and correct SLI / SLO threshold limits
Gather and analyze metrics from both infrastructure and applications to assist in bug fixing
Engage in capacity planning and performance tuning exercises
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed against reliability using well-defined service level objectives and indicators (SLOs, SLIs)
Debug production issues across services and levels of the stack.

Required skills and qualifications

Bachelor’s degree in computer science or another highly technical or scientific discipline

Experience with AEM and web services / APIs

Experience working with public clouds (minimum 3 years required)

Experience with Git or other source control systems

Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines

Working knowledge of service level definitions and KPI identification

Working knowledge of the TCP / IP stack, internet routing, and load balancing

Experience with distributed storage technologies like NFS, HDFS, Ceph

Experience with observability strategy

Job Category

Onsite

Shiroyama Trust Tower 4-3-1 Toranomon, Minato-ku Japan
