Full-time, permanent, on-site role
Multi-branch organisation across Asia Pacific
Working within the Group IT Technical Services team, a Site Reliability Engineer (SRE) is required to help uplift the stability of the networks and infrastructure of our environment in Asia Pacific.
Role responsibilities
This new role is responsible for responding to incidents on a day‑to‑day basis, uplifting infrastructure to increase stability, and automating checks and remediation. As part of the Technical Services Group, the SRE will be part of a follow‑the‑sun group of individuals in Auckland, Sydney and Perth to provide coverage to businesses in Asia Pacific.
System Reliability and Performance
- Design, implement, and maintain highly available and scalable systems across our Linux & Windows server fleet as well as SAN (NetApp, TrueNAS), Virtualisation (VMware & Proxmox) and backup technologies
Networking Expertise
- Take ownership of networking configurations including BGP, WAN, Layer 2 VLAN and SIP network routing
- Troubleshoot BGP routing issues, and manage Layer 2 networking devices (switches, VLANs, etc.) to ensure optimal network performance and stability
Automation and Tooling
- Develop and implement automation scripts and tools (e.g. using Python, PowerShell, Bash, Ansible) to streamline operational tasks, provisioning, monitoring, and incident response
Incident Management and On‑Call
- Participate in an on‑call rotation, responding to and resolving critical incidents efficiently, conducting root cause analysis, and implementing preventative measures
Monitoring and Alerting
- Design, implement, and maintain robust monitoring, logging, and alerting systems to proactively detect and diagnose issues across servers and network components
Capacity Planning
- Analyse system performance and growth trends to forecast future capacity needs and make recommendations for infrastructure scaling
Troubleshooting and Debugging
- Diagnose and resolve complex technical issues spanning operating systems (Linux & Windows), Virtualisation (VMware, Proxmox), Containerisation (Docker), applications, networking protocols (TCP / IP, BGP), and hardware
- Work closely with development, security, and other infrastructure teams to ensure seamless deployment, integration, and operation of services
Documentation
- Create and maintain comprehensive documentation for system architecture, operational procedures, and troubleshooting guides
- Proactively identify areas for improvement in our systems, processes, and tools, and drive their implementation
The benefits
- Competitive salary
- Laptop and mobile phone
- Key role in a growing multinational organisation, reporting to the CTO
Required skills and experiences
- Minimum three years’ experience in a Site Reliability Engineering, DevOps, Network Engineering, or Systems Engineering role
- Strong expertise in Linux system administration (e.g. Ubuntu, CentOS, RHEL), including extensive experience with shell scripting, process management, file systems, and networking
- Solid experience with Windows Server administration, including PowerShell scripting, Active Directory, IIS, and general Windows troubleshooting
- Experience with virtualisation technologies including VMware, Hyper‑V and Proxmox
- In‑depth understanding of and practical experience with networking and routing protocols such as BGP (Border Gateway Protocol), including configuration, peering, route filtering, and troubleshooting
- Comprehensive knowledge of Layer 2 networking concepts (VLANs, STP, LACP, MAC addresses, ARP) and practical experience configuring and troubleshooting network switches (e.g. Aruba, Cisco, HP or similar)
- Proficiency in at least one scripting / programming language (e.g. Python, Ruby, PowerShell)
- Experience with configuration management tools (e.g. Ansible)
- Familiarity with cloud platforms (e.g. AWS, Azure, GCP) is a plus
- Experience with monitoring and logging tools (e.g. Nagios, Prometheus, Grafana, ELK Stack, Splunk, Datadog, New Relic)
- Strong problem‑solving skills and a methodical approach to troubleshooting
- Excellent communication and collaboration abilities
Desirable / nice to have skills include
- Bachelor-level degree in computer science, engineering, or a related field, or equivalent practical experience
- Knowledge of containerisation technologies (Docker, Kubernetes)
- Familiarity with CI / CD pipelines
- Experience with network automation frameworks
- Relevant industry certifications (e.g. CCNA, CCNP, RHCE, MCSA / MCSE)
Criticality of the role
- We exist to support our businesses and branches
- IT-related processes or failures cannot put production at risk
- 1+1=3. Collaborating with business experts to deliver value
Please note that initial candidate screening will be based on whether the first five required skills and experiences are reflected in your work history, and that working from home is not an option.
Also, we will not be considering applications from candidates who do not have the right to reside and work in New Zealand.
If you are interested but wish to learn more about the position before you apply, please contact: (contact details)