Site Reliability Engineer - Network Team

Halter is building a large rural IoT network that connects 400,000+ devices and supports farmers who use our smart collars to monitor and care for cattle.
We're looking for a Site Reliability Engineer to help scale systems to a million animals and beyond, applying cloud-scale NRE practices to a wildly distributed rural IoT network spanning multiple countries. This role focuses on ensuring uptime for hundreds of thousands of animals and farmers who rely on Halter every day.

What you'll do

Build and run observability for gateways, towers, and backend/edge services (metrics, logs, tracing, alerts) with strong signal and low noise.
Automate ops with golden configs, zero-touch provisioning, safe canaries/rollbacks, scheduled maintenance, and self-healing where sensible.
Lead incidents end-to-end (runbooks, comms, mitigation, post-mortems) and drive fixes into code, configs, and process.
Harden deploys with progressive rollouts for firmware/agent/service changes across thousands of devices and multi-region backends.
Tune performance to reduce latency, optimize OTA pipelines, and improve link reliability with back-pressure and retries.
Plan capacity and readiness by keeping headroom for spikes and growth; conduct game-days/chaos testing for failover paths (cellular, satellite, region failover).
Own runbooks and SOPs that enable field teams and on-call staff to respond quickly and consistently.
Partner with Network/RF engineers on coverage/capacity changes, interference hunts, and carrier/satellite escalations.
Champion observability by promoting better logs, metrics, tracing, and signal-to-noise alerting.
Mentor teammates on NRE mindset, tools, and operational excellence.

Who we're looking for

Strong automation and scripting (Python/Go/etc.) and IaC (Terraform/Ansible/etc.).
Solid networking fundamentals (TCP/IP, routing, VPNs, firewalls) with RF awareness (LoRa/LTE/satellite a plus).
Hands-on with observability stacks (Prometheus, Grafana, ELK, OpenTelemetry).
Proven incident management experience for high-availability systems.
Performance tuning for latency-sensitive, unreliable-link environments.
Comfortable in Linux across cloud and edge devices.
Data-driven: able to turn noisy telemetry into decisions (SQL or notebooks a plus).
Pragmatic problem-solver who balances reliability, speed, and cost.
Bonus: IoT/off-grid/field deployment experience.
Basic L3 troubleshooting such as ping/traceroute, IP/subnetting, DNS/DHCP/NAT basics, and reading simple routes.
Reading link health at a high level (RSSI/SNR for LoRa or RSRP/SINR for LTE) and identifying whether an issue is a link or a service problem.
Understanding failover states (cellular, satellite), cost/performance trade-offs, and safe config rollout patterns.
Topology literacy to know where to place probes and alerts within gateways, towers, and backhaul paths.

Our Office First Approach

We value in-person connections and believe a world-class office culture supports growth, learning, and meaningful, aligned work.
We are office first, not office only, with a high-trust culture and a dog-friendly office in Auckland and a test farm in Morrinsville.

About Halter

At Halter, we enable farmers to run productive and sustainable operations.
Our customers use Halter to break free from time-intensive constraints and revolutionize grazing.
We seek people who want to solve challenging problems within a high-performance culture and with backing from Tier 1 investors.

Join our team

If this opportunity sounds like you, please apply with your CV and a cover letter explaining why you're excited about the role.
We'll be in touch.
Feel free to check out our careers page and follow us on LinkedIn and Instagram.

Why our team loves working at Halter

We offer generous benefits including unlimited paid annual leave, wellness days, a $1,000 self-development budget, caregiver leave, health insurance, and an employee stock ownership plan.
This is an office-first company with a strong culture of collaboration and growth. For more information, visit our careers page.
Network Engineer • Auckland, New Zealand