
Site Reliability Engineer, Observability, London, T2
- London
- Permanent
- Full-time
- As a Site Reliability Engineer on the Cloud Monitoring Team at Apple, you will work to improve the reliability and performance of the software systems that provide visibility into the services and infrastructure that run Apple.
- Our monitoring, alerting, and visualization platform analyzes billions of metrics per minute and serves as the central nervous system of Apple's architecture.
- You will work shoulder-to-shoulder with our engineering teams to design and build the next generation of cloud and systems monitoring infrastructure, focusing on automation, availability, performance, and above all efficiency at 'reach every user on the planet' scale.
- You will dive deep into gnarly operational issues from the software, systems, automation, and process perspectives.
- You will understand the challenges of integrating disparate infrastructures into new facilities, processes, and procedures.
- Proven experience developing production-grade software in Python, Go, or Java, and a strong understanding of the Linux operating system and the TCP/IP suite of networking protocols
- Strong sense of ownership and integrity demonstrated through clear communication and collaboration
- Experience and confidence around incident response and incident management
- Experience managing and scaling distributed systems in a public, private, or hybrid cloud environment
- Bare-metal management experience, and experience deploying, supporting, and monitoring new and existing services, platforms, and application stacks.
- Demonstrated ability to investigate complex systemic and latent reliability issues and collaborate cross-functionally with software and systems teams to implement sustainable solutions.
- Experience automating workflows and reducing operational toil through scalable solutions.
- Experience monitoring systems and services and optimizing performance and resource utilization.