
Software Engineering Manager, Site Reliability, Cloud Incident Response
- London
- Permanent
- Full-time
- Bachelor's degree or equivalent practical experience.
- 8 years of experience with software development in one or more programming languages (e.g., Python, C, C++, Java, JavaScript).
- Master's degree or PhD in Computer Science, or a related technical field.
- Experience as a cloud customer.
- Participate in the on-call rotation supporting Critical Incident Response for GCP.
- Focus on high-quality customer outcomes and continuous collaboration across GCP teams.
- Create Incident Management at Google (IMAG) training and processes for the incident management life cycle, partnering with Cloud SRE Uber Tech Leads and the Cloud Support leadership team.
- Build systems and tooling to support the team and to improve visibility, issue detection, and communications to customers, stakeholders, and customer-facing teams.
- Define and escalate risks in Cloud, and reduce the probability of incidents with strategic and pragmatic approaches as needed.