
Software Engineer - Site Reliability Engineering
- London
- Permanent
- Full-time
- Automate for insight and scale: Build systems that make troubleshooting fast, safe, and scalable across thousands of Neo4j instances. From internal tools that surface clear insights to canaries that support safe rollouts, you'll focus on automation that elevates reliability engineering.
- Treat operations as a software problem: Replace tribal knowledge and ad-hoc scripts with tools and systems that codify best practices, making operations predictable, scalable, and repeatable.
- Design for resilience, learn from failure: Own and evolve the tooling and processes behind incident response. From clear alerts to blameless reviews, you'll help ensure teams respond with confidence and learn with clarity.
- Champion reliability as a product feature: Help teams define and act on SLIs and SLOs, turning reliability into a shared, data-driven priority across engineering.
- Create signals, not noise: Shape an observability stack that tells us what matters, when it matters, so we can detect issues early and resolve them quickly.
- Writing backend tools and automation in Go, our primary language, with an emphasis on sound architecture, testing, and maintainability. Strong software skills in other languages, like Python, are also welcome.
- Applying SRE practices in real-world environments: defining SLIs and SLOs, reducing toil through automation, and driving reliability through engineering.
- Collaborating with other teams to promote SRE thinking, educating on principles like observability, ownership, and service level objectives.
- Troubleshooting large-scale, cloud-based systems with confidence and curiosity.
- Monitoring distributed systems and understanding their performance characteristics.
- Designing systems with reliability, safety, and debuggability as first-class concerns.
- Working with observability tools like OTel Collector, Prometheus, Grafana, and Google Cloud's operations suite.
- Deploying and managing applications on Kubernetes; cluster-level administration is a plus.
- Managing infrastructure with Kustomize and Terraform, keeping it clear, modular, and easy to evolve.
- Building and maintaining CI/CD workflows (ours run on GitHub Actions).
- Participating in on-call rotations and incident response with a focus on improvement, not blame.
- Writing and contributing to postmortems that lead to meaningful, lasting changes.
- Neo4j is one of the fastest-scaling technology companies in this industry.
- 84% of the Fortune 100 and 58% of the Fortune 500 use Neo4j.
- Neo4j was named as a Visionary in the 2023 Gartner® Magic Quadrant™ for Cloud Database Management Systems among 19 other recognized global DBMS vendors. Neo4j was also ranked as a Strong Performer among 14 top vendors in The Forrester Wave™: Vector Databases, Q3 2024.
(relationships)
(we)-[:FOCUS_ON]->(userSuccess)
(we)-[:THRIVE_IN]->(:Culture {type: ['Open', 'Inclusive']})
(we)-[:ASSUME]->(:Intent {direction: 'Positive'})
(we)-[:WELCOME]->(:Discussions {nature: 'IntellectuallyHonest'})
(we)-[:DELIVER_ON]->(ourCommitments)
Neo4j is committed to protecting and respecting your privacy. Please read our privacy notice to understand how we will handle the personal data that you provide.