
Staff Site Reliability Engineer
- United Kingdom
- Permanent
- Full-time
- Lead the design, implementation, and operationalization of container infrastructure using Kubernetes (k8s), ensuring high availability, performance, and security
- Architect, build, and maintain advanced, automated CI/CD pipelines using Jenkins, ArgoCD, AWS CodeBuild/Pipeline, GitHub Actions, or similar, establishing best practices for deployment strategies (e.g., blue/green, canary)
- Drive the adoption and evangelism of Infrastructure as Code (IaC) principles using Terraform, scaling the Addepar Platform across regions with a focus on cost optimization and operational efficiency
- Develop deep application-level knowledge to proactively inform and influence infrastructure requirements and constraints for Developers, QA, and Management, including implementing sophisticated dashboards for Cost and Inventory management, performance analysis, and capacity planning
- Perform advanced monitoring and troubleshooting of our infrastructure and application stack using a wide array of logging/monitoring tools, driving root cause analysis and implementing preventative measures
- Initiate and lead collaborations with cross-functional teams to identify and resolve complex application or infrastructure issues, serving as a technical subject matter expert
- Serve as a primary on-call responder for critical incidents, demonstrating strong problem-solving skills under pressure and contributing to post-incident reviews to improve system resilience
- Extensive progressive experience in the SRE/DevOps/Systems Engineer field, with a track record of taking on increasing responsibility
- Expert-level understanding of Cloud Infrastructure fundamentals (AWS preferred), including advanced networking, security, and managed services
- Exceptional programming/scripting skills in various common languages (Python, Bash, and general Linux tools are essential; Java is a strong plus), with an emphasis on building scalable, maintainable automation and tools
- Broad and deep expertise with UNIX/BSD/Linux internals (Ubuntu preferred), including performance tuning, kernel-level debugging, and advanced system administration
- Extensive Containerization experience with k8s (KOPS, EKS, ECS preferred), including cluster management, custom resource definitions (CRDs), and advanced deployment strategies
- Demonstrable experience leading initiatives with infrastructure-as-code tools such as Terraform in complex, multi-account environments
- Proficiency with comprehensive monitoring, logging, and alerting tools such as Prometheus, Grafana, Sentry, Sumo Logic, or advanced AWS cloud-native tools, with a focus on observability strategy
- Excellent interpersonal and communication skills to effectively collaborate with multi-functional teams, articulate complex technical concepts, and influence outcomes
- Demonstrable experience writing and contributing to significant systems automation tooling or open-source projects is a strong plus
- Exposure to industry practices in financial services is a plus
- Act Like an Owner - Think and operate with intention, purpose and care. Own outcomes.
- Build Together - Collaborate to unlock the best solutions. Deliver lasting value.
- Champion Our Clients - Exceed client expectations. Our clients' success is our success.
- Drive Innovation - Be bold and unconstrained in problem solving. Transform the industry.
- Embrace Learning - Engage our community to broaden our perspective. Bring a growth mindset.