Azure Cloud Engineer (f/m/d) 100%
Luware
- London
- Permanent
- Full-time
- Reliability Engineering: Design and implement scalable, fault-tolerant systems that enhance the reliability, performance, and availability of services running in Microsoft Azure
- Automation & Tooling: Develop tools, scripts, and automation to reduce manual effort, improve deployment velocity, and manage cloud infrastructure at scale using Infrastructure-as-Code (e.g., Bicep, ARM, Terraform) and CI/CD pipelines (e.g., Azure DevOps)
- Observability & Monitoring: Build and maintain telemetry, monitoring, alerting, and logging solutions using Azure Monitor, Log Analytics, and Grafana to ensure service health, performance, and uptime
- Incident Response & Root Cause Analysis: Participate in on-call rotations, lead response efforts during high-impact incidents, and perform thorough post-mortems to drive learning and system improvements
- Performance & Scalability: Analyse application and infrastructure telemetry to identify bottlenecks and scalability opportunities. Work with engineering teams to optimise reliability and efficiency across systems
- Change & Release Engineering: Partner with software teams to implement safe deployment practices. Improve release reliability and reduce time-to-resolution through continuous delivery pipelines and release validation tooling
- Capacity Planning: Use system metrics to model and forecast capacity needs, identifying risks and proactively scaling systems to meet future demand
- Platform Resiliency Improvements: Collaborate with engineering and architecture teams to identify single points of failure, introduce redundancy, and implement self-healing patterns across the Azure environment
- Security and Compliance Support: Collaborate with security and governance teams to integrate compliance and security practices into infrastructure automation and operations (e.g., support for ISO 27001, SOC 2)
- Cross-team Engineering Collaboration: Work with product, security, and platform teams to advocate for and embed SRE principles such as SLOs, error budgets, and blameless post-mortems
- Hands-on experience as a Site Reliability Engineer or DevOps Engineer supporting complex services in Microsoft Azure
- Experience with CI/CD pipelines, preferably Azure DevOps
- Strong automation skills with scripting (PowerShell, Python, or Go) and Infrastructure-as-Code tools (Terraform, Bicep, ARM)
- Experience building and maintaining observability platforms using Azure Monitor, Log Analytics, and Grafana
- Strong understanding of distributed systems, failure modes, and operational concerns in cloud environments
- Deep knowledge of Azure services, including Virtual Machines, Storage Accounts, Key Vaults, Networking, App Services, Load Balancers, and Firewalls
- Experience participating in incident response, root cause analysis, and high-severity incident management
- AZ-104 certification (Microsoft Azure Administrator)
- Certifications such as AZ-400 (DevOps), AZ-500 (Security), or AZ-305 (Architect)
- Experience with SIP-based communication platforms (e.g., Microsoft Teams)
- Familiarity with implementing or supporting compliance frameworks such as ISO 27001 or SOC 2
- Challenging, interesting work in a varied field with a high level of personal responsibility
- A dynamic, motivating working environment that leaves room for the realisation of your ideas and promotes your personal development
- Regular team events and the opportunity to work in one of the Luware offices for up to one month as part of our exchange programme