SRE/Infrastructure Engineer
InfoSum
- Basingstoke, Hampshire
- Permanent
- Full-time
- Infrastructure Design and Implementation: Assist with or lead the design, deployment, and operation of the infrastructure components required to support our applications and services. This includes managed cloud infrastructure, networking, security, data storage, and cloud-hosted services.
- System Automation: Develop and maintain automation and tools to streamline infrastructure provisioning, configuration management, deployment, and monitoring. Implement infrastructure as code (IaC) practices using tools such as Terraform and Ansible.
- Monitoring and Alerting: Implement monitoring solutions to track the health, performance, and availability of infrastructure components and applications. Configure alerting mechanisms to notify teams of potential issues and proactively address them before they impact users.
- Incident Response and Root Cause Analysis: Participate in incident response activities to identify, troubleshoot, and resolve incidents. Communicate incident status and updates to ensure both internal and external customers are fully informed. Conduct root cause analysis to determine the underlying causes of incidents and implement preventive measures to avoid recurrence.
- Performance & Cost Optimization: Analyze system performance metrics and identify opportunities for optimization. Tune infrastructure components and configurations to improve performance and resource utilization.
- Security and Compliance: Implement security controls and respond to security incidents in accordance with established policies and procedures.
- Disaster Recovery and High Availability: Design and implement disaster recovery (DR) and high availability (HA) solutions to ensure business continuity and minimize downtime. Develop and test DR plans, implement failover mechanisms, and conduct periodic drills to validate readiness.
- Capacity Planning and Scaling: Monitor resource utilization trends and prepare the infrastructure to handle predicted future changes.
- Documentation and Knowledge Sharing: Create and maintain documentation for infrastructure configurations, procedures, and best practices. Share knowledge and expertise with team members through documentation, training sessions, and mentorship to foster a culture of learning and collaboration.
- Proficiency in scripting and automation using languages such as Go, Python and Bash.
- Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes).
- Strong understanding of networking concepts, protocols, and security principles.
- Familiarity with infrastructure as code (IaC) tools and configuration management frameworks (e.g., Terraform).
- Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack, AWS CloudWatch) for infrastructure and application monitoring.
- Excellent problem-solving skills, attention to detail, and ability to work independently and collaboratively in a fast-paced environment.
- Effective communication skills, both written and verbal, with the ability to articulate technical concepts to non-infrastructure stakeholders.
- A competitive salary based on your experience and ability to perform in the role
- 25 days annual leave (excluding bank holidays)
- 8% pension contribution
- Private health care via Aviva
- Fantastic corporate discounts and mental wellbeing support via Perkbox, including a top-of-the-line EAP.
- Salary sacrifice schemes.
We are sorry, but this recruiter does not accept applications from abroad.