
Infrastructure/Platform Engineer Apache
- London
- Contract
- Full-time
- Package Spark workloads for deployment via Docker/Kubernetes and integrate with orchestration systems (e.g., Airflow, custom schedulers).
- Work with platform engineers to embed Spark jobs into InfoSum's platform APIs and data pipelines.
- Troubleshoot job failures, memory and resource issues, and execution anomalies across various runtime environments.
- Optimize Spark job performance and advise on best practices to reduce cloud compute and storage costs.
- Guide engineering teams on choosing the right execution strategies across AWS, GCP, and Azure.
- Provide subject matter expertise on using AWS Glue for ETL workloads and integration with S3 and other AWS-native services.
- Implement observability tooling for logs, metrics, and error handling to support monitoring and incident response.
- Align implementations with InfoSum's privacy, security, and compliance practices.
Required Skills and Experience:
- Proven experience with Apache Spark (Scala, Java, or PySpark), including performance optimization and advanced tuning techniques.
- Strong troubleshooting skills in production Spark environments, including diagnosing memory usage, shuffles, skew, and executor behavior.
- Experience deploying and managing Spark jobs in at least two major cloud environments (AWS, GCP, Azure).
- In-depth knowledge of AWS Glue, including job authoring, triggers, and cost-aware configuration.
- Familiarity with distributed data formats (Parquet, Avro), data lakes (Iceberg, Delta Lake), and cloud storage systems (S3, GCS, Azure Blob).
- Hands-on experience with Docker, Kubernetes, and CI/CD pipelines.
- Strong documentation and communication skills, with the ability to support and coach internal teams.
Key Indicators of Success:
- Spark jobs are performant, fault-tolerant, and integrated into InfoSum's platform with minimal overhead.
- Cost of running data processing workloads is optimized across cloud environments.
- Engineering teams are equipped with best practices for writing, deploying, and monitoring Spark workloads.
- Operational issues are rapidly identified and resolved, with root causes clearly documented.
- Work is delivered with a high level of independence, reliability, and professionalism.

All profiles will be reviewed against the required skills and experience. Due to the high number of applications, we will only be able to respond to successful applicants in the first instance. We thank you for your interest and the time taken to apply!

Owen Kent

ManpowerGroup (NYSE: MAN), the leading global workforce solutions company, helps organisations transform in a fast-changing world of work by sourcing, assessing, developing and managing the talent that enables them to win. We develop innovative solutions for hundreds of thousands of organisations every year, providing them with skilled talent while finding meaningful, sustainable employment for millions of people across a wide range of industries and skills. Our expert family of brands - Manpower, Experis and Talent Solutions - creates substantially more value for candidates and clients across 8 countries and territories and has done so for 70 years.