
Data Engineer
- Cambridge
- Permanent
- Full-time

Responsibilities
- Build and Maintain Data Pipelines: Your primary responsibility will be to build, maintain, and improve our data pipelines and ETL/ELT processes.
- Work with Data Warehousing Solutions: You will work with our data warehousing solutions, contributing to data models and optimizing queries to ensure data is accessible and performant for our analytics teams.
- Develop and Monitor Data Workflows: You will help develop, maintain, and monitor our data ingestion and delivery pipelines using modern orchestration tools, ensuring data flows seamlessly and reliably.
- Uphold Data Quality: You will apply best practices for data quality, testing, and observability, helping to ensure the data delivered to stakeholders is accurate and trustworthy.
- Collaborate on Data-Driven Solutions: You will work closely with our talented Data Scientists and R&D teams, understanding their requirements to provide the clean and structured data needed to power their research.
- Support System Reliability: You will help monitor the health and performance of our data systems. When issues arise, you'll assist with root cause analysis, deploy fixes, and provide technical support.
- Contribute to Technical Excellence: You will continuously learn about new data technologies, help test and implement enhancements to our data platform, and contribute to technical documentation.

Requirements
- Experience in Data Pipeline and ETL Development: Solid experience building and maintaining data pipelines, with a good understanding of ETL/ELT patterns.
- Proficiency in Python and SQL: Strong, hands-on experience using Python for data processing and automation, and solid SQL skills for querying and data manipulation.
- Understanding of Data Modeling and Warehousing: A good understanding of data modeling techniques and data warehousing concepts.
- Experience with Cloud Platforms: Experience with major cloud providers (GCP, AWS, or Azure) and their core data services. We primarily use GCP, so experience there is a significant plus.
- Familiarity with Big Data Technologies: Exposure to or experience with large-scale data processing frameworks (e.g., Spark, or similar).
- Workflow Orchestration: Familiarity with data workflow orchestration tools (e.g., Airflow, or similar).
- Infrastructure as Code (IaC): An interest in or exposure to IaC tools (e.g., Terraform).
- Containerization: Familiarity with container technologies like Docker and Kubernetes.
- CI/CD for Data: A basic understanding of how to apply continuous integration/delivery principles to data workflows.
- Data Quality and Testing: An interest in modern data quality and testing frameworks.
- Version Control: Proficiency with version control systems like Git.