
Senior Data Engineer - Pathogen
Ellison Institute of Technology
- Oxford
- Permanent
- Full-time
- Ensure data in the platform is acquired, processed, curated, and made accessible to scientists, digital analytics products, bioinformatics, and AI at a high standard of quality and availability
- Ensure data access adheres to FAIR principles (Findable, Accessible, Interoperable, and Re-usable)
- Ensure data is secured and compliant with regulatory, legal, and data sharing requirements
- Ensure efficient, performant, and high-quality pipelines for data ingestion into the platform
- Contribute to building data management components, including reference data management, de-identification, data curation, pathogen and technical metadata catalogues, and data access controls
- Ensure efficient, secure, scalable, available, and performant data storage components, including genomic variant storage, clinical data stores, and clinical imaging
- Ensure robust ingest services capable of seamlessly integrating data from distributed sequencing devices, including real-time telemetry streams
- Ensure data is processed to enable optimal access and consumption by digital analysis products, bioinformatics pipelines, and researchers/scientists
- Deep experience in building modern data platforms using cloud-based architectures and tools
- Experience delivering data engineering solutions on cloud platforms, preferably Oracle OCI, AWS, or Azure
- Proficient in Python and workflow orchestration tools such as Airflow or Prefect
- Expert in data modeling, ETL, and SQL
- Experience with real-time analytics from telemetry and event-based streaming (e.g., Kafka)
- Experience managing operational data stores with high availability, performance, and scalability
- Expertise in data lakes, lakehouses, Apache Iceberg, and data mesh architectures
- Proven ability to build, deliver, and support modern data platforms at scale
- Strong knowledge of data governance, data quality, and data cataloguing
- Experience with modern database technologies, including Iceberg, NoSQL, and vector databases
- Embraces innovation and works closely with scientists and partners to explore cutting-edge technology
- Knowledge of master data, metadata, and reference data management
- Understanding of Agile practices and sprint-based methodologies
- Active contributor to knowledge sharing and collaboration
- Familiarity with genomics and associated data standards
- Experience with healthcare clinical data and standards such as OMOP and SNOMED
- Familiarity with containerization tools such as Docker and Kubernetes
- Familiarity with Git and CI/CD workflows
- Strong collaborator with excellent communication skills
- Comfortable working in a fast-paced, dynamic environment
- Eagerness to learn and cross-train in new technologies
- Proactive and hands-on approach to exploring new tools and developing proofs of concept (POCs)
- Salary: Competitive
- Enhanced holiday pay
- Pension
- Life Assurance
- Income Protection
- Private Medical Insurance
- Hospital Cash Plan
- Therapy Services
- Perkbox
- Electric Car Scheme
- You must have the right to work permanently in the UK and be willing to travel as necessary.
- You will live in, relocate to, or be within easy commuting distance of Oxford.
- During peak periods, longer hours and some working across multiple time zones may be required due to the global nature of the programme.
We are sorry, but this recruiter does not accept applications from abroad.