
Senior Principal, Data Engineering (Remote)
- Cambridge
- Permanent
- Full-time
- Lead the design, development and maintenance of data pipelines for processing Research and Development data from diverse sources (Clinical Trials, Medical Devices, Pre-Clinical, Omics, Real World Data) utilizing the AWS technology platform.
- Create and optimize ETL/ELT processes for structured and unstructured data using Python, R, SQL, AWS services and other tools.
- Build and maintain data repositories using AWS S3 and FSx technologies. Establish data warehousing solutions using Amazon Redshift.
- Build and maintain standard data models.
- Own data quality frameworks, validation processes and KPIs to ensure accuracy and consistency of data pipelines.
- Implement data versioning and lineage tracking to support data traceability, regulatory compliance and audit requirements.
- Create and maintain documentation for data processes, architectures, and workflows.
- Implement modern software development best practices (e.g. version control, DevOps, CI/CD).
- Collaborate with R&D researchers, data scientists and stakeholders to understand data requirements and deliver appropriate solutions in a global working model.
- Maintain compliance with data privacy regulations such as HIPAA and GDPR.
- May be required to develop, deliver or support data literacy training across R&D.
- Expert knowledge of data engineering tools such as Python, R and SQL for data processing.
- Expert proficiency with AWS services, particularly S3, Redshift, FSx, Glue and Lambda.
- Expert proficiency with relational databases.
- Strong background in data modeling and database design.
- Strong knowledge of unstructured database technologies (e.g. NoSQL) and other database types (e.g. Graph).
- Experience with containerization technologies such as Docker and EKS/Kubernetes.
- Experience with one or more R&D research processes and their associated regulatory requirements.
- Exposure to healthcare data standards (CDISC, HL7, FHIR, SNOMED CT, OMOP, DICOM).
- Experience with big data technologies and handling large datasets.
- Knowledge of machine learning operations (MLOps) and model deployment.
- Strong problem-solving and analytical abilities.
- Excellent communication skills for collaborating with stakeholders.
- Experience working in an Agile development environment.
- Bachelor’s Degree in Computer Science, Statistics, Mathematics, Life Sciences, or another relevant scientific field; Master’s Degree preferred.
- 5-7 years of experience in data engineering, with at least 2 years focusing on healthcare, research or clinical-related data.
- Occasional mobility within office environment
- Routinely sitting for extended periods of time
- Constantly operating a computer, printer, telephone and other similar office machinery