
Principal Data Engineer
- Woking, Surrey
- Permanent
- Full-time
- Design, develop, and maintain scalable data pipelines using Databricks and Apache Spark (PySpark) to support analytics and other data-driven initiatives.
- Support the elaboration of requirements, formulation of the technical implementation plan, and backlog refinement. Provide a technical perspective on product enhancements and new requirements.
- Optimize Spark-based workflows for performance, scalability, and data integrity, ensuring alignment with GxP and other regulatory standards.
- Research and promote new technologies, design patterns, approaches, tools, and methodologies that could optimise and accelerate development.
- Apply strong software engineering practices including version control (Git), CI/CD pipelines, unit testing, and code reviews to ensure maintainable and production-grade code.
- Deliver reliable, scalable data pipelines that process clinical and pharmaceutical data efficiently, reducing data latency and improving time-to-insight for research and regulatory teams.
- Enable regulatory compliance by implementing secure, auditable, and GxP-aligned data workflows with robust access controls.
- Improve system performance and cost-efficiency by optimizing Spark jobs and Databricks clusters, driving measurable reductions in compute costs and processing times.
- Foster cross-functional collaboration by building reusable, testable, well-documented Databricks notebooks and APIs that empower data scientists, analysts, and other stakeholders to build out our product suite.
- Contribute to a culture of engineering excellence through code reviews, CI/CD automation, and mentoring, resulting in higher code quality, faster deployments, and increased team productivity.
- Deployment of Databricks functionality in a SaaS environment (via infrastructure as code), with experience of Spark, Python, and a breadth of database technologies
- Event-driven and distributed systems, using messaging systems such as Kafka and AWS SNS/SQS, and languages such as Java and Python
- Data-centric architectures, including experience with data governance and management practices and Data Lakehouse / Data Intelligence platforms. Experience of AI software delivery and AI data preparation would also be an advantage.