
Data Engineer
Location: Stone, Staffordshire (Hybrid working, 2 days in our Stone office)
Status: Permanent, Full Time
Package: Competitive Salary, Flexible Working (with one-off allowance and 2 Days in the office), Development & Opportunity (Personal & Technical), Private Medical (Optical & Dental options), Matching Contributory Pension, 25 Days Leave + Public Holidays + Buy and Sell Scheme, Life Insurance, Referral Scheme, Employee Assistance Program, Benefits Hub.

Who's Instem?
Well, we're a global provider of bespoke industry-leading software solutions and services, which facilitate the pre-clinical and clinical phases of the drug discovery process. We have over fifteen products in our portfolio, used by over 700 pharmaceutical clients (including all the top 20!).

What's the culture/environment like?
For a global business of over 300 staff, we very much have a family feel. You'll be part of a friendly, communal, solution-based, flexible environment, where you'll feel empowered, valued and accountable. We'll invest in you as a person and encourage you to take part in company-wide workshops for wellbeing, mental health, critical conversations, and strengths.

Why are we hiring a Data Engineer?
This is a newly created role, and it will be critical to ensuring that data is accessible, reliable, and optimized for analytics and business intelligence across the organization.

What are you responsible for?

Data Ingestion & Integration
- Design and implement robust data pipelines to ingest data from multiple internal and external sources (e.g., databases, APIs, flat files, cloud services).
- Develop ETL/ELT processes to clean, transform, and prepare data for analysis.
- Integrate structured and unstructured data from disparate systems.
- Design scalable and performant data models (star/snowflake schemas, normalized/denormalized structures) for analytical workloads.
- Build and maintain data warehouses, data lakes, or Lakehouse architectures.
- Define metadata, data lineage, and schema management processes.
- Implement validation and profiling routines to monitor data quality and consistency.
- Set up logging, alerting, and metrics for pipeline reliability and data integrity.
- Collaborate with data stewards and governance teams to align with data standards.
- Work with data scientists, analysts, and domain experts to understand data requirements and research use cases.
- Prototype data sets and transformation logic to support exploration and discovery.
- Help define and prioritize data needs based on early-stage research opportunity signals.
- Automate data workflows using orchestration tools (e.g., Airflow, Prefect).
- Contribute to the deployment of cloud-native or hybrid data infrastructure.
- Optimize pipeline performance and cost efficiency (e.g., storage, compute).
- Identify gaps in current data availability or quality that may block research opportunity detection.
- Recommend tools, platforms, or architectural patterns to enhance data capabilities.
- Contribute to a roadmap for data engineering and analytics maturity.
- Programming - proficiency in Python, SQL, and optionally Scala or Java.
- Data Platforms - experience with cloud-based data services (e.g., AWS Redshift, Azure Synapse, GCP BigQuery, Snowflake).
- ETL Tools - Airflow, dbt, Apache NiFi, or similar tools for workflow and pipeline orchestration.
- Data Storage - familiarity with relational databases (PostgreSQL, MySQL), NoSQL (MongoDB, DynamoDB), and file formats (Parquet, Avro, JSON).
- Data Modelling - strong grasp of dimensional modelling and data warehousing concepts.
- DevOps/DataOps - Git, CI/CD pipelines, infrastructure as code (Terraform, CloudFormation).
- Data Architecture - understanding of modern data architecture patterns (e.g., medallion architecture, data mesh, lambda architecture).
- Data Governance - awareness of data privacy, security, and compliance (e.g., GDPR, HIPAA).
- Analytics Foundations - understanding of how data is used for analysis, ML, or research, even if not directly building models.
- Hands-on experience as a data engineer or in a similar role.
- Experience working in cross-functional teams, ideally in research-heavy or data-driven environments (e.g., life sciences, pharma, healthcare, academia).
- Knowledge of chemical structure representation, nomenclature, chemical transformation, and structure-activity relationship forms.
- Knowledge of toxicology and toxicological study types.
- Proven ability to work independently on open-ended tasks, including shaping requirements and driving toward implementation.
- Experience with exploratory data work, helping uncover patterns or opportunities through early-stage prototyping.
- Knowledge of our Quality Management System and its application to tasks associated with this role.