
Data Engineer

Job Description
About Axiom
Axiom is building the translational intelligence layer for drug discovery: AI systems that help scientists predict human toxicity earlier, more accurately, and more mechanistically than animal studies or legacy in vitro assays.
Unexpected toxicity is one of the largest reasons drug programs fail. Today, drug discovery teams still rely on fragmented experimental data, animal studies, literature evidence, and expert judgment to decide which molecules are safe enough to advance. We believe this can be dramatically improved.
Axiom generates and curates massive multimodal datasets spanning chemical structures, primary human cell imaging, transcriptomics, proteomics, mass spectrometry, ADME, clinical outcomes, human exposure, and literature-derived toxicity evidence. To date, we've built the largest experimental-to-clinical dataset, and we are just getting started. These datasets power the models and agents that help drug hunters understand toxicity risk, mechanism, and safety margin.
We are looking for a data engineer to own the systems that make this possible. You will build the pipelines, infrastructure, APIs, and tooling that turn raw chemical, biological, and clinical data into ML-ready training data and customer-ready insights.
This is a foundational role. The quality, scale, and reliability of Axiom’s data systems will directly determine how fast our science moves, how good our models become, and how much customers trust our product.
Charter
Be a founding member of the team building the first accurate AI systems for drug toxicity prediction: systems that can help replace animal studies and legacy experiments with human-relevant predictive models.
You will build the data foundation that makes Axiom’s science and product possible.
What you will do
You will own core data infrastructure across Axiom’s research, ML, lab, and product systems.
You will:
Build and maintain Axiom’s core data platform for ingesting, processing, validating, storing, and serving chemical, biological, clinical, and customer datasets.
Build Axiom’s LabOS: the software layer that connects lab protocols, compound logistics, assay execution, plate/well metadata, instrument outputs, QC checks, data processing, model inference, and customer-facing results.
Turn raw experimental outputs into clean, versioned, ML-ready datasets across high-content imaging, transcriptomics, proteomics, mass spectrometry, ADME, dose-response, and clinical outcome data.
Design simple, reliable APIs and data interfaces that let scientists, ML researchers, and product engineers access the data they need without fighting infrastructure.
Build LLM-powered systems for literature research, clinical data extraction, evidence curation, and dataset generation.
Develop distributed systems for running large-scale LLM jobs that clean, normalize, deduplicate, and structure biological and clinical data.
Scale inference pipelines for image models, graph neural networks, chemical models, and mechanistic agents.
Automate ETL from diverse sources, including lab instruments, CRO outputs, public databases, customer files, internal research tables, cloud storage, and literature-derived datasets.
Create rigorous data validation, testing, monitoring, lineage, and observability systems so Axiom can trust the datasets that drive model training, evaluation, customer delivery, and scientific decisions.
Work closely with scientists to understand messy real-world data needs and translate them into robust infrastructure.
Support customer-facing data delivery systems, including raw data transfer, processed feature exports, model predictions, compound metadata, and versioned result packages.
Build infrastructure that accelerates every team at Axiom.
What we are looking for
We are looking for someone who combines engineering taste, scientific curiosity, and extreme ownership.
You might be a great fit if:
You have built large-scale data platforms used by many internal teams or external users.
You are excited by messy, heterogeneous scientific data and want to make it clean, reliable, searchable, and useful.
You can move fluidly between backend engineering, distributed systems, ML infrastructure, data modeling, DevOps, and user-facing tooling.
You are comfortable talking to scientists, understanding their workflows, and building systems that make their work dramatically faster.
You care deeply about correctness, reproducibility, versioning, and data quality.
You have experience building AI- or LLM-powered data systems, especially for research workflows, retrieval, curation, or structured extraction.
You enjoy turning ambiguous research needs into simple, reliable infrastructure.
You want to own critical systems at an early-stage company.
You are deeply curious about biology, chemistry, drug discovery, AI, product, and business.
You could work in big tech, but you would rather build the data foundation for a company trying to change how medicines are discovered.
Technical skills we value
We do not expect every candidate to have all of these, but we are especially excited by experience with:
Python, Pandas, NumPy, Polars, PyArrow, DuckDB, SQL, and the broader Python data ecosystem.
Distributed systems and large-scale compute using Kubernetes, Slurm, Modal, Ray, Anyscale, Daft, Dask, Spark, or similar tools.
Cloud infrastructure on AWS, GCP, or Azure.
Infrastructure as code with Terraform, Pulumi, or similar tools.
CI/CD, automated testing, deployment systems, and production observability.
Data warehouses, lakehouses, object storage, and columnar formats such as Parquet.
Workflow orchestration tools such as Airflow, Dagster, Prefect, Flyte, or Argo.
LLM-powered data extraction, retrieval systems, evaluation harnesses, embeddings, and human-in-the-loop review systems.
ML inference infrastructure for image models, graph neural networks, chemical models, or large language models.
APIs, backend services, and internal tools that make complex data easy to use.
Scientific, biological, chemical, clinical, or healthcare data systems.
Petabyte-scale data processing.
The kind of person who thrives here
Axiom is not a normal company, and this is not a normal data engineering role.
We are looking for someone who wants to build the systems underneath a new kind of scientific company. The data is messy. The scale is large. The requirements change quickly. The users are brilliant and demanding. The stakes are high.
The people who thrive here:
Move with urgency.
Have exceptional engineering taste.
See what needs doing and do it.
Care deeply about reliability and correctness.
Can build fast without creating chaos.
Enjoy working with scientists and ML researchers.
Are excited by messy biological and chemical data.
Think in systems, not one-off scripts.
Want their work to multiply the output of the entire company.
Are not satisfied with incremental improvements.
Want to build a generational company.
We are looking for someone with a relentless observe-orient-decide-act loop: someone who constantly identifies bottlenecks, builds the right abstractions, and makes everyone around them faster.
