Engineering Jobs 2026 (Now Hiring) – Smart Auto Apply

We've scanned millions of jobs. Simply select your favorites, and we can fill out the applications for you.

Staff Software Engineer - AI Clusters Production Engineering & SRE

Biohub
Redwood City, California

$241,000 - $331,000 / year

Biohub is a 501(c)(3) biomedical research organization building the first large-scale scientific initiative combining frontier AI with frontier biology to solve disease. We build t...

Posted 1 day ago

Lead Engineering Manager - REMOTE

Jobgether
Florida, Florida

$170,000 - $230,000 / year

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Engineer Manager - REMOTE. In this role, you will lead a high-performing...

Posted 1 day ago

Operations Research/Engineering Intern - Uncertainty Quantification

Analytical Mechanics Associates
Hampton, Virginia
Job Description: Research on uncertainty quantification and decision making under uncertainty will be conducted. In particular, we will quantify the epistemic uncertainty in sliced...

Posted 1 day ago

Software Engineering Manager – Embedded Software

Zone 5 Technologies
San Luis, California

$170,000 - $220,000 / year

At Zone 5 Technologies, we're redefining what's possible in unmanned aircraft systems. Our team of engineers and innovators is developing cutting-edge autonomous solutions that pus...

Posted 1 day ago

Summer 2026 Internship - Mechanical Engineering

Pacific Fusion
Los Lunas, New Mexico
About Pacific Fusion Pacific Fusion was founded in 2023 with the mission to power the world with abundant, affordable, clean energy. We are rapidly designing and building a pulser-...

Posted 1 day ago

Senior Security Architect, Applied Field Engineering (AFE)

Snowflake
Menlo Park, California
At Snowflake, we are powering the era of the agentic enterprise. To usher in this new era, we seek AI-native thinkers across every function who are energized by the opportunity to...

Posted 1 day ago

Project Engineering Manager

Refresco Careers
Tulsa, Oklahoma
Make a Difference in YOUR Career! Our vision is both simple and ambitious: to put our drinks on every table. We are the leading global independent beverage solutions provider. We s...

Posted 1 day ago

Lead Engineering Director

Jobgether
Florida, Florida
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Director of Engineering - REMOTE. In this pivotal role, you will be at the foref...

Posted 1 day ago

Software Engineering Manager, Quality Assurance (QA)

Zip
San Francisco, California

$180,000 - $210,000 / year

About Zip Zip is the AI platform for enterprise procurement — built for humans and agents working together. By orchestrating procurement across teams, tools, and suppliers with the...

Posted 1 day ago

Welding Engineering Specialist

Hadrian Automation
Los Angeles, California
Hadrian - Manufacturing the Future Hadrian is building autonomous factories that help aerospace and defense companies manufacture rockets, satellites, jets, and ships up to 10x fas...

Posted 1 day ago

Senior Engineering Manager, Edge Software

LVT
American Fork, Utah
ABOUT LVT LVT is redefining how businesses operate in the physical world, moving beyond traditional security solutions to deliver AI-driven, actionable intelligence that makes site...

Posted 1 day ago

Sr. Engineering Director (Remote)

Jobgether
Maine, Maine
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Director of Engineering - REMOTE. In this pivotal role, you will be at the foref...

Posted 1 day ago

SMT (Surface Mount Technology) Engineering Specialist

Hadrian Automation
Los Angeles, California
Hadrian - Manufacturing the Future Hadrian is building autonomous factories that help aerospace and defense companies manufacture rockets, satellites, jets, and ships up to 10x fas...

Posted 1 day ago

Remote Director of Engineering

Jobgether
Connecticut, Connecticut
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Director of Engineering - REMOTE. In this pivotal role, you will be at the foref...

Posted 1 day ago

Vice President, Test Engineering

Hadrian Automation
Los Angeles, California
Hadrian - Manufacturing the Future Hadrian is building autonomous factories that help aerospace and defense companies manufacture rockets, satellites, jets, and ships up to 10x fas...

Posted 1 day ago

Sr. Engineering Manager – Engine, Drive Unit, Transmission Architecture

GM
Warren, Michigan
Job Description At General Motors, our product teams are redefining mobility. Through a human-centered design process, we create vehicles and experiences that are designed not just...

Posted 1 day ago

RF Engineering Assistant Intern

Axcelis Technologies
Beverly, Massachusetts
JOB DESCRIPTION Title: RF Engineering Assistant Intern Schedule: June–August 2026 Location: Beverly, MA Axcelis is a global leader in ion implantation and semiconductor technology....

Posted 1 day ago

Summer 2026 Internship - Manufacturing Engineering

Pacific Fusion
San Leandro, California
About Pacific Fusion Pacific Fusion was founded in 2023 with the mission to power the world with abundant, affordable, clean energy. We are rapidly designing and building a pulser-...

Posted 1 day ago

Facilities Manager Full Time- Engineering Day Shift

Hendricks Regional Health
Danville, Indiana
Job Summary : Serves the needs of patients, visitors, associates, and public through the effective and efficient supervision of the hospital maintenance, electrical safety, biomedi...

Posted 1 day ago

Sheet Metal Engineering Specialist

Hadrian Automation
Los Angeles, California
Hadrian - Manufacturing the Future Hadrian is building autonomous factories that help aerospace and defense companies manufacture rockets, satellites, jets, and ships up to 10x fas...

Posted 1 day ago

Staff Software Engineer - AI Clusters Production Engineering & SRE

Biohub
Redwood City, California

$241,000 - $331,000 / year

Automate your job search with Sonara.

Submit 10x as many applications with less effort than one manual application.

Reclaim your time by letting our AI handle the grunt work of job searching.

We continuously scan millions of openings to find your top matches.


Overview

Schedule: Full-time
Career level: Senior-level
Remote: Hybrid remote
Compensation: $241,000-$331,000/year
Benefits: Paid Vacation, Paid Community Service Time, 401k Matching/Retirement Savings

Job Description

Biohub is a 501(c)(3) biomedical research organization building the first large-scale scientific initiative combining frontier AI with frontier biology to solve disease. We build the technology to help scientists around the world use AI-powered biology to study how cells operate, organize, and work as part of systems to understand why disease happens and how to correct it. With our compute capacity, AI research and engineering, and state-of-the-art technology for measuring, imaging, and programming biology, we are enabling scientists worldwide to use AI-powered biology to advance our understanding of human health.

The Team

The AI Cluster Production Engineering team is part of the AI Compute Platform organization at Biohub, a non-profit research lab committed to open science and open-source AI. We own the design, operation, and reliability of large-scale multi-GPU AI clusters that power frontier AI biology research: protein language models, genomic foundation models, and scientific reasoning systems built to be shared, not monetized. Our clusters run Slurm on Kubernetes infrastructure and support everything from day-to-day AI researcher workflows to multi-node hero training runs at thousands of GPUs. The team works at the intersection of AI tooling, distributed systems, HPC, and frontier AI, debugging deep AI infrastructure problems and building AI systems critical to the entire AI organization.

The Opportunity

CZ Biohub's mission is to cure or prevent all human disease. Achieving that requires training frontier-scale AI biology models, and that demands reliable, high-performance compute infrastructure. This is production engineering work at a frontier AI lab, with the twist that the mission is biology and the science is open. You'll keep GPU clusters running at high utilization, debug the toughest distributed systems failures, and build the operational foundations for scaling to multi-thousand-GPU hero runs. The technical problems are genuinely hard (e.g., multi-node distributed training, InfiniBand fabrics, large-scale storage, Slurm at scale) inside an organization where the work is aimed at helping people, not optimizing ad revenue.

What You'll Do

  • Own reliability, observability, and incident response for multi-site GPU clusters running Slurm on Kubernetes. Build the systems, automation, and processes that keep clusters healthy and that enable fast, efficient recovery when things break.
  • Debug and resolve deep infrastructure failures across storage, networking, scheduling, and GPU compute layers. Build the tooling and operational patterns that make these failures easier to detect, diagnose, and prevent.
  • Design and execute GPU cluster scaling plans, systematically validating storage, networking, interconnect, and scheduler behavior as clusters grow to support larger training runs.
  • Build automation and tooling to manage cluster operations at scale: capacity planning, GPU utilization monitoring, workload manager policy management, and pod lifecycle automation.
  • Drive configuration-as-code practices, ensuring cluster state is reproducible, auditable, and managed through version-controlled pipelines.
  • Collaborate directly with AI researchers and hero run leads to understand training workload patterns and design infrastructure that meets frontier-scale requirements.
  • Own the vendor relationship on technical issues — escalating SEV1s, coordinating across multiple partners and network backbone teams, holding them accountable to root/proximate cause analysis and SLAs.
  • Contribute to capacity planning: projecting GPU demand, managing cluster expansion across GPU generations, and coordinating multi-cluster strategy.
  • Improve operational resilience, reducing mean time to detect and resolve incidents, reducing toil through automation, and developing runbooks that scale the team's operational knowledge beyond any individual.

What You'll Bring

  • 8+ years of AI/ML infrastructure engineering experience, with deep expertise in at least one of: HPC/Slurm cluster operations, Kubernetes at scale, distributed systems debugging, or GPU compute infrastructure.
  • Strong Linux systems fundamentals — networking (TCP/IP, InfiniBand, RDMA, MTU/MSS/PMTUD), storage (NFS, VAST, WEKA, POSIX semantics), kernel internals (cgroups, namespaces, eBPF, sysctls).
  • Hands-on experience with Kubernetes and cloud-native infrastructure — pod lifecycle, CNI plugins (Cilium preferred), StatefulSets, Helm, ArgoCD, or equivalent GitOps tooling.
  • Experience with HPC workload managers — Slurm strongly preferred (QoS, partitions, preemption, accounting, Sunk/CoreWeave patterns a plus).
  • Debugging instinct: ability to form hypotheses quickly, design controlled experiments, and root cause complex multi-system failures under pressure. You enjoy finding the hard bugs.
  • Proficiency in Python and Bash for automation and tooling. Go, Rust, or C/C++ a plus.
  • Experience with observability stacks — Prometheus/VictoriaMetrics, Grafana, DCGM metrics, distributed tracing. You know how to instrument systems you don't control.
  • Excellent communication — you can write a crisp incident summary for researchers, a technical escalation to a vendor CTO, and a system design doc for teammates, all in the same day.
  • Bonus: experience with distributed AI training infrastructure (NCCL, PyTorch DDP, multi-node job debugging, checkpoint/restart patterns, container environments for large-scale training).

Compensation

The Redwood City, CA base pay range for a new hire in this role is $241,000 - $331,000. New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. Actual placement in range is based on job-related skills and experience, as evaluated throughout the interview process. 

Better Together

As we grow, we’re excited to strengthen in-person connections and cultivate a collaborative, team-oriented environment. This role is a hybrid position requiring you to be onsite for at least 60% of the working month, approximately 3 days a week, with specific in-office days determined by the team’s manager. The exact schedule will be at the hiring manager's discretion and communicated during the interview process.

Benefits for the Whole You

We’re thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible. 

  • A generous employer match on employee 401(k) contributions to support planning for the future.
  • Paid time off to volunteer at an organization of your choice.
  • Funding for select family-forming benefits.
  • Relocation support for employees who need assistance moving.

If you’re interested in a role but your previous experience doesn’t perfectly align with each qualification in the job description, we still encourage you to apply as you may be the perfect fit for this or another role.

