Engineering Manager, Model Serving

Together AI, San Francisco, California

$250,000 - $300,000 / year

Overview

Schedule

Full-time

Career level

Director

Compensation

$250,000-$300,000/year

Benefits

Health Insurance

Paid Vacation

Health & Wellness Programs

Job Description

Together AI is building the AI Inference & Model Shaping Platform that brings the most advanced generative AI models to the world. Our platform powers multi-tenant serverless workloads and dedicated endpoints, enabling developers, enterprises, and researchers to harness the latest LLMs and multimodal, image, audio, video, and reasoning models at scale.

We are looking for an exceptional Engineering Lead to partner closely with our cross-functional engineering, infrastructure, research, and sales teams to ensure the excellence of our ML API offerings. Your primary focus will be delivering world-class inference and fine-tuning in our public APIs and customer deployments by building automation and operations processes.

This role is ideal for a highly motivated and technically adept individual who excels in fast-paced, dynamic environments. You will be in charge of designing and scaling our ML processes & tooling at production scale – optimizing operations to ensure availability and reliability for our services, across differing tenants and user loads, and in a multi-cluster deployment. You will serve as a passionate advocate for internal and external customers, providing feedback to the wider engineering and infrastructure teams to improve our systems and core business metrics. If you thrive in a collaborative, problem-solving environment and are driven to deliver operational excellence, we encourage you to apply for this exciting opportunity.

Key Responsibilities

Own availability and performance SLAs for production inference and fine-tuning services across serverless and dedicated deployments
Own & improve testing, deployment, configuration management, and monitoring practices for multi-cluster ML infrastructure – partnering closely with Infra SREs
Build self-serve tooling and automation to reduce operational toil and enable self-serve offerings
Define and enforce configuration best practices for inference engines (SGLang, TRT-LLM, vLLM, etc.) to prevent runtime issues
Lead incident response, conduct postmortems, and drive reliability improvements
Mentor team members and potentially grow into hiring/team building as the organization scales
Partner with infrastructure and ML engineering teams to improve system reliability and cost efficiency

Required Qualifications

5+ years operating production ML inference or training systems at scale
2+ years in senior IC or tech lead roles, with demonstrated mentorship and technical leadership experience. Having built or scaled teams is a plus.
Deep expertise with Kubernetes, multi-cluster orchestration, and ML serving frameworks
Experience with multi-tenant SaaS platforms
Proven track record of SLA ownership with specific metrics (99.9% uptime, p99 latency targets)
Customer escalation and incident communication experience
Experience with LLM inference serving systems (SGLang, vLLM, TRT-LLM, or similar)
Ability to influence cross-functional teams and make deployment/architecture decisions

Nice to Have

Experience building internal developer platforms or self-serve tooling
Background in cost optimization for GPU infrastructure
Contributions to open-source ML infrastructure projects

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $250,000 - $300,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy

FAQs About Engineering Manager, Model Serving Jobs at Together AI

What is the work location for this position at Together AI?

This job at Together AI is located in San Francisco, California, according to the details provided by the employer. Some roles may also include multiple work locations depending on the requirement.

What pay range can candidates expect for this role at Together AI?

Candidates can expect a pay range of $250,000 to $300,000 per year.

What employment type applies to this position at Together AI?

Together AI lists this role as a Full-time position.

What experience level is required for this role at Together AI?

Together AI is looking for a candidate at the "Director" experience level.
