Engineering Manager - Machine Learning Infrastructure

Moveworks, Mountain View, CA

Job Description

As the Engineering Manager for the Machine Learning Infrastructure team, you will spearhead the development of the cutting-edge platform that powers Moveworks' conversational AI. This role is absolutely critical to the long-term scalability of our core AI product and, ultimately, the company.

Your primary mission is to lead a team of talented engineers in building, optimizing, and scaling the end-to-end systems for the entire ML/LLM lifecycle. This includes our infrastructure for distributed training and inference, model evaluation frameworks, and LLM latency optimization. You will guide the team's technical vision, balancing the operational demands of our core ML infrastructure with forward-looking research to build the next generation of LLMs using cutting-edge generative AI.

The frameworks your team builds serve as the foundation for all ML models in production, serving hundreds of millions of enterprise employees. Your contributions will be instrumental in shaping the Moveworks Enterprise Copilot platform and defining the future of AI-driven employee services.

What You Will Do:

Lead, Mentor, and Grow a world-class team of ML and Systems Engineers, fostering a culture of innovation, ownership, and operational excellence that aligns with Moveworks' core principles.
Own the Technical Vision and roadmap for the end-to-end ML platform that powers the entire lifecycle, from data synthesis and distributed training to ultra-low-latency inference and serving, for hundreds of production models, including our proprietary MoveLM series.
Drive the Strategy for model performance and efficiency, making critical architectural decisions to optimize our GPU infrastructure for latency, throughput, and cost at massive scale.
Partner with Leaders across agentic platform, search platform, product engineering, and core infrastructure teams to define and deliver the foundational infrastructure that will power the next generation of agentic AI experiences.
Champion a Product Mindset for your platform, building powerful abstractions and tools that accelerate the velocity of machine learning engineers and researchers across the organization.

What you bring to the table:

A Master's or Ph.D. in Computer Science, Machine Learning, or a related field.
5+ years of industry experience with a proven track record of leading or managing high-performing machine learning or infrastructure teams.
Deep technical expertise in designing, building, and scaling end-to-end machine learning systems in production environments.
Strong command of Python and experience with performant languages such as C++ or Go.
Extensive experience with deep learning frameworks like PyTorch or Hugging Face.
Hands-on experience with modern LLM infrastructure, including distributed training frameworks (e.g., DeepSpeed) and inference/serving frameworks (e.g., vLLM, TensorRT-LLM, Kubernetes).
A strategic mindset with experience balancing the demands of operating robust, scalable infrastructure with the need for forward-looking research and development.
Excellent communication and collaboration skills, with experience working cross-functionally to deliver complex projects.

Nice to have:

Experience working with Machine Learning products

Base Salary Compensation Range: $276,000-$340,000
