GPU Performance Engineer

Adaptive ML, New York, NY

Overview

Schedule: Full-time
Career level: Senior-level
Remote: On-site
Benefits: Health Insurance, Dental Insurance, Vision Insurance

Job Description

About the team

Adaptive ML is building a reinforcement learning platform to tune, evaluate, and serve specialized language models. We are pioneering the development of task-specific LLMs using synthetic data, creating the foundational tools and products needed for models to self-critique and self-improve based on simple guidelines. Adaptive Engine enables companies to build and deploy the best LLMs for their business. Our founders previously worked together to create state-of-the-art open LLMs. We closed a $20M seed with Index & ICONIQ in early 2024 and are live with our first enterprise customers (e.g., AT&T).

Our Technical Staff develops the foundational technology that powers Adaptive ML in alignment with requests and requirements from our Commercial and Product teams. We are committed to building robust, efficient technology and conducting at-scale, impactful research to drive our roadmap and deliver value to our customers.

About the role

As a GPU Performance Engineer on our Technical Staff, you will help ensure that our LLM stack (Adaptive Harmony) delivers state-of-the-art performance across a wide variety of settings, from latency-bound regimes where serving requests with sub-second response times is key, to throughput-bound regimes during training and offline inference. You will help build the foundational technology powering Adaptive ML by delivering performance improvements directly to our clients as well as to our internal workloads. We are looking for self-driven, business-minded, and ambitious individuals interested in supporting real-world deployments of a highly technical product. As this is an early role, you will have the opportunity to shape our research efforts and product as we grow.

This is an in-person role based at our Paris or New York office.

Your responsibilities

Build and maintain fast and robust GPU code, focusing on delivering performance improvements in real-world applications;
Write high-quality software in CUDA, CUTLASS, or Triton with a focus on performance and robustness;
Profile dedicated GPU kernels, optimizing across latency-bound and compute-bound regimes for complex workloads;
Contribute to our product roadmap by identifying promising trends that can improve performance;
Report clearly on your work to a distributed collaborative team, with a bias for asynchronous written communication.

Your (ideal) background

The points below are only a few indications of what we believe could be relevant. We welcome applications from candidates with diverse backgrounds; do not hesitate to get in touch if you think you could be a great fit, even if the list below doesn't fully describe you.

An M.Sc./Ph.D. in computer science, or demonstrated experience in software engineering, preferably with a focus on GPU optimization;
Strong programming skills, preferably with a focus on systems and general-purpose GPU programming;
A track record of writing high-performance kernels, preferably with a demonstrated ability to reach state-of-the-art performance on well-defined tasks;
Contributions to relevant open-source projects, such as CUTLASS, Triton, and MLIR;
A passion for the future of generative AI, and an eagerness to build foundational technology to help machines deliver more singular experiences.

Benefits

Comprehensive medical (health, dental, and vision) insurance;
401(k) plan with 4% matching (or equivalent);
Unlimited PTO - we strongly encourage at least 5 weeks each year;
Mental health, wellness, and personal development stipends;
Visa sponsorship if you wish to relocate to New York or Paris.
