Lead Research Engineer

Lightning AI

New York, NY

$225,000 - $275,000 / year

Overview

Schedule: Full-time
Career level: Senior-level
Remote: Hybrid remote
Compensation: $225,000-$275,000/year
Benefits: Health Insurance, Dental Insurance, Vision Insurance

Job Description

Who We Are

Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction.

Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.

We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

Our Values

Move Fast: We act with speed and precision, breaking down big challenges into achievable steps.
Focus: We complete one goal at a time with care, collaborating as a team to deliver features with precision.
Balance: Sustained performance comes from rest and recovery. We ensure a healthy work-life balance to keep you at your best.
Craftsmanship: Innovation through excellence. Every detail matters, and we take pride in mastering our craft.
Minimal: Simplicity drives our innovation. We eliminate complexity through discipline and focus on what truly matters.

What We're Looking For

We are seeking a highly skilled Lead Research Engineer to work on optimizing training and inference workloads on compute accelerators and clusters, through the Lightning Thunder compiler and the broader PyTorch Lightning ecosystem. This role sits at the intersection of deep learning research, compiler development, and large-scale system optimization. You'll be shaping technology that pushes the boundaries of model performance and efficiency, creating foundational software that will impact the entire machine learning ecosystem.

This role is based in one of our hubs (NYC, SF, or London), with a minimum of 2 in-office days per week and occasional team and company offsites.

What You'll Do

Develop performance-oriented model optimizations at multiple levels:
Graph-level (e.g., operator fusion, kernel scheduling, memory planning)
Kernel-level (CUDA, Triton, custom operators for specialized hardware)
System-level (distributed training across GPUs/TPUs, inference serving at scale)
Advance the Thunder compiler by building optimization passes, graph transformations, and integration hooks to accelerate training and inference workloads.
Work across the software stack to ensure optimizations are accessible to end users through clean APIs, automated tooling, and seamless integration with PyTorch Lightning.
Design and implement profiling and debugging tools to analyze model execution, identify bottlenecks, and guide optimization strategies.
Collaborate with hardware vendors and ecosystem partners to ensure Thunder runs efficiently across diverse backends (NVIDIA, AMD, TPU, specialized accelerators).
Contribute to open-source projects by developing new features, improving documentation, and supporting community adoption.
Engage with researchers and engineers in the community, providing guidance on performance tuning and advocating for Thunder as the go-to optimization layer in ML workflows.
Work cross-functionally with Lightning's product and engineering teams to ensure compiler and optimization improvements align with the broader product vision.

What You'll Need

Strong expertise with deep learning frameworks such as PyTorch.
Hands-on experience with model optimization techniques, including graph-level optimizations, quantization, pruning, mixed precision, or memory-efficient training.
Knowledge of distributed systems and parallelism strategies (data/model/pipeline parallelism, checkpointing, elastic scaling).
Familiarity with software engineering practices: designing APIs, building robust tooling, testing, CI/CD for performance-sensitive systems.
Excellent collaboration and communication skills, with the ability to partner across research, engineering, and external contributors.
Bachelor's degree in Computer Science, Engineering, or a related field.

Nice-to-Haves

Experience with CUDA, Triton, or other GPU programming models for developing custom kernels.
Deep understanding of deep learning compiler internals (IR design, operator fusion, scheduling, optimization passes) or proven work in performance-critical software.
Proven track record contributing to open-source projects in ML, HPC, or compiler domains.
Advanced degree (Master's or PhD) in machine learning, compilers, or systems highly preferred.

Compensation

We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. In addition to base salary, our total rewards package for eligible roles includes a discretionary bonus, a meaningful equity component, and comprehensive benefits.

The anticipated annual base salary range for this role is:

$225,000-$275,000 USD

Benefits and Perks

We offer a comprehensive and competitive benefits package designed to support our employees' health, well-being, and long-term success. Benefits may vary by location, team, and role.

Benefits include:

Comprehensive medical, dental, and vision coverage (U.S.); private medical and dental insurance (U.K.)
Retirement and financial wellness support (U.S.); pension contribution (U.K.)
Generous paid time off, plus holidays
Paid parental leave
Professional development support
Wellness and work-from-home stipends
Flexible work environment

At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.

FAQs About Lead Research Engineer Jobs at Lightning AI

What is the work location for this position at Lightning AI?

This job at Lightning AI is located in New York, NY, according to the details provided by the employer. Some roles may also include multiple work locations depending on the requirement.

What pay range can candidates expect for this role at Lightning AI?

Candidates can expect a pay range of $225,000 to $275,000 per year.

What employment type applies to this position at Lightning AI?

Lightning AI lists this role as a Full-time position.

What experience level is required for this role at Lightning AI?

Lightning AI is looking for a candidate with senior-level experience.

What benefits are offered by Lightning AI for this role?

Lightning AI offers the following benefits for this position: Health Insurance, Dental Insurance, Vision Insurance, Paid Holidays, Paid Vacation, Parental and Family Leave, Career Development, 401k Matching/Retirement Savings, Health & Wellness Programs, and Home Office Reimbursement/Stipend. Actual benefits may vary depending on the employer's policies and employment terms.
