Member of Technical Staff, Post-Training, RL Environments

Mirendil, United States, California

$350,000 - $500,000 / year

Overview

Schedule: Full-time
Career level: Senior-level
Remote: On-site
Compensation: $350,000-$500,000/year
Benefits: Paid Vacation

Job Description

Mirendil

Mirendil is a tech-first company focused on solving core bottlenecks that unlock step-change acceleration across science and technology. Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe accelerating scientific discovery is one of the most powerful ways to improve the future of humanity, and that AI will play a central role in making that possible.

We are building a frontier AI research company and training our own models end-to-end. Our work spans areas such as model training, reinforcement learning, reasoning systems, and infrastructure for large-scale experiments. Our team includes researchers and engineers from Anthropic, Google DeepMind, xAI, OpenAI, Microsoft, Apple, and MIT.

The Role

We are looking for a research engineer to build the data systems and execution environments that power reinforcement learning at Mirendil. The quality of our models depends directly on the quality of the data and environments we train on; you will own those systems end-to-end. Example areas you might work on include, but are not limited to:

  • Build and automate data collection pipelines for complex, long-horizon RL tasks.

  • Build robust systems to identify and prevent reward hacking.

  • Build scalable sandboxed execution environments for realistic tasks involving potentially multiple agents, nodes, and users.

  • Design systems to estimate the influence of training environments on production model behavior.

  • Collaborate with teams across the stack to identify potential axes of improvement in production model behavior, and develop training environments that push along these axes.

If you're excited about building the data and environment infrastructure that determine what our models learn, we'd love to hear from you.

We offer a base salary of $350,000–$500,000 USD and a meaningful equity grant, depending on experience and background, along with competitive benefits.

FAQs About Member of Technical Staff, Post-Training, RL Environments Jobs at Mirendil

What is the work location for this position at Mirendil?
This job at Mirendil is located in United States, California, according to the details provided by the employer. Some roles may also include multiple work locations depending on the requirement.
What pay range can candidates expect for this role at Mirendil?
Candidates can expect a pay range of $350,000 to $500,000 per year.
What employment type applies to this position at Mirendil?
Mirendil lists this role as a Full-time position.
What experience level is required for this role at Mirendil?
Mirendil is looking for a candidate with senior-level experience.
What benefits are offered by Mirendil for this role?
Mirendil offers Paid Vacation for this position. Actual benefits may vary depending on the employer's policies and employment terms.
What is the process to apply for this position at Mirendil?
You can apply for this role at Mirendil either through Sonara's automated application system or manually, using the direct link on the job page.