Manager I, Engineering - Metrics Platform Resilience Automation

Datadog, New York, NY

Job Description

The Resilience Automation team is responsible for improving resiliency tools and processes across the Metrics Platform, a large-scale distributed system processing 2B+ points per second.

The team owns efforts around validation & correctness testing, zonal failure resiliency, and autoscaling systems. It also develops load testing automation and an administrative control panel for the Metrics Platform. As the Engineering Manager for this team (which is situated within the greater Metrics Platform Automation group), you'll lead several software engineers in planning and execution, evolve the team's vision, and identify opportunities to make it a reality - ultimately empowering Metrics Platform teams to improve their services' reliability.

At Datadog, we place value in our office culture - the relationships that it builds, the creativity it brings to the table, and the collaboration of being together. We operate as a hybrid workplace to ensure our employees can create a work-life harmony that best fits them.

What You'll Do:

  • Own the vision and execution for Metrics Platform Resilience Automation
  • Lead, mentor, and grow a team of Software Engineers
  • Be responsible for the internal tooling that tests the Metrics Platform, and expand both its usage and coverage
  • Collaborate with Metrics Platform teams to align the Resilience Automation roadmap with their goals and Datadog's overall strategic goals
  • Establish relationships with leaders in Metrics Platform and the wider Observability Data Platforms org to identify opportunities for our automation

Who You Are:

  • You've managed and led one or more teams and have shipped successful products
  • You've worked with distributed systems in a reliability- or DevOps-oriented role
  • You are passionate about improving developer experience and operational efficiency
  • You have strong technical skills and are willing to share on-call responsibilities with the team
  • You excel in cross-functional collaboration and are comfortable in a fast-paced, high-growth environment
  • You are a strategic thinker with a track record of translating complex technical concepts into actionable plans
  • You have a BS/MS/PhD in Computer Science, Engineering, or a related scientific field, or equivalent professional experience

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you're passionate about technology and want to grow your skills, we encourage you to apply.

Benefits and Growth:

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture and the ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits

Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.

To conform to US export control regulations, candidates should be eligible for any required authorizations from the US government. This job is available in various departments within our company; to conform to US export control regulations, some of these roles may require candidates to be eligible for any required authorizations from the US government.
