Sr. Manager, Reliability Engineering

ZocDoc, Inc., New York City, NY

Job Description

Your Impact on our Mission:

At Zocdoc, we're on a mission to give power to the patient, and that means making sure our systems are always available when patients and providers need them. As the Senior Manager of Reliability Engineering, you'll lead our SRE, DBRE, and Cloud Engineering teams to deliver highly available, scalable, and observable systems. Your leadership will shape the foundational infrastructure that powers every appointment booked, every provider searched, and every connection made.

This is a pivotal role in the Infrastructure organization. You'll help reduce incidents, optimize performance, and drive operational excellence across our distributed environments, from legacy monoliths to modern microservices. You'll collaborate across engineering to enable faster delivery, safer deployments, and a more resilient platform, helping Zocdoc meet the needs of millions of users.

You'll also help define Zocdoc's internal reliability platform strategy, building scalable internal services that product teams can adopt easily and confidently. Your team will support our ongoing migration to cloud-native architecture while maintaining thoughtful investment in our monolith, balancing modernization with system stability. And you'll play a central role in cloud tooling and vendor strategy, helping us make high-ROI decisions on infrastructure investments.

You'll enjoy this role if you are…

A technical leader who thrives on building and coaching high-performing, mission-driven engineering teams
Passionate about reliability, observability, and operational rigor in complex, distributed environments
Driven by impact and pragmatism, with a bias toward data-informed decisions and continuous improvement
Comfortable navigating failures and ambiguity, and experienced in leading high-stakes incident response
Motivated to work across infrastructure domains (SRE, DBRE, Cloud) and evolve platform maturity
Energized by building reliable platforms as internal products and enabling others to move faster with confidence
A strong advocate for inclusive, empowering team cultures that foster growth and accountability
Comfortable working autonomously and collaboratively in a distributed and flexible work environment
Excited to grow as a leader, working closely with senior infrastructure leadership to expand your influence

Your day to day is…

Leading and mentoring a multi-disciplinary team of SREs, DBREs, and Cloud Engineers through technical execution and career development
Defining and driving initiatives that improve reliability, resilience, observability, and performance across our platform
Partnering with Engineering and Product leaders to ensure systems are safe by default and operationally sustainable
Driving incident management practices, from fast response to meaningful retrospectives that lead to real change
Creating and evolving metrics-driven processes to track team, service, and project health
Owning roadmaps that link infrastructure investments to business value, including cost optimization and tooling strategy
Managing cloud tooling and vendor decisions to maximize ROI and technical leverage
Supporting hybrid architectural strategies, improving the reliability of both our monolith and cloud-native services
Acting as a strategic resource for cloud operations, shared platforms, and system-wide reliability architecture
Championing technical culture: career ladders, hiring standards, and cross-team initiatives that grow a thriving org

You'll be successful in this role if you have…

10+ years of experience in Site Reliability, Database Reliability, or closely related fields (e.g., DevOps, Cloud, Platform, Backend)
5+ years of experience leading technical teams, including hiring, performance management, and org building
A track record of managing mission-critical production systems, balancing velocity with safety
Experience with AWS cloud services, distributed systems, observability stacks, and architectural patterns for scale
Strong instincts around incident response, blameless retrospectives, and risk mitigation
The ability to influence decisions, challenge assumptions, and guide teams through trade-offs
Exceptional communication skills that span technical depth and business clarity
A passion for mentoring engineers, growing leadership skills, and building resilient engineering cultures
Familiarity with managing internal platforms as products, and partnering cross-functionally to maximize adoption and impact

Benefits:

Flexible, hybrid work environment at our convenient Soho location
Unlimited Vacation
100% paid employee health benefit options (including medical, dental, and vision)
Commuter Benefits
401(k) with employer-funded match
Corporate wellness programs with Headspace and Peloton
Sabbatical leave (for employees with 5+ years of service)
Competitive paid parental leave and fertility/family planning reimbursement
Cell phone reimbursement
Catered lunch every day, along with beverages and snacks
Employee Resource Groups and ZocClubs to promote shared community and belonging
Great Place to Work Certified
