Sr. Manager, Reliability Engineering

Job Description
Your Impact on our Mission:
At Zocdoc, we're on a mission to give power to the patient, and that means making sure our systems are always available when patients and providers need them. As the Senior Manager of Reliability Engineering, you'll lead our SRE, DBRE, and Cloud Engineering teams to deliver highly available, scalable, and observable systems. Your leadership will shape the foundational infrastructure that powers every appointment booked, every provider searched, and every connection made.
This is a pivotal role in the Infrastructure organization. You'll help reduce incidents, optimize performance, and drive operational excellence across our distributed environments, from legacy monoliths to modern microservices. You'll collaborate across engineering to enable faster delivery, safer deployments, and a more resilient platform, helping Zocdoc meet the needs of millions of users.
You'll also help define Zocdoc's internal reliability platform strategy, building scalable internal services that product teams can adopt easily and confidently. Your team will support our ongoing migration to cloud-native architecture while maintaining thoughtful investment in our monolith, balancing modernization with system stability. And you'll play a central role in cloud tooling and vendor strategy, helping us make high-ROI decisions on infrastructure investments.
You'll enjoy this role if you are…
- A technical leader who thrives on building and coaching high-performing, mission-driven engineering teams
- Passionate about reliability, observability, and operational rigor in complex, distributed environments
- Driven by impact and pragmatism, with a bias toward data-informed decisions and continuous improvement
- Comfortable navigating failures and ambiguity, and experienced in leading high-stakes incident response
- Motivated to work across infrastructure domains (SRE, DBRE, Cloud) and evolve platform maturity
- Energized by building reliable platforms as internal products and enabling others to move faster with confidence
- A strong advocate for inclusive, empowering team cultures that foster growth and accountability
- Comfortable working autonomously and collaboratively in a distributed and flexible work environment
- Excited to grow as a leader, working closely with senior infrastructure leadership to expand your influence
Your day to day is…
- Leading and mentoring a multi-disciplinary team of SREs, DBREs, and Cloud Engineers through technical execution and career development
- Defining and driving initiatives that improve reliability, resilience, observability, and performance across our platform
- Partnering with Engineering and Product leaders to ensure systems are safe by default and operationally sustainable
- Driving incident management practices, from fast response to meaningful retrospectives that lead to real change
- Creating and evolving metrics-driven processes to track team, service, and project health
- Owning roadmaps that link infrastructure investments to business value, including cost optimization and tooling strategy
- Managing cloud tooling and vendor decisions to maximize ROI and technical leverage
- Supporting hybrid architectural strategies, improving the reliability of both our monolith and cloud-native services
- Acting as a strategic resource for cloud operations, shared platforms, and system-wide reliability architecture
- Championing technical culture: career ladders, hiring standards, and cross-team initiatives that grow a thriving org
You'll be successful in this role if you have…
- 10+ years of experience in Site Reliability, Database Reliability, or closely related fields (e.g., DevOps, Cloud, Platform, Backend)
- 5+ years of experience leading technical teams, including hiring, performance management, and org building
- A track record of managing mission-critical production systems, balancing velocity with safety
- Experience with AWS cloud services, distributed systems, observability stacks, and architectural patterns for scale
- Strong instincts around incident response, blameless retrospectives, and risk mitigation
- The ability to influence decisions, challenge assumptions, and guide teams through trade-offs
- Exceptional communication skills that span technical depth and business clarity
- A passion for mentoring engineers, growing leadership skills, and building resilient engineering cultures
- Familiarity with managing internal platforms as products, and partnering cross-functionally to maximize adoption and impact
Benefits:
- Flexible, hybrid work environment at our convenient Soho location
- Unlimited Vacation
- 100% paid employee health benefit options (including medical, dental, and vision)
- Commuter Benefits
- 401(k) with employer-funded match
- Corporate wellness programs with Headspace and Peloton
- Sabbatical leave (for employees with 5+ years of service)
- Competitive paid parental leave and fertility/family planning reimbursement
- Cell phone reimbursement
- Catered lunch every day, along with beverages and snacks
- Employee Resource Groups and ZocClubs to promote shared community and belonging
- Great Place to Work Certified
