Staff Software Engineer - Production Engineering

Databricks, San Francisco, CA

Job Description

RDQ426R178

At Databricks, we are passionate about enabling data teams to solve the world's toughest problems - from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers - and customer obsessed - we leap at every opportunity to tackle technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started.

As a production engineer with a backend focus, you will ensure the stable and efficient operation of your service's production environments by proactively monitoring systems, automating routine tasks, optimizing performance, responding to incidents, and managing deployment pipelines. This includes, among other things, writing software in Scala/Java and working closely with other engineering teams to maintain high availability and ensure the integrity and security of live systems.

The impact you will have:

  • Improved System Reliability and Availability: By proactively monitoring and resolving issues across distributed systems, you will significantly reduce downtime and improve SLAs, directly contributing to a more resilient production environment.
  • Enhanced Operational Efficiency: Through automation of routine operational tasks and deployment processes, you will streamline engineering workflows, reducing manual toil and accelerating release cycles across global infrastructure.
  • Performance Optimization at Scale: By identifying and addressing performance bottlenecks in backend services and infrastructure, you will improve resource utilization and system throughput, enabling cost-effective scaling across thousands of Kubernetes clusters and millions of VMs.
  • Strengthened System Security and Integrity: By embedding security best practices into the deployment and operational workflows, you will help ensure compliance and protect production environments against vulnerabilities and threats.

What we look for:

  • BS/MS/PhD in Computer Science or a related field.
  • 10+ years of production-level experience in one of Java, Scala, C++, or a similar language.
  • Comfortable working towards a multi-year vision with incremental deliverables.
  • Experience in architecting, deploying, and operating large-scale distributed systems with high availability, scalability, and durability.
  • Experience in performance and cost optimization, disaster recovery mechanisms, incident management and troubleshooting.
  • Good knowledge of SQL and operational experience with distributed and single-node database engines.
  • Experience with software security and systems that handle sensitive data.
  • Experience with cloud technologies, e.g. AWS, Azure, GCP, Docker, Kubernetes.