Python Data Engineer (Remote)

HJ Staffing, San Jose, CA

Job Description

We are seeking a highly skilled Python Data Engineer with deep experience in CMS datasets (MOR, MMR, MAO) and a strong understanding of healthcare regulations and compliance standards (HIPAA). This role is ideal for a data-driven professional who thrives in cloud-native environments and is passionate about building robust, scalable, and efficient pipelines that drive healthcare innovation.

Key Responsibilities:

  • Design, develop, and maintain scalable ETL pipelines for CMS datasets using GCP Dataflow (Apache Beam) and Python

  • Architect and manage BigQuery data warehouses, ensuring optimal performance and cost-efficiency

  • Implement and manage Airflow DAGs for workflow orchestration and scheduling

  • Ensure end-to-end data quality, lineage, validation, and governance in alignment with HIPAA and CMS standards

  • Optimize large-scale healthcare datasets using partitioning, clustering, sharding, and efficient query patterns in BigQuery

  • Collaborate within Agile teams using tools like Jira and Confluence for sprint planning and documentation

  • Monitor, troubleshoot, and improve pipeline reliability and performance across the full data lifecycle

Qualifications:

  • Bachelor's degree in Computer Science, Information Systems, or a related field

  • 3+ years of experience in cloud-based data engineering, preferably with healthcare datasets

  • Strong proficiency in Python, GCP Dataflow, and Apache Beam

  • Expert-level knowledge of BigQuery, including schema design, performance tuning, and advanced SQL

  • Hands-on experience with Airflow for the orchestration of complex data workflows

  • In-depth understanding of data warehouse design, including star/snowflake schemas, normalization, and denormalization

  • Strong analytical skills for query and data optimization

  • Familiarity with Agile methodologies and collaboration tools (Jira, Confluence)

  • Knowledge of CMS datasets (MOR, MMR, MAO) and healthcare data privacy/compliance standards (HIPAA)