
Cloud Data Engineer
$115,000 - $150,000 / year

Job Description
The Space Telescope Science Institute (STScI), a key NASA science and flight operations center, is hiring a Cloud Data Engineer to join our Catalog Science Branch. We are looking for a talented and experienced professional to help manage the large-scale data infrastructure and processing for our advanced astronomical public data archive, the Mikulski Archive for Space Telescopes (MAST), which serves missions such as HST, JWST, Roman, Kepler, and TESS. The ideal candidate will be responsible for designing, building, and optimizing our scientific data solutions and data pipeline architectures.
This position supports hybrid work and is currently work-from-home, with occasional onsite work as required. Candidates must reside in, or be willing to relocate to, our local market (MD, DE, VA, PA, DC & WV).
The posted salary range represents a general guideline; however, STScI considers a number of factors when determining salary offers, such as internal pay equity, the scope and responsibilities of the position, the candidate's experience, education, and skill, and current market conditions.
- The annual salary range for a Senior Cloud Data Engineer role (8+ years of industry experience) is $130,000 - $150,000.
- The annual salary range for a Cloud Data Engineer II role (5+ years of industry experience) is $115,000 - $125,000.
This position requires US Citizenship or Permanent Residence in order to meet ITAR requirements.
Cloud Data Engineer Key Responsibilities:
- Design, build, deploy, test, and maintain highly scalable cloud-based data management systems capable of handling petabyte-scale astronomical datasets in hybrid legacy-modern environments.
- Architect and maintain robust, fault-tolerant data pipelines using Apache Airflow (or equivalent orchestration tools) to support rapidly growing mission data volume and complexity.
- Deploy and manage containerized workloads at scale using Kubernetes, AWS EKS, and AWS ECS.
- Lead scalable database management, with deep expertise in query performance tuning and optimization of PostgreSQL/Greenplum, and other large-scale analytical databases.
- Collaborate closely with archive scientists, data engineers, application engineers, storage, and networking teams on system architecture, implementation strategy, and automation scripting.
- Build, automate, and maintain CI/CD pipelines and Infrastructure-as-Code (Terraform, CloudFormation, etc.) to enable rapid, repeatable, and reliable deployment of data infrastructure.
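As an editorial illustration only (not part of the posting), the fault-tolerant pipeline work described above rests on two properties that orchestrators such as Airflow apply to every task: idempotency (a re-run does no duplicate work) and bounded retries with backoff. The sketch below shows both; all names (`ingest_granule`, `DONE_MARKERS`) are hypothetical.

```python
import time

# Hypothetical sketch of a fault-tolerant pipeline task. DONE_MARKERS stands
# in for a persistent "already processed" record (a database row or S3 marker
# object in a real system).
DONE_MARKERS = set()

def ingest_granule(granule_id: str) -> str:
    """Pretend ingest step; skips work it has already completed."""
    if granule_id in DONE_MARKERS:
        return "skipped"          # idempotent: a re-run does no duplicate work
    DONE_MARKERS.add(granule_id)  # in real systems, record completion atomically
    return "ingested"

def run_with_retries(task, arg, max_tries=3, base_delay=0.01):
    """Retry a failing task with exponential backoff, as an orchestrator would."""
    for attempt in range(1, max_tries + 1):
        try:
            return task(arg)
        except Exception:
            if attempt == max_tries:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

print(run_with_retries(ingest_granule, "hst_0001"))  # ingested
print(run_with_retries(ingest_granule, "hst_0001"))  # skipped
```

In Airflow terms, the retry loop corresponds to a task's `retries`/`retry_delay` settings, while idempotency must be designed into the task itself.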
Experience, Skills and Qualifications:
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
Senior level requires a minimum of 8 years, and Level II a minimum of 5 years, of industry experience in an AWS Cloud environment, with a focus on data engineering and data management for cloud-based solutions. The substitution of additional relevant education and/or experience for stated qualifications may be considered.
- Strong proficiency in PostgreSQL, including advanced schema design, query optimization, indexing strategies, partitioning, and large-scale performance tuning (vacuuming, analyzing, query plan analysis)
- Expertise in Apache Airflow for designing, implementing, and maintaining complex production DAGs, task scheduling, monitoring, and troubleshooting at scale
- Production-grade experience with Kubernetes for container orchestration; hands-on deployment and management of workloads on AWS EKS and AWS ECS
- Deep experience designing and implementing scalable ETL/ELT pipelines, cloud-native data lakes, and modern lakehouse architectures (e.g., Iceberg).
- Strong hands-on experience with core AWS data services: S3 (lifecycle policies, versioning, encryption), Lambda, Step Functions, IAM (least-privilege policies) and CloudWatch
- Strong programming skills in Python (mandatory) and familiarity with Bash/shell scripting
- Excellent problem-solving skills and proven ability to communicate complex technical concepts to both engineering and scientific stakeholders.
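For concreteness (again, an editorial illustration rather than part of the posting), the S3 lifecycle policies mentioned above are plain JSON documents. The sketch below builds one that transitions objects to Glacier and expires old noncurrent versions; the prefix, day counts, and bucket name are hypothetical, and the dict follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts.

```python
# Hypothetical S3 lifecycle configuration: archive objects under "raw/" to
# Glacier after 90 days and expire noncurrent versions after 30 days.
# The prefix and day counts are illustrative, not STScI policy.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-raw-data",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}

# With boto3 this would be applied as:
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-archive-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle_config,
#   )
rule = lifecycle_config["Rules"][0]
print(rule["Transitions"][0]["StorageClass"])  # GLACIER
```

Lifecycle rules like this are how petabyte-scale archives keep hot, recent data on fast storage while older data ages into cheaper tiers automatically.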
Preferred/Nice to Have Qualifications:
- Hands-on experience with Greenplum for massive parallel processing (MPP) and large-scale analytical workloads
- Working knowledge of Apache Iceberg for table format management in data lakes
- Experience with Trino (formerly PrestoSQL) for distributed SQL execution across heterogeneous data sources
- Familiarity with big data ecosystems (e.g., Parquet, Spark, ORC)
- Background in machine learning pipelines, advanced analytics or database architecture
We offer an excellent and generous benefits package, tuition reimbursement, flexible work schedules and a stimulating and diverse work environment. Explore our benefits: http://www.stsci.edu/opportunities/benefits
TO APPLY: Share your experience by uploading a resume and completing an online application. Applications received by December 31, 2025, will receive full consideration. Applications received after this date will be considered until the position is filled.
Individuals needing assistance with the employment process can contact us at careers@stsci.edu.
#LI-Hybrid
