Site Reliability Engineering (SRE) Platform Engineer (Lead)

Job Description

Site Reliability Engineering (SRE) Platform Engineer (Lead)
Job Number: 26-00672
Use your skills where innovative technology solutions begin. ECLARO is looking for a Site Reliability Engineering (SRE) Platform Engineer (Lead) for our client in Rochester, NY. ECLARO’s client is a leading technology solutions provider, collaborating with customers to manage their needs and achieve success in their business goals. If you’re up to the challenge, then take a chance at this rewarding opportunity!
Position Overview:
  • As a Lead SRE Platform Engineer, you will drive reliability engineering strategy and execution across critical IT Business Solutions platforms. This role focuses on improving uptime, performance, and operational efficiency through software enhancements, observability, automation, and data-driven Root Cause Analysis (RCA).
  • You will serve as the technical lead for SRE practices: establishing monitoring standards, improving the MELT (Metrics, Events, Logs, Traces) strategy, influencing tooling decisions, and partnering across infrastructure, development, operations, and vendor teams. This is a high-impact opportunity to build and mature reliability engineering capabilities from the ground up.
Responsibilities:
  • Reliability & Observability Leadership:
    • Define and mature SRE best practices across cloud and on-prem environments.
    • Design and implement comprehensive monitoring strategies using tools such as Dynatrace, Datadog, and Microsoft SCOM.
    • Develop dashboards, alerts, synthetic testing, and proactive monitoring capabilities.
    • Establish and evolve a MELT data strategy to improve service reliability.
    • Provide data-driven RCA investigations and implement preventative solutions.
  • Platform & Application Reliability:
    • Support and enhance reliability across:
      • Cloud & Infrastructure:
        • Microsoft Azure (Software, Storage, Azure Local)
        • Hyper-V and Legacy VMware Environments
        • NetApp and Pure Storage Platforms
        • Azure Log Analytics
        • Infrastructure as Code using Terraform
        • Migration from Azure DevOps to GitHub (strong GitHub experience required)
      • Order Management Systems:
        • Azure-based, internally developed .NET / C# applications.
        • Internal message queuing systems.
        • Logging, analytics, and synthetic testing post-patching.
        • API-based integrations.
      • Workforce & Payroll Platforms:
        • Workday (Payroll)
        • ADP Vantage (Timekeeping)
      • Warehouse & Distribution Systems:
        • Blue Yonder Warehouse Management System (WMS)
        • Collect handheld voice picking devices.
        • Network analytics for identifying dead zones and connectivity issues.
        • Barcode scanners and device connectivity troubleshooting.
  • DevSecOps & Automation:
    • Lead CI / CD reliability improvements (Azure DevOps → GitHub transition critical).
    • Enhance pipeline automation with embedded security controls.
    • Advance Infrastructure-as-Code standards (Terraform).
    • Improve configuration management and change governance.
    • Drive automation to reduce manual intervention and operational risk.
  • ITSM & Incident Management:
    • Work within BMC ecosystem including:
      • BMC Helix
      • BMC Remedy
      • BMC Server Automation
    • Optimize automated incident generation (SCOM → BMC workflows).
    • Improve triage, escalation, and impact modeling across services.
    • Monitor vendor performance and escalate appropriately.
    • Participate in off-hour escalation support when required.
  • Strategic Impact:
    • Develop predictive reliability models using statistical techniques.
    • Identify systemic risk across production systems.
    • Guide tooling decisions (e.g., Dynatrace vs. Datadog or other observability platforms).
    • Ensure regulatory and operational compliance standards are met.
    • Facilitate cross-functional collaboration and document SRE procedures and planning artifacts.
Required Skills:
  • 5-7+ years of Software Engineering and Infrastructure / Database Engineering experience.
  • Deep expertise in:
    • DevSecOps practices
    • Observability Platforms
    • API Integrations
    • Performance Management Tools
    • ITIL Principles
    • ITSM Data Analytics
    • MELT Data Collection and Analysis
  • Experience in Azure cloud environments.
  • Strong analytical and problem-solving skills.
  • Demonstrated ability to influence technical direction.
  • Excellent communication and cross-team collaboration skills.
  • Continuous improvement mindset focused on reliability engineering.
Preferred Qualifications:
  • Strong programming experience in:
    • .NET / C#
    • Python
    • SQL
  • Experience with MSSQL (primary) and Oracle (limited).
  • Experience with GitHub (critical for upcoming transition).
  • Agile / Scrum experience.
  • Knowledge of Reliability-Centered Engineering and maintenance strategies.
  • Experience with synthetic testing and proactive validation post-deployment.
  • Bachelor's Degree in a related technical field.
If hired, you will enjoy the following ECLARO Benefits:
  • 401k Retirement Savings Plan administered by Merrill Lynch
  • Commuter Check Pretax Commuter Benefits
  • Eligibility to purchase Medical, Dental & Vision Insurance through ECLARO
If interested, you may contact:
Jeanine Hastings
jeanine.hastings@eclaro.com
646-755-9303
Jeanine Hastings | LinkedIn
Equal Opportunity Employer: ECLARO values diversity and does not discriminate based on Race, Color, Religion, Sex, Sexual Orientation, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status, in compliance with all applicable laws.

FAQs About Site Reliability Engineering (SRE) Platform Engineer (Lead) Jobs at ECLARO

What is the work location for this position at ECLARO?
This job at ECLARO is located in Rochester, NY, according to the details provided by the employer. Some roles may also include multiple work locations depending on the requirement.
What pay range can candidates expect for this role at ECLARO?
The employer has not shared pay details for this role.
What employment type applies to this position at ECLARO?
The employer has not provided this information. This may be discussed during the hiring process.
What is the process to apply for this position at ECLARO?
You can apply for this role at ECLARO either through Sonara's automated application system, which helps you submit applications 10X faster with minimal effort, or by applying manually using the direct link on the job page.