
(USA) Principal, Software Engineer

Overview
Schedule: Full-time, Part-time
Career level: Senior-level
Work arrangement: Remote, On-site
Compensation: $143,000-$286,000/year
Benefits: Health Insurance, Dental Insurance, Vision Insurance
Job Description
Position Summary
Walmart processes more transactions in a day than most companies handle in a year. When performance degrades or systems fail, the impact is immediate, measured in millions of dollars and hundreds of millions of customers. We're building the team that uses agentic AI to prevent that.
As a Principal Engineer in Performance and Resiliency Engineering, you'll architect and lead the development of intelligent, self-healing systems: LLM-based agents that detect anomalies, reason across observability data, and trigger automated remediation without waiting for a human in the loop. You'll operate at a scale most AI engineers never encounter: 10,500 stores, 240M weekly customers, and infrastructure that powers one of the world's largest retail ecosystems.
This isn't a research role or a proof-of-concept environment. You'll own the technical strategy, set architectural direction, and ship to production, building agentic systems that directly affect Walmart's global reliability and business continuity.
About the Team
Building the right technology foundation for Infrastructure & Platforms is vital to success at Walmart's scale. Our team builds and maintains the foundational technologies that power the entire tech organization: data platforms, enterprise architecture, DevOps, cloud computing, and infrastructure. We ship to production weekly, run blameless postmortems, and treat chaos experiments as first-class engineering work. If you thrive in high-ownership environments where your architectural decisions have immediate, measurable impact, this is where you belong.
What You'll Do
What You'll Own
You'll set the technical direction, not just execute it. From initial architecture through production deployment, you'll own the roadmap for Walmart's agentic AI platform for performance and resiliency. You'll have the autonomy to make architectural tradeoffs, drive experimentation, and shape how intelligent systems operate at enterprise scale.
Key Responsibilities
Build & Lead Agentic AI Systems
- Architect production multi-agent pipelines, from RAG-based knowledge grounding to LLM-driven decision-making and autonomous remediation, operating across 10,500 stores and 240M weekly customers
- Own LLM evaluation standards for production: factuality, consistency, safety guardrails, and failure modes; set the bar that other teams adopt
- Optimize LLM inference at scale through prompt caching, quantization, and retrieval filtering, delivering measurable latency and cost improvements rather than theoretical gains
- Integrate vector databases and observability stacks to build context-aware systems that act on live signals without human intervention
- Build the AI/ML layer that moves Walmart from reactive incident response to predictive, self-correcting infrastructure — cutting mean time to recovery across critical systems
- Design and run chaos experiments that expose real failure modes and change architecture decisions — not checkbox exercises
- Define SLOs that reflect real business impact, integrate performance gates into CI/CD, and make observability (Grafana, Prometheus, ELK, Splunk) actionable across the org
- Write and maintain runbooks that teams actually use: tested, updated after every incident, and clear enough to act on under pressure
- Set the architectural direction for the org's agentic AI platform — from initial design through production deployment — and own the decisions that follow
- Close the gap between experimentation and production: move ML models from notebooks into reliable, monitored systems that hold up under Black Friday-scale traffic
- Raise the technical floor through design reviews and mentoring that produces engineers who make better decisions independently
- Shape the multi-year roadmap for AI-powered performance and resiliency, influencing infrastructure investment decisions across the org
- 10+ years of experience building and operating distributed systems at scale
- Proven, hands-on production experience with LLMs, agentic frameworks, or RAG-based systems
- Deep background in performance engineering, chaos engineering, or SRE — with real ownership of SLOs and incident response
- Strong programming skills in Python and/or Java; comfort working across the full ML stack
- Familiarity with ML frameworks: PyTorch, TensorFlow, Hugging Face Transformers
- Hands-on with cloud-native infrastructure: GCP, Azure, Kubernetes, Docker
- MLOps experience: CI/CD for ML, drift detection, model monitoring
- Experimentation background: A/B testing, causal inference, multi-armed bandits
Minimum Qualifications
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or a related area, and 5 years' experience in software engineering or a related area.
Option 2: 7 years' experience in software engineering or a related area.
Preferred Qualifications
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
Master's degree in computer science, computer engineering, computer information systems, software engineering, or a related area, and 3 years' experience in software engineering or a related area.
We value candidates with a background in creating inclusive digital experiences, including demonstrated knowledge of implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, working with assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate has knowledge of accessibility best practices and will join us as we continue to create accessible products and services that follow Walmart's accessibility standards and guidelines, supporting an inclusive culture.
Primary Location
1345 Crossman Ave, Sunnyvale, CA 94089-1114, United States of America
Walmart and its subsidiaries are committed to maintaining a drug-free workplace and have a no-tolerance policy regarding the use of illegal drugs and alcohol on the job. This policy applies to all employees and aims to create a safe and productive work environment.

FAQs About (USA) Principal, Software Engineer Jobs at Walmart
What is the work location for this position at Walmart?
This job at Walmart is located in Sunnyvale, California, according to the details provided by the employer. Some roles may also include multiple work locations depending on the requirement.
What pay range can candidates expect for this role at Walmart?
Candidates can expect a pay range of $143,000 to $286,000 per year.
What employment type applies to this position at Walmart?
Walmart lists this position under the following employment categories:
- Full-time
- Part-time
What experience level is required for this role at Walmart?
Walmart is looking for a candidate at the "Senior-level" experience level.
What benefits are offered by Walmart for this role?
Walmart offers the following benefits for this position: Health Insurance, Dental Insurance, Vision Insurance, Disability Insurance, Life Insurance, Paid Vacation, Paid Sick Leave, Parental and Family Leave, 401k Matching/Retirement Savings, Tuition/Education Assistance, and Health & Wellness Programs. Actual benefits may vary depending on the employer's policies and employment terms.
What is the process to apply for this position at Walmart?
You can apply for this role at Walmart either through Sonara's automated application system, which helps you submit applications 10X faster with minimal effort, or by applying manually using the direct link on the job page.