
NVIDIA

Senior System Engineer – DGX Cloud Lepton

Posted 20 Hours Ago
In-Office or Remote
2 Locations
184K-357K
Senior level
As a Senior System Engineer, you'll ensure platform reliability and security for NVIDIA's DGX Cloud Lepton, focusing on infrastructure efficiency for AI workloads.
The summary above was generated by AI

Joining NVIDIA's DGX Cloud Lepton Team means contributing to the infrastructure that powers our innovative AI research. This team focuses on optimizing the efficiency and resiliency of AI workloads, as well as developing scalable AI and data infrastructure tools and services. Our objective is to deliver a stable, scalable environment for AI researchers, providing them with the resources and scale they need to innovate. DGX Cloud Lepton delivers NVIDIA-managed GPU/Kubernetes capacity for AI workloads.

As a Senior System Engineer, you’ll own the Lepton platform’s reliability and ensure that security is a first-class part of day-to-day operations. You’ll have the autonomy to drive meaningful projects with strong mentorship and support. We practice blameless postmortems, iterate continuously, and encourage thoughtful risk-taking. If you’re looking for an impactful, rewarding role, we invite you to apply.

What you’ll be doing:

  • Platform fundamentals: design, build, and operate core services and node/cluster foundations for the Lepton platform; automate deployments, upgrades, and day-2 operations.

  • Vulnerability & patch management: own intake, prioritization, rollout, and rollback rhythms across OS, drivers/firmware, and platform components for the Lepton product.

  • Security as a product quality: define, deliver, and maintain secure-by-default baselines (host hardening, workload isolation, network segmentation, least-privilege access) for AI infrastructure at scale.

  • Identity & access stewardship: standardize patterns for service identity, role scoping, secrets handling, and certificate hygiene.

  • Trusted releases: drive change control and release practices that ensure traceability and integrity of what runs in production.

  • Monitoring & incident practice: establish health signals and SLOs; lead investigations, root-cause analyses, and follow-up actions that improve both reliability and security.

  • Risk & readiness: partner with product, SRE, and security stakeholders to assess risks for new features and close gaps with pragmatic controls.

  • Documentation & mentorship: publish runbooks and standards; review designs and coach engineers on secure operational practices.

What we need to see:

  • 7+ years in systems/platform engineering operating large-scale, production environments.

  • Demonstrated ability to deliver secure, reliable platforms (hardening, access control, isolation, monitoring, and strong operational runbooks).

  • Experience with containerized/managed cluster environments; familiarity with GPU-accelerated platforms or the ability to ramp quickly.

  • Automation mindset with infrastructure-as-code and CI/CD; disciplined change management.

  • Clear communication and documentation skills; ability to turn requirements into practical, supportable designs.

  • Bachelor's degree or higher in Computer Science or a related technical field (or equivalent experience).

Ways to stand out from the crowd:

  • Hands-on engineering experience delivering and driving platform security baselines in multi-tenant environments.

  • Production Kubernetes experience (EKS/AKS/GKE) at a fundamental level, especially with private clusters and Pod Security Admission (PSA) restricted defaults.

  • Supply-chain basics at scale: signed images (cosign) enforced via policy-as-code (Kyverno/OPA).

  • Familiarity with NVIDIA GPU platforms (GPU Operator/device plugin, MIG-aware operations).

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until August 19, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

AI
CI/CD
Gpu
Infrastructure As Code
Kubernetes
