NVIDIA

Manager, Software Engineering - Production AI Inference

Posted 3 Days Ago
In-Office
Santa Clara, CA
224K-431K Annually
Expert/Leader
Lead a hands-on engineering team to ship production-ready LLM inference via NVIDIA Inference Microservices. Own model onboarding, serving integration, performance optimization, release quality, security readiness, automation, observability, and operational health while partnering with product, research, security, and ops.
The summary above was generated by AI

NVIDIA is the platform upon which every new AI-powered application is built. We are seeking a deeply technical software manager to lead production AI inference for NVIDIA Inference Microservices (NIM), the production runtime through which customers deploy optimized, enterprise-supported AI inference across cloud, data center, and edge environments. NIM makes state-of-the-art AI models available as a production-ready software stack, combining optimized inference engines, model profiles/recipes, validated runtime configurations, and security hardening. This role leads the team accountable for turning fast-moving model and inference engine work into reliable NIM releases that customers can operate with confidence.

This is a hands-on engineering management role for someone who can run production execution without managing from a distance. You will lead engineers working across model onboarding, serving stack integration, performance profiling/optimization, release quality, security readiness, automation, observability, and operational health. You will partner closely with the product, solution architect, security, research, and other internal engineering teams to make day-0 model launches repeatable and to raise the production bar for every NIM release.

What you'll be doing:

  • Lead the team responsible for shipping production-ready LLM NIMs, including planning, new model onboarding, validated serving recipes, release readiness, and post-release follow-through.

  • Build a predictable operating model for the team through roadmap planning, a weekly execution rhythm, launch checklists, clear ownership boundaries, collaborator communication, and issue management.

  • Own project execution by anticipating schedule, staffing, and dependency risks. Adapt plans under pressure and collaborate with peer managers to reprioritize engineering timelines and remain agile in the fast-paced AI industry.

  • Drive continuous improvement in production workflows through root cause and corrective action (RCCA) and partner feedback, removing unnecessary and redundant work while keeping the team passionate about production outcomes.

  • Build and maintain a world-class AI inference engineering team by building an innovative culture, setting clear expectations, maintaining active feedback loops, and mentoring engineers and emerging leaders.

What we need to see:

  • 10+ years overall building production software, including 3+ years managing software engineering teams.

  • Experience delivering production software with strong quality, reliability, and release expectations.

  • Experience driving process improvements and improving operational efficiency.

  • Excellent communication and collaborator management; ability to influence executive leadership across product, research, security, and operations.

  • Deep understanding of AI/ML fundamentals, innovative model architectures, inference engines and kernels, performance optimization strategies, accelerated computing, large-scale distributed systems, and security hardening.

  • A degree in Computer Science, Computer Engineering, or a related field (BS or MS) or equivalent experience.

Ways to stand out from the crowd:

  • Built and managed globally distributed organizations; established durable engineering processes that significantly improved quality and velocity across multiple teams.

  • Recognized industry leader with contributions to open-source ecosystems (e.g., vLLM, SGLang, TensorRT-LLM, Dynamo, Triton, PyTorch), technical publications, or talks in containers, Kubernetes, GPU, or inference communities.

  • Drove measurable performance improvements for large-scale LLM inference systems, including latency, throughput, GPU utilization, cost efficiency, and performance regression prevention across production releases.

  • Hands-on experience with core GPU technologies such as CUDA, cuDNN, CUTLASS, cuBLAS, NCCL, NIXL, NVLink, and GPUDirect RDMA.

  • Hands-on experience delivering enterprise or government-ready AI software, including FedRAMP, air-gapped deployments, regulated environments, security hardening, compliance evidence, and production support expectations.

With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking, hardworking, and talented people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology and for developing cloud services, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD for Level 3, and 272,000 USD - 431,250 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until July 6, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
