Lambda

Senior Site Reliability Engineer - Managed Kubernetes

In-Office
San Francisco, CA
267K-401K Annually
Senior level
Manage and maintain Kubernetes clusters, automate lifecycle management, and assist customers with issues while ensuring platform reliability.

We're here to help the smartest minds on the planet build Superintelligence. The labs pushing the edge? They run on Lambda. Our gear trains and serves their models, our infrastructure scales with them, and we move fast to keep up. If you want to work on massive, world-changing AI deployments with people who love action and hard problems, we're the place to be.


If you'd like to build the world's best deep learning cloud, join us. 

What You’ll Do

  • Operate and maintain bare-metal Kubernetes clusters, scaling up to thousands of nodes

  • Handle cluster degradation, recovery, resizing, and incident response using fleet management tools

  • Participate in a well-managed on-call rotation for critical incidents

  • Assist customers with Kubernetes questions, workload integration, storage, and authentication

  • Work closely with our HPC Ops and Datacenter Ops teams for low-level or cross-functional issues

  • Use Python and Golang to create tooling and automate the validation of platform quality

  • Design, build, and maintain scalable control plane services, operators, and custom controllers for Kubernetes

  • Develop automation for cluster lifecycle management: provisioning, upgrades, patching, and deletion

  • Define and implement SLOs and SLIs for Kubernetes services, workloads, and platform reliability

About You

Must-Have

  • 6+ years of experience in an SRE, operations engineer, or similar role, with deep knowledge of running Linux clusters and systems

  • Strong programming skills in Go and Python; experience with GitOps (e.g., ArgoCD), Helm, and Kubernetes operators

  • Proven experience operating Kubernetes clusters in production environments (on-prem, EKS, GKE, or similar)

  • Can work either independently with limited direction or as part of a team

  • Can work with customers during incidents via tickets, live messaging, or as part of a larger call

  • Familiarity with observability tools such as Prometheus, Grafana, and FluentBit, as well as CI/CD pipelines

  • Proven experience provisioning Kubernetes using tools such as kubeadm, Cluster API, or similar

Nice-to-Have

  • Deep Kubernetes expertise: CRDs, CSI, CNI, and experience writing Kubernetes operators

  • Exposure to HPC clusters, AI/ML workloads, or large-scale GPU clusters

  • Hybrid or multi-cloud Kubernetes environment experience

  • Contributions to CNCF projects or Kubernetes SIGs

Why Join Us

  • Work on cutting-edge Managed Kubernetes platforms for AI/ML workloads

  • Influence the platform roadmap and help shape operations and reliability best practices

  • Collaborate with highly skilled engineers

  • Opportunity to mentor and grow within a fast-growing, technology-driven environment

About Lambda

  • Founded in 2012, ~400 employees (2025) and growing fast

  • We offer generous cash & equity compensation

  • Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability

  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

  • Health, dental, and vision coverage for you and your dependents

  • Wellness and Commuter stipends for select roles

  • 401(k) plan with 2% company match (USA employees)

  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Top Skills

ArgoCD
FluentBit
GitOps
Go
Grafana
Helm
Kubernetes
Prometheus
Python
