At Zettabyte, we’re on a mission to make AI compute ubiquitous, seamless, and limitless. We’re building a cloud where AI just works—anywhere, anytime. “AI Power. Everywhere.” Be part of the team designing the infrastructure for the AI-first world.
Why this role exists
We need a Backend Engineer to build the systems that orchestrate GPU clusters for AI workloads. You’ll create APIs that handle GPU allocation, memory management, compute scheduling, and multi-tenant isolation: challenges unique to AI infrastructure that go far beyond typical backend engineering. As part of our backend team, you’ll tackle questions like: How do we efficiently share expensive GPU resources across users? How do we handle GPU memory constraints for large AI models? How do we ensure quality of service when workloads compete for compute? This is an opportunity to build infrastructure where a single API call can allocate thousands of dollars’ worth of compute per hour, and where your optimizations directly affect whether AI startups can afford to train their models.
What you’ll do
Design APIs that abstract complex GPU operations into simple developer experiences
Build scheduling algorithms that maximize GPU utilization while ensuring SLA compliance
Develop resource management systems for GPU lifecycle—provisioning, allocation, scheduling, and release
Create usage tracking and billing systems for GPU-hours, memory usage, and compute utilization
Implement monitoring for GPU-specific metrics, health checks, and automatic failure recovery
Build multi-tenancy systems with resource isolation, quota management, and fair scheduling
Optimize cold starts for model serving and implement efficient model loading strategies
Collaborate with frontend engineers to expose complex infrastructure through intuitive interfaces
Leverage AI-assisted coding tools (GitHub Copilot, Claude Code, Cursor IDE, etc.) to boost productivity and code quality
What you’ll need
5+ years of backend engineering experience with distributed systems
Strong proficiency in Go, Python, or similar backend languages
Experience with resource scheduling, orchestration, and API design (REST, GraphQL, gRPC)
Understanding of hardware constraints and system optimization
Linux systems knowledge and containerization experience (Docker, Kubernetes)
Comfortable working with expensive resources where efficiency directly impacts costs
Excited about solving novel problems in AI infrastructure (not just another CRUD app)
Startup mindset—comfortable with ambiguity and rapid iteration
Nice to have
GPU or HPC cluster management experience
Understanding of ML/AI workload patterns and requirements
Experience with high-value resource allocation systems
Background in performance optimization for compute-intensive workloads
Familiarity with GPU virtualization and sharing technologies
Experience building billing or metering systems
We provide
Competitive salary and equity based on your experience and skill set
This is a hybrid role (3 days in office, 2 days WFH); you must be based in Palo Alto
Applicants must be authorized to work in the United States without the need for visa sponsorship.