Voltage Park is your enterprise AI factory. We offer scalable compute power and on-demand and reserved bare metal AI infrastructure built on NVIDIA GPUs, with world-class service, performance, and value. Founded with the mission of making AI computing accessible to all, Voltage Park offers flexible, affordable GPU solutions that power everyone from builders to enterprises.
We are seeking a highly skilled and proactive Infrastructure Operations Engineer to be part of our 24/7 Infrastructure Operations team responsible for the stability, scalability, and performance of compute, storage, and platform infrastructure. This role plays a key part in delivering always-on, high-performance environments that support AI/ML training, inference, and HPC workloads at scale. The ideal candidate combines technical depth with strong interpersonal skills and a passion for operational excellence.
This position offers full remote flexibility, although candidates must be based in the continental US and available to work during PST hours. Unfortunately, we are unable to provide sponsorship for this role.
Responsibilities
At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns that minimize incidents and enable customer-facing and internal features.
Deploy updates and improvements to support both Voltage Park’s internal and end customer use cases.
Collaborate with colleagues in Infrastructure Engineering, Network Operations, Customer Success, and the Software and Platform Development teams.
Participate in the on-call rotation, which is distributed evenly across all team members in a primary/secondary pattern: you serve as primary, then rotate to secondary.
Qualifications
8+ years of experience working with Linux as a server/hosting platform; extra points for Ubuntu experience.
5+ years of experience with AWS.
2+ years of experience with Kubernetes and strong container fundamentals.
2+ years of experience with Terraform and Ansible.
2+ years of experience managing network-attached storage (via NFS, Ceph, or other protocols); extra points for experience with VAST storage systems.
Experience working in a Slack-first, asynchronous remote work environment.
Experience with monitoring systems (Prometheus, ELK stack).
Familiarity with the GitOps workflow.
Software development experience using Python, Go, Bash, or other languages for automation and for connecting systems and APIs.
Deep networking fundamentals; extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand.
Experience building and delivering complex systems.
Effective at navigating tradeoffs between design, risk, cost, and outcomes.
Comfortable with navigating ambiguity.
Strong written and oral communication.
Ideal Experiences
Experience with bare metal hardware troubleshooting and provisioning, extra points for working with Dell hardware.
Experience with GPU servers, both in bare metal form or under virtualization.
Deep experience with network switches, routers, and firewalls, particularly SONiC switches, Palo Alto Networks firewalls, and Juniper Networks equipment.
Experience with VAST storage systems.
Culture
You enjoy working with a small group of friendly, highly motivated, execution-focused colleagues.
You’re comfortable with a high degree of autonomy. We expect you to independently prioritize your work and understand how it maps to the overall needs and goals of the company.
You’re knowledgeable in your domain but also enjoy wearing multiple hats and venturing outside of your comfort zone when the need arises.
You value the ability to write well and understand the importance of good documentation.
Voltage Park is an equal opportunity employer and makes employment decisions on the basis of merit. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic under federal, state, or local law. If you require an accommodation during the job application process, please notify your recruiter.
Compensation Range: $140K - $200K