Design and operate observability platforms for metrics, logs, and alerts. Collaborate on infrastructure projects and enhance operational transparency.
Voltage Park is seeking an Infrastructure Engineer with a focus on Observability to join our Infrastructure Engineering team. Our engineers design and operate the systems that manage thousands of bare-metal servers, GPUs, and high-performance networks across multiple data centers.
This role combines the breadth of a core infrastructure engineer with a specialty in observability and telemetry. You'll design and operate metrics, logs, traces, and alerting pipelines that provide actionable insights for both internal teams and external customers, helping to ensure reliability and transparency at scale.
This is a fully remote position, although candidates must be based in the continental United States. Unfortunately, we are unable to provide sponsorship for this role.
RESPONSIBILITIES
- Design, build, and maintain observability platforms spanning metrics, logs, traces, and events.
- Create dashboards and alerting for internal stakeholders (InfraOps, Engineering, Customer Success) and scoped visibility for external customers.
- Ingest and correlate telemetry from GPUs, CPUs, networking (Ethernet & InfiniBand), containers, APIs, and BMC/Redfish.
- Implement noise-resistant alerting pipelines that improve detection and reduce operational load.
- Collaborate with infrastructure, platform, and customer-facing teams to embed observability into workflows.
- Contribute to broader infrastructure engineering projects beyond observability.
QUALIFICATIONS
- 8+ years in infrastructure engineering, SRE, or observability roles.
- Strong experience with monitoring systems (Prometheus, Grafana, ELK, VictoriaMetrics, or similar).
- Proficiency in Python, Go, or Bash for automation and data integration.
- Familiarity with container/Kubernetes observability.
- Understanding of streaming telemetry pipelines (Kafka, OTEL, Promtail, or equivalent).
- Strong written and verbal communication skills.
IDEAL EXPERIENCES
- Experience with GPU observability, particularly NVIDIA DCGM.
- Designing multi-tenant observability solutions with RBAC and scoped queries.
- Prior work with correlation engines for RCA, forecasting, or predictive alerting.
- Broader exposure to infrastructure domains (networking, storage, provisioning).
CULTURE
- You enjoy working with a small, highly motivated team.
- You're comfortable balancing autonomy with company-wide priorities.
- You value clarity, documentation, and actionable insights in observability systems.
- You're excited to specialize in observability while contributing as a core infrastructure engineer.
Voltage Park is an equal opportunity employer and makes employment decisions on the basis of merit. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic under federal, state, or local law. If you require an accommodation during the job application process, please notify your recruiter.

