Top Reliability Engineer Jobs in Los Angeles, CA

The Walt Disney Company

Senior Network Systems Reliability Engineer

23 Days Ago

In-Office

Los Angeles, CA

135K-190K Annually

Senior level

Digital Media • Gaming • News + Entertainment • Sports

Lead lifecycle support and reliability for global network infrastructure, manage asset/license inventory and renewals, drive vendor management, reduce technical debt through standardization and upgrades, support security/compliance and change management, deliver observability/data insights and dashboards, and coordinate cross-functional stakeholders to ensure operational readiness and SLA-driven ticket resolution.

Top Skills: DHCP, DNS, Excel, Firewalls, Jira Service Management, Network Circuits, Python, Routers, Routing, ServiceNow, SQL, Switches, Switching, VPN, Wi-Fi, Wireless Access Points

OpsMill

Product Reliability Engineer | US

Reposted 13 Days Ago

Remote

Los Angeles, CA

Mid level

Information Technology • Software • Database • Automation

Owner of on-prem reliability and escalations: reproduce and resolve L2/L3 issues across heterogeneous Kubernetes environments, build diagnostics and automation, improve CI and e2e test stability, establish performance baselines, harden install/upgrade flows, and write tooling in Python/Go/Rust to reduce repeat incidents.

Top Skills: Benchmarking, CI, CI/CD, Containers, E2E Testing, Go, Health Checks, Helm, Installers, Integration Testing, Kubernetes, Load Generation, Logs, Metrics, Networking, Observability, Packaging, Profiling, Python, RBAC, Rust, Storage, Support Bundles, Traces

K2 Space Corporation

Senior Site Reliability Engineer

Yesterday

In-Office

Los Angeles, CA

140K-180K Annually

Senior level

Defense • Manufacturing

Design, deploy, and maintain cloud and on-prem infrastructure with IaC; scale and operate mission-critical services; automate tooling and self-service platforms; collaborate with engineering teams to ensure high availability, reliability, security, and resiliency for vehicle and non-vehicle software systems.

Top Skills: AWS, Azure, Container Orchestration, GCP, Infrastructure-as-Code (IaC), Kubernetes, Linux, Networking

SpaceX

RF Hardware Reliability Engineer (Starshield)

Reposted 23 Days Ago

In-Office

Los Angeles, CA

110K-135K Annually

Junior

Aerospace • Other

Responsible for ensuring hardware reliability by conducting root cause analysis, troubleshooting RF systems, and improving satellite product quality.

Top Skills: C, C++, Python, RF Systems, SQL

SpaceX

Software Engineer, Site Reliability Engineering (Application Software)

Yesterday

In-Office

Los Angeles, CA

125K-195K Annually

Mid level

Aerospace • Other

Design, deploy, operate, and scale mission-critical application platforms. Build infrastructure as code, improve observability, participate in on-call rotation and incident response, collaborate with software teams, optimize performance, and support vehicle software engineers.

Top Skills: Ansible, Bazel, Buck, C#, C++, ClickHouse, Docker, JavaScript, Kubernetes, KVM, Linux, Make, MySQL, Postgres, Puppet, Python, QEMU, Terraform, vSphere

SpaceX

Site Reliability Engineer (Application Software)

Yesterday

In-Office

Los Angeles, CA

125K-195K Annually

Mid level

Aerospace • Other

Build, operate, and scale mission-critical application infrastructure and tooling for vehicle and satellite software delivery. Manage infrastructure as code, improve observability, collaborate with engineers, participate in on-call rotation, perform incident response and postmortems, and provide end-user support to reduce build and test times.

Top Skills: Ansible, Bazel, Buck, C#, C++, ClickHouse, Docker, JavaScript, Kubernetes, KVM, Linux, Make, MySQL, Postgres, Puppet, Python, QEMU, Terraform, vSphere

SpaceX

Sr. Site Reliability Engineer (Application Software)

Reposted Yesterday

In-Office

Los Angeles, CA

165K-230K Annually

Senior level

Aerospace • Other

Build, operate, and scale mission-critical application platforms to accelerate vehicle software delivery. Manage infrastructure as code, improve observability, collaborate with developers, run on-call rotations, conduct blameless postmortems, and reduce performance bottlenecks to support Falcon, Starship, Dragon, and Starlink software lifecycles.

Top Skills: Ansible, Bazel, Buck, C#, C++, ClickHouse, Docker, JavaScript, Kubernetes, KVM, Linux, Make, MySQL, Postgres, Puppet, Python, QEMU, Terraform, vSphere

K2 Space Corporation

Senior Hardware Reliability Engineer

25 Days Ago

In-Office

Los Angeles, CA

140K-175K Annually

Senior level

Defense • Manufacturing

Lead development and execution of reliability strategies, models, and testing (HALT/HASS/ALT/TVAC) to meet vehicle and constellation-level availability targets. Integrate reliability analyses with design, manufacturing, and risk management; drive root-cause investigations and corrective actions; mentor junior engineers and scale reliability practices for high-volume satellite production.

Top Skills: ALT, C++, Fault Tree Analysis (FTA), HALT, HASS, Monte Carlo Simulation, Probabilistic Risk Assessment (PRA), Python, Reliability Block Diagrams, TVAC, Weibull Analysis

Northrop Grumman

Reliability Systems Engineer - Level 2

2 Days Ago

In-Office

Los Angeles, CA

92K-138K Annually

Junior

Aerospace • Logistics • Security • Software • Cybersecurity

Perform reliability engineering tasks including reliability predictions, failure rate estimation, de-rating analysis, and FMEA facilitation. Create reliability block diagrams/models, support design reviews and failure investigations, allocate reliability requirements, identify critical items and mitigations, and work with engineering teams to develop technical solutions for satellite and systems reliability and availability.

Top Skills: FMEA, Excel, Microsoft PowerPoint, Microsoft Visio, Microsoft Word, Modeling/Simulation Software, Probability and Statistical Tools, Reliability Block Diagram Modeling

Northrop Grumman

Reliability Systems Engineer - Level 4

2 Days Ago

In-Office

Los Angeles, CA

142K-213K Annually

Senior level

Aerospace • Logistics • Security • Software • Cybersecurity

Lead and perform reliability analyses for space systems including reliability predictions, de-rating assessments, failure rate estimation, and FMEA facilitation. Create reliability block diagrams/models, support design reviews and failure investigations, work with engineering on mitigation strategies, and apply statistical/probabilistic methods and industry tools to improve system availability and reliability.

Top Skills: Electrical Schematics Interpretation, FMEA, Excel, Microsoft PowerPoint, Microsoft Visio, Microsoft Word, Probability and Statistical Analysis Tools, Reliability Block Diagram Modeling, Reliability Modeling/Simulation Tools

Picogrid

Site Reliability Engineer

3 Days Ago

In-Office

Los Angeles, CA

170K-195K Annually

Mid level

Information Technology

Owner of production reliability across cloud and edge: define and drive SLIs/SLOs, build observability (Grafana/Prometheus/Loki/OpenTelemetry), participate in on-call and incident response, encode reliability in infrastructure-as-code (Terraform/OpenTofu), manage Kubernetes clusters, AWS hardening, HA databases, and oversee IoT/edge device fleet operations.

Top Skills: AWS, Git, Grafana, IAM, IoT, Kubernetes, Loki, Nebula, NVIDIA Jetson, OpenTelemetry, OpenTofu, Prometheus, Pyrra, Sloth, StatefulSets, Tailscale, Terraform, WireGuard

MongoDB

Site Reliability Engineer (Senior or Staff), Atlas

Reposted 21 Days Ago

Easy Apply

Remote or Hybrid

Los Angeles, CA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.

Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

Northrop Grumman

Reliability Systems Engineer - Level 3

Reposted 3 Days Ago

In-Office

Los Angeles, CA

114K-171K Annually

Senior level

Aerospace • Logistics • Security • Software • Cybersecurity

Lead availability, reliability, and timeliness engineering for space systems: perform reliability predictions, failure rate estimation, FMEA facilitation, availability modeling, RBD creation, support design reviews and failure investigations, and track technical performance metrics.

Top Skills: Excel, Microsoft Word, PowerPoint, Visio

Northrop Grumman

Reliability Systems Engineer - Level 3

Reposted 3 Days Ago

In-Office

Los Angeles, CA

114K-171K Annually

Senior level

Aerospace • Logistics • Security • Software • Cybersecurity

Lead reliability support for design teams including reliability predictions/assessments, de-rating analysis, failure rate estimation, and FMEA facilitation. Create reliability block diagrams, interpret electrical schematics, develop mitigations, support design reviews and failure investigations, and apply statistical/probabilistic methods using reliability modeling tools.

Top Skills: Excel, Microsoft Word, PowerPoint, Visio

Varda Space Industries

Senior Site Reliability Engineer

Reposted 3 Days Ago

In-Office

Los Angeles, CA

153K-185K Annually

Senior level

Aerospace • Hardware • Software • Biotech • Pharmaceutical • Manufacturing

Lead the design, build, and operation of mission-critical infrastructure across cloud, on-prem, and spacecraft contexts. Implement IaC, CI/CD, observability, and scalable Kubernetes-based systems; respond to incidents, perform root cause analysis, optimize performance, and collaborate with software and hardware teams. Participate in on-call rotations and travel occasionally.

Top Skills: Ansible, Argo CD, Azure, Bash, CI/CD, containerd, Databases, Docker, Firewalls, GitOps, GPU Workloads, Grafana, HPC, InfluxDB, Kubernetes, Linux, PowerShell, Prometheus, Python, Salt, Slurm, Subnets, Terraform, VPC, VPNs

Credit Acceptance Corporation

Senior Database Reliability Engineer

17 Days Ago

Remote

Los Angeles, CA

104K-153K Annually

Senior level

Financial Services

Lead design, automation, and operation of reliable, scalable database platforms. Implement IaC, CI/CD integration, observability, replication/migration strategies, security controls, performance tuning, HA/DR architectures, and developer self-service. Drive incident response, capacity planning, and automation to reduce toil.

Top Skills: Ansible, Aurora PostgreSQL, AWS DMS, AWS RDS, CyberArk, Datadog, DynamoDB, Flyway, GitHub Actions, Grafana, Hibernate, JDBC, Jenkins, JPA, Liquibase, MongoDB, MySQL, OpenTelemetry, Oracle, Prometheus, Python, REST APIs, Shell Scripting, SQL Server, SQLAlchemy, Terraform

HiveWatch

Senior Site Reliability Engineer

Reposted 5 Days Ago

In-Office

Los Angeles, CA

183K-235K Annually

Senior level

Artificial Intelligence • Machine Learning • Security • Software

The Senior Staff Site Reliability Engineer will be responsible for ensuring system reliability, debugging issues, mentoring the engineering team, and maintaining infrastructure and CI/CD pipelines.

Top Skills: AWS, Datadog, Docker, GitHub Actions, Grafana, Helm, Kotlin, Kubernetes, Postgres, Prometheus, Python, Rust, Terraform, Terragrunt, TypeScript

SpaceX

Sr. Site Reliability Engineer (Starlink)

Reposted 5 Days Ago

In-Office

Los Angeles, CA

160K-220K Annually

Senior level

Aerospace • Other

The Sr. Site Reliability Engineer at SpaceX is responsible for enhancing distributed systems, managing large data clusters, and ensuring software reliability on the Starlink project, focusing on customer experience and operational efficiency.

Top Skills: Apache Kafka, C#, Flink, Go, HBase, HDFS, Istio, Java, Kubernetes, Linux, Python, Scala, Spark

Nokia

Senior Reliability Engineer

Reposted 19 Days Ago

Remote or Hybrid

Los Angeles, CA

Senior level

Software

Lead reliability activities for photonic integrated circuits (PICs): evaluate failure modes, coordinate accelerated stress tests, develop life models from aging data, and drive failure mode analyses across design, development, and production teams.

AXS

Site Reliability Engineer II

Reposted 6 Days Ago

In-Office

Los Angeles, CA

130K-145K Annually

Mid level

Events

The Site Reliability Engineer II designs and maintains scalable systems, focusing on automation, monitoring, incident response, and collaboration with developers to enhance operational practices and efficiency.

Top Skills: Bash, Cloud Service Operations, Containers, Continuous Delivery, Continuous Integration, Go, Infrastructure as Code, Orchestration Platforms, Python

MongoDB

Senior Site Reliability Engineer, Fleet Management

Reposted 25 Days Ago

Easy Apply

Remote or Hybrid

Los Angeles, CA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.

Top Skills: AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform

Hadrian

Site Reliability Engineer, Robotics

Reposted 7 Days Ago

In-Office or Remote

Los Angeles, CA

164K-270K Annually

Mid level

Aerospace • Hardware • Software • Defense • Manufacturing

As a Site Reliability Engineer, you'll ensure robotics system reliability, build telemetry integration, and develop tools for diagnostics and automation, collaborating with engineering teams for enhanced production reliability.

Top Skills: C++, Datadog, Go, Kubernetes, OpenTelemetry, Prometheus, Python, ROS 2, Telegraf, TypeScript

ServiceTitan

Senior Site Reliability Engineer

8 Days Ago

Hybrid

Los Angeles, CA

138K-221K Annually

Senior level

Artificial Intelligence • Cloud • Fintech • Machine Learning • Mobile • Software

Lead design, development, deployment, and scaling of cloud infrastructure and SRE tooling. Build automation, CI/CD, observability, capacity planning, and reliability improvements; collaborate with product teams to define non-functional requirements and resolve production issues.

Top Skills: .NET, API Gateway, AWS, Azure, C#, Data Lakehouse, Databricks Delta, Datadog, Elasticsearch, ELK, Event Hubs, Functions/Serverless, Git, Grafana, Java, Jenkins, Kafka, Kibana, Kubernetes, Logstash, PowerShell, Snowflake, SQS, TeamCity, Visual Basic

Green Dot Corporation

Lead Site Reliability Engineer

9 Days Ago

In-Office

Los Angeles, CA

140K-199K Annually

Senior level

Fintech • Financial Services

Lead Site Reliability Engineer responsible for ensuring system reliability, scalability, and performance. Develop automated deployment strategies, maintain monitoring/observability, define SLIs/SLOs, collaborate with cross-functional teams, drive reliability best practices, participate in on-call incident response, and improve delivery through automation and training.

Top Skills: AWS, Azure, Bash, GCP, PowerShell, Python

SpaceX

Site Reliability Engineer (Raptor)

9 Days Ago

In-Office

Los Angeles, CA

125K-175K Annually

Junior

Aerospace • Other

Manage and design server, HPC, storage, and networking infrastructure (including InfiniBand) to support propulsion engineering workflows. Integrate and optimize engineering applications (ANSYS, StarCCM+), automate deployments, troubleshoot performance bottlenecks, and coordinate with IT, facilities, and engineering teams to scale compute resources for rocket engine development.

Top Skills: Ansible, ANSYS, Bash, Docker, Enterprise Networking, HPC, InfiniBand, Kubernetes, Linux, Puppet, Python, StarCCM+, Virtualization, Windows Server