Top Remote Site Reliability Engineer Jobs in Los Angeles, CA

Domino Data Lab

Staff Site Reliability Engineer

Reposted 2 Days Ago

Easy Apply

Remote or Hybrid

200K-230K Annually

Senior level

Artificial Intelligence • Machine Learning

Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.

Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks

MongoDB

Site Reliability Engineer (Senior or Staff), Atlas

Reposted 3 Days Ago

Easy Apply

Remote or Hybrid

United States

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.

Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

Openly

DevOps/SRE II (Remote, US)

Reposted 4 Days Ago

Remote

United States

115K-173K Annually

Junior

Insurance

Build, automate, and maintain cloud infrastructure and CI/CD for Openly's insurance platform. Implement IaC, monitoring, and security best practices; lead incident response and postmortems; reduce operational toil through tooling and automation; influence architecture and deployment decisions.

Top Skills: Airflow, Aiven Debezium, ArcGIS, BigQuery, CircleCI, Cloud Functions, Cloud Run, Cloud SQL, Composer, Datadog, Donut, Fivetran, GCP, GCS, Git, Go, Jupyter Notebooks, Kafka, Kubernetes, Nuxt, Postgres, Pub/Sub, Python, R, Slack, SQL, Tailwind, Terraform, Vue.js, Webpack, Zoom

MongoDB

Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

Reposted 12 Days Ago

Easy Apply

Remote or Hybrid

United States

126K-248K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.

Top Skills: AWS, Azure, DNS, Go, Google Cloud Platform, Kubernetes, Linux, Python, TCP/IP, TLS

Cohere Health

Site Reliability Engineer II

14 Days Ago

Easy Apply

Remote

United States

100K-110K Annually

Mid level

Healthtech • Software

Operate and maintain AWS-hosted MERN applications and large-scale data workflows. Manage serverless and Spark-based pipelines, perform incident response and on-call duties, engineer automation to eliminate operational toil, ensure HIPAA/SOC2/HITRUST compliance, build observability and lead blameless post-mortems.

Top Skills: Amazon ECS, Amazon EKS, Amazon EMR, Athena, AWS Glue, AWS Lambda, AWS SNS, AWS SQS, CloudWatch, EC2, IAM, JavaScript, MERN, MySQL, Node.js, OpenTofu, PySpark, Python, RabbitMQ, Terraform, TypeScript, VPC

Hadrian

Site Reliability Engineer, Client Platform

Reposted 20 Hours Ago

In-Office or Remote

2 Locations

164K-270K Annually

Mid level

Aerospace • Hardware • Software • Defense • Manufacturing

Build scalable automated solutions for device fleet management, own and optimize MDM platforms, write OS-level scripts for self-healing, gather telemetry to prevent end-user disruption, translate compliance (CMMC) into code-managed baselines, and create dashboards and alerts measuring end-user SLOs.

Top Skills: Ansible, Bash, Chef, Fleet DM, Intune, JAMF, osquery, PowerShell, Pulumi, Puppet, Python, Salt, Terraform, Workspace One

Runpod

Site Reliability Engineer

21 Days Ago

Remote

USA

150K-200K Annually

Senior level

Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)

Ensure stability and resilience of Runpod's distributed AI platform by defining SLIs/SLOs, leading incident response, building observability and reliability tooling, automating operational workflows, and partnering with engineering teams to reduce toil and improve production readiness.

Top Skills: Bash, CI/CD, Containerized Production Systems, Go, GPU Observability Tooling, Grafana, Infrastructure as Code, Linux, Prometheus, Python

Zscaler

Site Reliability Engineer-SkillBridge Intern

Reposted 21 Days Ago

Easy Apply

Remote or Hybrid

USA

Internship

Cloud • Information Technology • Security • Software • Cybersecurity

This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.

Top Skills: Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform

GitLab

Site Reliability Engineer, Cloud Cost Utilization

Reposted 22 Days Ago

Easy Apply

Remote

Mid level

Cloud • Security • Software • Cybersecurity • Automation

As a Cloud Cost Utilization SRE at GitLab, you'll manage cloud spending, improve tracking and optimization of cloud usage, and collaborate with finance and engineering teams to enhance cost efficiency across AWS and GCP.

Top Skills: Ansible, AWS, ELK, GCP, Grafana, Loki, Mimir, Prometheus, Tempo, Terraform

Cooley

Senior Technology Site Reliability Engineer

Reposted 4 Days Ago

In-Office or Remote

3 Locations

140K-205K Annually

Senior level

Information Technology • Legal Tech

The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.

Top Skills: AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Infrastructure Security

Reposted 24 Days Ago

Easy Apply

Remote or Hybrid

United States

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.

Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform

Onebrief

Senior Site Reliability Engineer (Arlington, VA) - Secret Clearance Required - Relocation Provided

3 Days Ago

Remote

United States

180K-220K Annually

Senior level

Software • Defense

Work as an SRE embedded with product teams to improve reliability by fixing application code (primarily TypeScript), building observability (Prometheus, Loki, Grafana, Alloy), defining SLIs/SLOs, leading incident response and postmortems, automating toil, and supporting deployments across on‑prem DoD and AWS environments.

Top Skills: Alloy, AWS, Bash, Containers, Docker, GitHub Actions, GitLab CI/CD, Go, Grafana, Jenkins, kubectl, Kubernetes, Loki, Node.js, Prometheus, Python, TypeScript

Andromeda (andromeda.ai)

Forward Deployed Engineer - SRE

33 Minutes Ago

In-Office or Remote

USA

Senior level

Artificial Intelligence • Cloud • Information Technology • Software

Embed with customer teams running large-scale GPU training and inference to onboard, tune, debug, and improve reliability. Diagnose fabric, driver, scheduler, and application failures; profile performance; build automation, monitoring, and preflight checks; lead incident response; and convert field learnings into product improvements and reusable reference configurations.

Top Skills: Ansible, Bash, cgroups, Container Runtimes, CUDA, DCGM, Device Plugins, Fabric Manager, Go, GPFS, Helm, InfiniBand, Kubernetes, KV Cache, Lustre, Namespaces, NCCL, NVIDIA Drivers, nvidia-smi, NVLink, Python, RoCE, Slurm, SSH, Terraform, Topology-Aware Scheduling, VAST, Weka

Outpost

Site Reliability Engineer

4 Hours Ago

Remote

USA

Mid level

3PL: Third Party Logistics

Own and improve uptime for backend services, APIs, workers, and ML pipelines; build monitoring, alerting, and auto-remediation; optimize GCP infrastructure and Postgres performance; run the on-call rotation and blameless postmortems.

Top Skills: Apollo Server, Bash, CI/CD, Cloud Run, Cloud SQL, Datadog, Docker, Express, GCP, GCS, Grafana, Next.js, Node.js, Postgres, Prometheus, Python, React, Terraform, TypeScript, Zabbix

Branch

Senior Site Reliability Engineer (SRE)

4 Hours Ago

Remote

175K-185K Annually

Senior level

Software

Lead improvements in reliability, performance, scalability, capacity, and observability through automation and tooling. Partner with developers to design infrastructure and monitoring, define SLIs/SLOs, conduct load and performance testing, participate in incident response and root cause analysis, and manage monitoring services for production systems.

Top Skills: Docker, Go, Gradle, Java, Kubernetes, OpenTelemetry, Spring Boot, Terraform

Veeam

Site Reliability Engineer - FedRAMP

Reposted 20 Hours Ago

Remote

United States

152K-253K Annually

Mid level

Cloud • Security • Software • Cybersecurity

Join the GOV/Sovereign Cloud SRE team to maintain and improve reliability for the Veeam Data Cloud. Responsibilities include incident response, SLIs/SLOs, observability (monitoring, alerting, dashboards), runbooks and documentation, IaC and CI/CD work in compliance-restricted environments, and participation in on-call rotations. Collaborate with engineering, security, and compliance teams to implement high availability and automation.

Top Skills: Argo CD, Azure, Azure DevOps, Azure Government, C#, ELK Stack, GitHub Actions, GitLab CI, Go, Grafana, Java, JavaScript, Kubernetes, OpenTelemetry, Prometheus, Pulumi, Terraform, Terragrunt, TypeScript

PTC

Principal Software Engineer-SRE

Reposted 20 Hours Ago

Remote

USA

131K-185K Annually

Senior level

Information Technology • Internet of Things • Software • Virtual Reality

Lead reliability, availability, and resiliency strategies for large-scale systems, drive operational excellence, and provide technical mentorship across engineering teams.

Top Skills: AWS, CI/CD, Java, MongoDB, RabbitMQ, ZooKeeper

Vynca Inc

Site Reliability Engineer

Yesterday

Remote

United States

140K-150K Annually

Mid level

Healthtech

Build, operate, and scale AWS cloud infrastructure and Kubernetes workloads using Terraform and Helm. Improve observability, define SLIs/SLOs, automate deployments and incident response, support on-call rotation, and implement security and compliance (HIPAA, SOC 2) best practices while partnering with product and engineering teams.

Top Skills: AWS, CI/CD, Event Sourcing, Helm, Kubernetes, Linux, Monitoring/Logging/Tracing, Networking, Terraform

MongoDB

Senior Site Reliability Engineer, Fleet Management

Reposted 6 Days Ago

Easy Apply

Remote or Hybrid

United States

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.

Top Skills: AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform

Sezzle

VP Engineering - Infrastructure & SRE

2 Days Ago

Remote

United States

400K-600K Annually

Expert/Leader

Payments

Lead and scale Sezzle's infrastructure, platform, and SRE organization. Own AWS, Kubernetes, Aurora RDS, reliability (SLOs/error budgets), disaster recovery, on-call/incident command, infrastructure-as-code, and AI-augmented SRE tooling. Drive compliance (PCI-DSS/SOC2), cost management, vendor strategy, and cross-functional communication with executives and auditors.

Top Skills: Aurora RDS, AWS, AWS PrivateLink, Blue-Green Deployments, Canary Deployments, CI/CD Pipelines, FinOps, Git, GitLab CI/CD, Go, Grafana, IAM, Kubernetes (EKS), LLM/AIOps, Loki, MySQL, Observability, Postgres, Prometheus, Python, React, React Native, Secrets Management, Service Mesh, Tempo, Terraform, Transit Gateway, TypeScript, VPC

Flock

Site Reliability Engineer III

2 Days Ago

Remote

USA

140K-165K Annually

Senior level

Hardware • Machine Learning • Security • Software

Design and maintain developer experience tooling and CI/CD pipelines, manage self-service deployment tools (secrets, rollbacks), partner with cloud teams to troubleshoot production systems, participate in on-call rotation, and write production-grade automation and tests in Go and TypeScript to improve developer velocity and reliability.

Top Skills: AWS, GitHub Actions, Go, Helm, Kubernetes, Terraform, TypeScript

Xpert Development LLC

Senior DevOps & Site Reliability Engineer

Reposted 2 Days Ago

Remote

United States

165K-190K Annually

Senior level

Artificial Intelligence • Information Technology • Software • Automation

Own US PST coverage for releases and incidents as the first SRE; bridge infrastructure and code by working with Kubernetes, Terraform, and AWS and patching Elixir when needed; lead incident response and post-mortems; define SLOs and observability; author runbooks and support HIPAA-aligned compliance for a regulated medical-device platform.

Top Skills: AWS, Elixir, Kubernetes, Terraform

Aalyria

Site Reliability Engineer

Reposted 2 Days Ago

Remote

United States

115K-135K Annually

Mid level

Aerospace • Manufacturing

As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.

Top Skills: Argo CD, AWS, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, OpenTelemetry, Prometheus, Python, Tempo, Terraform

DFIN

Sr. Site Reliability Engineer

8 Days Ago

Remote or Hybrid

United States

Senior level

Fintech • Software

Lead SRE efforts for DFIN SaaS: ensure availability, performance, scalability, and automation. Implement monitoring, CI/CD, IaC, container orchestration, AI-enhanced observability, incident response, RCA, and runbook automation while collaborating across engineering teams.

Top Skills: .NET, AIOps, AKS, Ansible, AppDynamics, AWS, Azure, Azure DevOps, Bash, C#, CI/CD, Cloud AI Services, Containers, Cosmos, Datadog, Dynatrace, EKS, Firewall, Harness, Idera SQL Diagnostic Manager, Infrastructure as Code (IaC), Java, Jenkins, Kubernetes, Linux, Load Balancing, New Relic, PowerShell, Python, Redgate SQL Monitor, SolarWinds Database Performance Analyzer, SQL, Terraform, Windows

Offchain Labs

Site Reliability Engineer

Reposted 3 Days Ago

Remote

United States

Mid level

Blockchain • Software

Build, operate, and scale production Kubernetes infrastructure using GitOps and declarative IaC. Design CI/CD workflows, observability, and secure-by-default systems. Troubleshoot networking/storage, participate in on-call rotations, automate operational workflows, and drive postmortems and reliability improvements.

Top Skills: Arbitrum, Argo CD, Argo CD ApplicationSets, AWS, Azure, Bash, CloudWatch, CodeBuild, GCP, GitHub Actions, GitOps, Go, Grafana, K9s, Kubernetes, Linux, Loki, Mimir, Prometheus, Prysm, Python, Terraform, YAML, ZeroDev