Top Remote Site Reliability Engineer Jobs in Los Angeles, CA

Phantom (phantom.com)

Staff Software Engineer (SRE)

Reposted 4 Days Ago

Remote

USA

200K-250K Annually

Senior level

Software • Cryptocurrency

Manage and scale Kubernetes clusters, automate infrastructure, optimize performance, maintain blockchain nodes, and improve system reliability while collaborating with product teams.

Top Skills: AWS (EC2, EKS, IAM), Datadog, Docker, Kubernetes, OpenTelemetry, Pulumi, RDS, S3, Terraform

Supabase

Site Reliability Engineer

5 Days Ago

Remote

USA

Senior level

Database

Embed with service teams to define SLIs/SLOs and error budgets, run Operational Readiness Reviews, improve incident-to-improvement pipelines, advise on resilience and architecture, reduce operational toil through automation, and shape org-wide on-call practices and operational maturity.

Top Skills: AWS, CDK, Grafana, Kubernetes, OpenTelemetry, Postgres, Pulumi, Terraform, VictoriaMetrics

GE Vernova

SRE Platform Engineer

5 Days Ago

Remote

USA

Senior level

Energy • Manufacturing • Solar • Renewable Energy

Operate and harden production EKS Kubernetes clusters across multiple AWS regions. Build IaC (Terraform, Ansible), implement policy-as-code, ensure security and compliance, manage observability (Prometheus/Grafana), perform L3 support and incident RCA, run platform-level testing and DR, automate toil, and partner with application teams for sizing and cost optimization to achieve high availability for critical cloud infrastructure.

Top Skills: ALB, Ansible, ArgoCD, AWS EC2, Certificate Management, Datadog, Dynatrace, EKS, Flux, Go, Grafana, Kubernetes, MSK, Pod Priority, Prometheus, Python, RDS, S3, Service Mesh, Splunk, Terraform, VPC

HHAeXchange

SRE Technical Project Manager

Reposted 5 Days Ago

Remote

United States

100K-110K Annually

Mid level

Healthtech • Software

The SRE Technical Project Manager will lead project delivery, incident management, automation processes, and uptime communication, partnering with SRE and development teams to ensure system stability and scalability.

Top Skills: AI Bots, Datadog, Jira, Jira Service Management, MS Teams, Opsgenie, PagerDuty

SitusAMC

Site Reliability Engineer - AWS - Remote

Reposted 6 Days Ago

Remote

USA

110K-140K Annually

Senior level

Real Estate • Financial Services • PropTech

Support and optimize products migrated to AWS, implement cloud best practices, maintain operational coverage, enhance automation, observability, CI/CD/GitOps, and security. Collaborate with development and platform teams to scale, troubleshoot, and ensure reliable SaaS operations.

Top Skills: AMIs, ArgoCD, AWS, AWS Elastic Beanstalk, AWS Transfer Family, Azure DevOps, Bash, CloudWatch, curl, Docker, EC2, EKS, FluxCD, Git, GitOps, HTTP, Istio, Kubernetes, Linkerd, Load Balancer, PowerShell, Python, RDS, SQL, Terraform, Wget

Tradeweb

Senior Site Reliability Engineer (SRE)

7 Days Ago

Remote

United States

190K-240K Annually

Senior level

eCommerce

Ensure reliability and availability of Tradeweb's global AWS platform through IaC automation, observability and SLO definition, incident triage and resolution, on-call duties, collaboration with development teams, and security-focused platform improvements.

Top Skills: ArgoCD, AWS, AWS Lambda, EKS, GitSecOps, Infrastructure as Code (IaC), Kubernetes (K8s), Kustomize, LGTM, Linux/Unix, Pulumi, Python, SMS, SNS

Circle (circle.so)

Senior Site Reliability Engineer

14 Days Ago

Easy Apply

Remote

United States

130K-140K Annually

Senior level

Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software

Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.

Top Skills: AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis

SimSpace

Staff Site Reliability Engineer

Reposted 8 Days Ago

Remote

U.S.

165K-230K Annually

Senior level

Information Technology • Security

The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.

Top Skills: ArgoCD, GitHub Actions, Go, Grafana Tanka, Jsonnet, Kubernetes, Python

Andromeda (andromeda.ai)

Staff SRE, AI Infrastructure

Reposted 8 Days Ago

In-Office or Remote

USA

Senior level

Artificial Intelligence • Cloud • Information Technology • Software

As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.

Top Skills: Ansible, CUDA, Go, Helm, Kubernetes, Linux, NCCL, NVIDIA, Python, Rust, Slurm, Terraform

Arista Networks

FedRAMP Site Reliability Engineer (FedSRE) - CloudVision

Reposted 8 Days Ago

Remote

101K-161K Annually

Senior level

Cloud • Software • Analytics

Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.

Top Skills: Ansible, Bash, GCP, GKE, Go, Kubernetes, Pulumi, Python

Coinbase

Senior Site Reliability Engineer, Workforce Identity

15 Days Ago

Easy Apply

Remote

USA

186K-219K Annually

Senior level

Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3

Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.

Top Skills: ABAC, Auth0, AWS, Azure, C#, CI/CD, Container Orchestration, Duo, Entra ID, GCP, Generative AI, Git, Go, IaC, Java, MFA, Okta, Ping, Python, RBAC, Ruby, SSO, Terraform

Coinbase

Senior Site Reliability Engineer, Core AI Infrastructure

15 Days Ago

Easy Apply

Remote

USA

186K-219K Annually

Senior level

Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3

Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.

Top Skills: Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Puppet, Python, Ruby, Salt, Terraform

Cerebras Systems

Staff Site Reliability Engineer – Automation and Platform

Reposted 10 Days Ago

In-Office or Remote

California, USA

Senior level

Artificial Intelligence

This role will build and operate AI inference clusters, ensure scalable deployments, optimize resource allocation, and maintain infrastructure. Responsibilities include software updates, telemetry development, and collaborative improvements with other teams.

Top Skills: Docker, Grafana, InfluxDB, K8s, Linux, Prometheus, Python

CentralSquare Technologies

Lead Site Reliability Engineer - Remote

Reposted 10 Days Ago

Remote

United States

Senior level

Software

The role involves designing, building, and maintaining AWS infrastructure, implementing IaC, developing CI/CD pipelines, automating operations, and enhancing network and security practices.

Top Skills: AWS, Bash, CI/CD, CloudFormation, Docker, Kubernetes, PowerShell, Python, Terraform

Zocdoc

Senior Site Reliability Engineer

Reposted 16 Days Ago

Easy Apply

Remote or Hybrid

USA

180K-220K Annually

Senior level

Healthtech • Information Technology • Software • Telehealth

The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.

Top Skills: AWS, Docker, GCP, Kubernetes

CoverMyMeds

Sr. Database Site Reliability Engineer (DB SRE)

Reposted 11 Days Ago

In-Office or Remote

USA

132K-221K Annually

Senior level

Healthtech • Information Technology • Software

The Sr. Database Site Reliability Engineer manages the reliability and performance of Azure PostgreSQL platforms, applying SRE principles for automation and observability. Responsibilities include incident response, backup strategies, and ensuring compliance with security standards.

Top Skills: ArgoCD, Azure PostgreSQL, CI/CD, Datadog, Git, Helm, Kubernetes, Terraform

Xpert Development LLC

Senior DevOps & Site Reliability Engineer

12 Days Ago

Remote

United States

165K-190K Annually

Senior level

Artificial Intelligence • Information Technology • Software • Automation

Own US PST coverage for releases and incidents as the first SRE; bridge infrastructure and code by working with Kubernetes, Terraform, and AWS and patching Elixir when needed; lead incident response and post-mortems; define SLOs and observability; author runbooks and support HIPAA-aligned compliance for a regulated medical-device platform.

Top Skills: AWS, Elixir, Kubernetes, Terraform

Cooley

Senior Technology Site Reliability Engineer

Reposted 21 Days Ago

In-Office or Remote

3 Locations

140K-205K Annually

Senior level

Information Technology • Legal Tech

The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.

Top Skills: AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform

Nebius

Site Reliability Engineer

Reposted 12 Days Ago

Remote

United States

100K-140K Annually

Mid level

Artificial Intelligence • Information Technology • Consulting

This role will maintain and troubleshoot Linux systems, support network services, and work on systems integration while collaborating with infrastructure teams.

Top Skills: DHCP, DNS, Linux, NTP, Python

Strike (simplistic.com)

Site Reliability Engineer

Reposted 12 Days Ago

Remote

USA

Senior level

Information Technology • Cryptocurrency

The Site Reliability Engineer will lead technical initiatives, architect solutions, troubleshoot issues, mentor team members, and improve observability practices.

Top Skills: ArgoCD, Bash, ELK Stack, GCP, Go, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform

Kraken Digital Asset Exchange

Site Reliability Engineer - AI Agents

13 Days Ago

Remote

United States

96K-192K Annually

Senior level

Blockchain • Financial Services • Cryptocurrency • Web3

Design, build, and operate scalable, observable infrastructure for AI agent workflows. Build platform services, APIs, and SDKs; manage cloud, Kubernetes, and model-serving compute; implement IaC, CI/CD, monitoring, incident response, security controls, and runbooks; collaborate with AI and data teams to productionize agent prototypes.

Top Skills: AWS, Bash, CI/CD, Docker, Kubernetes, Python, Terraform

PTC

Principal Software Engineer - SRE

Reposted 13 Days Ago

Remote

USA

113K-175K Annually

Senior level

Information Technology • Internet of Things • Software • Virtual Reality

Lead reliability, availability, and resiliency strategies for large-scale systems, drive operational excellence, and provide technical mentorship across engineering teams.

Top Skills: AWS, CI/CD, Java, MongoDB, RabbitMQ, ZooKeeper

Veeam

GOV Site Reliability Engineer

14 Days Ago

Remote

United States

152K-253K Annually

Mid level

Cloud • Security • Software • Cybersecurity

Join the GOV/Sovereign Cloud SRE team to maintain and improve reliability for the Veeam Data Cloud. Responsibilities include incident response, SLIs/SLOs, observability (monitoring, alerting, dashboards), runbooks and documentation, IaC and CI/CD work in compliance-restricted environments, and participation in on-call rotations. Collaborate with engineering, security, and compliance teams to implement high availability and automation.

Top Skills: ArgoCD, Azure, Azure DevOps, Azure Government, C#, ELK Stack, GitHub Actions, GitLab CI, Go, Grafana, Java, JavaScript, Kubernetes, OpenTelemetry, Prometheus, Pulumi, Terraform, Terragrunt, TypeScript

HiBob

Senior Site Reliability Engineer - Remote EST

Reposted 19 Days Ago

Remote or Hybrid

United States

190K-235K Annually

Senior level

HR Tech • Information Technology • Professional Services • Sales • Software

Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.

Top Skills: AI/LLM, ArgoCD, AWS, CI/CD, Datadog, GitHub Actions, GitOps, Go, Kubernetes, Python

Yahoo

Software Engineer, SRE Tooling & Reliability Platforms

Reposted 14 Days Ago

Remote

United States of America

89K-184K Annually

Entry level

AdTech • Digital Media • Information Technology • Other

As a Software Engineer in the Tooling and Reliability Platforms team, you'll develop AI services, manage incident tools, and utilize Infrastructure as Code for high-availability systems. You'll focus on integrating AI workflows and improving operational resilience for Yahoo's brands.

Top Skills: AWS, CloudFormation, Docker, GCP, Go, Java, Kubernetes, Python, Terraform