Top Remote Site Reliability Engineer Jobs in Los Angeles, CA

Reposted 4 Days AgoSaved
Remote
USA
200K-250K Annually
Senior level
200K-250K Annually
Senior level
Software • Cryptocurrency
Manage and scale Kubernetes clusters, automate infrastructure, optimize performance, maintain blockchain nodes, and improve system reliability while collaborating with product teams.
Top Skills: Aws (Ec2Aws EksDatadogDockerIam)KubernetesOpentelemetryPulumiRdsS3Terraform
5 Days AgoSaved
Remote
USA
Senior level
Senior level
Database
Embed with service teams to define SLIs/SLOs and error budgets, run Operational Readiness Reviews, improve incident-to-improvement pipelines, advise on resilience and architecture, reduce operational toil through automation, and shape org-wide on-call practices and operational maturity.
Top Skills: AWSCdkGrafanaKubernetesOpentelemetryPostgresPulumiTerraformVictoriametrics
5 Days AgoSaved
Remote
USA
Senior level
Senior level
Energy • Manufacturing • Solar • Renewable Energy
Operate and harden production EKS Kubernetes clusters across multiple AWS regions. Build IaC (Terraform, Ansible), implement policy-as-code, ensure security and compliance, manage observability (Prometheus/Grafana), perform L3 support and incident RCA, run platform-level testing and DR, automate toil, and partner with application teams for sizing and cost optimization to achieve high availability for critical cloud infrastructure.
Top Skills: AlbAnsibleArgocdAws Ec2Certificate ManagementDatadogDynatraceEksFluxGoGrafanaKubernetesMskPod PriorityPrometheusPythonRdsS3Service MeshSplunkTerraformVpc
Reposted 5 Days AgoSaved
Remote
United States
100K-110K Annually
Mid level
100K-110K Annually
Mid level
Healthtech • Software
The SRE Technical Project Manager will lead project delivery, incident management, automation processes, and uptime communication, partnering with SRE and development teams to ensure system stability and scalability.
Top Skills: Ai BotsDatadogJIRAJira Service ManagementMs TeamsOpsgeniePagerduty
Reposted 6 Days AgoSaved
Remote
USA
110K-140K Annually
Senior level
110K-140K Annually
Senior level
Real Estate • Financial Services • PropTech
Support and optimize products migrated to AWS, implement cloud best practices, maintain operational coverage, enhance automation, observability, CI/CD/GitOps, and security. Collaborate with development and platform teams to scale, troubleshoot, and ensure reliable SaaS operations.
Top Skills: AmisArgocdAWSAws Elastic BeanstalkAws Transfer FamilyAzure DevopsBashCloudwatchCurlDockerEc2EksFluxcdGitGitopsHTTPIstioKubernetesLinkerdLoad BalancerPowershellPythonRdsSQLTerraformWget
7 Days AgoSaved
Remote
United States
190K-240K Annually
Senior level
190K-240K Annually
Senior level
eCommerce
Ensure reliability and availability of Tradeweb's global AWS platform through IaC automation, observability and SLO definition, incident triage and resolution, on-call duties, collaboration with development teams, and security-focused platform improvements.
Top Skills: ArgocdAWSAws LambdaEksGitsecopsInfrastructure As Code (Iac)Kubernetes (K8S)KustomizeLgtmLinux/UnixPulumiPythonSmsSns
14 Days AgoSaved
Easy Apply
Remote
United States
Easy Apply
130K-140K Annually
Senior level
130K-140K Annually
Senior level
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills: AWSClickhouseKubernetesLlm-Based Tools (Copilots)MySQLPostgresRedis
Reposted 8 Days AgoSaved
Remote
U.S.
165K-230K Annually
Senior level
165K-230K Annually
Senior level
Information Technology • Security
The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.
Top Skills: ArgocdGithub ActionsGoGrafana TankaJsonnetKubernetesPython
Reposted 8 Days AgoSaved
In-Office or Remote
USA
Senior level
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.
Top Skills: AnsibleCudaGoHelmKubernetesLinuxNcclNvidiaPythonRustSlurmTerraform
Reposted 8 Days AgoSaved
Remote
US
101K-161K Annually
Senior level
101K-161K Annually
Senior level
Cloud • Software • Analytics
Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.
Top Skills: AnsibleBashGCPGkeGoKubernetesPulumiPython
15 Days AgoSaved
Easy Apply
Remote
USA
Easy Apply
186K-219K Annually
Senior level
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills: AbacAuth0AWSAzureC#Ci/CdContainer OrchestrationDuoEntraidGCPGenerative AiGitGoIacJavaMfaOktaPingPythonRbacRubySsoTerraform
15 Days AgoSaved
Easy Apply
Remote
USA
Easy Apply
186K-219K Annually
Senior level
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills: AnsibleAWSBashChefCi/CdDockerEc2GitGoKubernetesLinuxPuppetPythonRubySaltTerraform
New

Cut your apply time in half.

Use ourAI Assistantto automatically fill your job applications.

Use For Free
Application Tracker Preview
Reposted 10 Days AgoSaved
In-Office or Remote
California, USA
Senior level
Senior level
Artificial Intelligence
The Deployment Engineer will build and operate AI inference clusters, ensure scalable deployments, optimize allocation, and maintain infrastructure. Responsibilities include software updates, telemetry development, and collaborative improvements with teams.
Top Skills: DockerGrafanaInfluxdbK8SLinuxPrometheusPython
Reposted 10 Days AgoSaved
Remote
United States
Senior level
Senior level
Software
The role involves designing, building, and maintaining AWS infrastructure, implementing IaC, developing CI/CD pipelines, automating operations, and enhancing network and security practices.
Top Skills: AWSBashCi/CdCloudFormationDockerKubernetesPowershellPythonTerraform
Reposted 16 Days AgoSaved
Easy Apply
Remote or Hybrid
USA
Easy Apply
180K-220K Annually
Senior level
180K-220K Annually
Senior level
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills: AWSDockerGCPKubernetes
Reposted 11 Days AgoSaved
In-Office or Remote
USA
132K-221K Annually
Senior level
132K-221K Annually
Senior level
Healthtech • Information Technology • Software
The Sr. Database Site Reliability Engineer manages the reliability and performance of Azure PostgreSQL platforms, applying SRE principles for automation and observability. Responsibilities include incident response, backup strategies, and ensuring compliance with security standards.
Top Skills: ArgocdAzure PostgresqlCi/CdDatadogGitHelmKubernetesTerraform
12 Days AgoSaved
Remote
United States
165K-190K Annually
Senior level
165K-190K Annually
Senior level
Artificial Intelligence • Information Technology • Software • Automation
Own US PST coverage for releases and incidents as the first SRE; bridge infrastructure and code by working with Kubernetes, Terraform, and AWS and patching Elixir when needed; lead incident response and post-mortems; define SLOs and observability; author runbooks and support HIPAA-aligned compliance for a regulated medical-device platform.
Top Skills: AWSElixirKubernetesTerraform
Reposted 21 Days AgoSaved
In-Office or Remote
3 Locations
140K-205K Annually
Senior level
140K-205K Annually
Senior level
Information Technology • Legal Tech
The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.
Top Skills: AWSChefDatadogGoGrafanaJavaPrometheusPuppetPythonSaltTerraform
Reposted 12 Days AgoSaved
Remote
United States
100K-140K Annually
Mid level
100K-140K Annually
Mid level
Artificial Intelligence • Information Technology • Consulting
The Linux Systems Administrator will maintain and troubleshoot Linux systems, support network services, and work on systems integration while collaborating with infrastructure teams.
Top Skills: DhcpDnsLinuxNtpPython
Reposted 12 Days AgoSaved
Remote
USA
Senior level
Senior level
Information Technology • Cryptocurrency
The Site Reliability Engineer will lead technical initiatives, architect solutions, troubleshoot issues, mentor team members, and improve observability practices.
Top Skills: ArgocdBashElk StackGCPGoGrafanaHelmKubernetesPrometheusPythonTerraform
13 Days AgoSaved
Remote
United States
96K-192K Annually
Senior level
96K-192K Annually
Senior level
Blockchain • Financial Services • Cryptocurrency • Web3
Design, build, and operate scalable, observable infrastructure for AI agent workflows. Build platform services, APIs, and SDKs; manage cloud, Kubernetes, and model-serving compute; implement IaC, CI/CD, monitoring, incident response, security controls, and runbooks; collaborate with AI and data teams to productionize agent prototypes.
Top Skills: AWSBashCi/CdDockerKubernetesPythonTerraform
Reposted 13 Days AgoSaved
Remote
USA
113K-175K Annually
Senior level
113K-175K Annually
Senior level
Information Technology • Internet of Things • Software • Virtual Reality
Lead reliability, availability, and resiliency strategies for large-scale systems, drive operational excellence, and provide technical mentorship across engineering teams.
Top Skills: AWSCi/CdJavaMongoDBRabbitMQZookeeper
14 Days AgoSaved
Remote
United States
152K-253K Annually
Mid level
152K-253K Annually
Mid level
Cloud • Security • Software • Cybersecurity
Join the GOV/Sovereign Cloud SRE team to maintain and improve reliability for the Veeam Data Cloud. Responsibilities include incident response, SLIs/SLOs, observability (monitoring, alerting, dashboards), runbooks and documentation, IaC and CI/CD work in compliance-restricted environments, and participation in on-call rotations. Collaborate with engineering, security, and compliance teams to implement high availability and automation.
Top Skills: ArgocdAzureAzure DevopsAzure GovernmentC#Elk StackGithub ActionsGitlab CiGoGrafanaJavaJavaScriptKubernetesOpentelemetryPrometheusPulumiTerraformTerragruntTypescript
Reposted 19 Days AgoSaved
Remote or Hybrid
United States
190K-235K Annually
Senior level
190K-235K Annually
Senior level
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills: Ai/LlmArgocdAWSCi/CdDatadogGithub ActionsGitopsGoKubernetesPython
Reposted 14 Days AgoSaved
Remote
United States of America
89K-184K Annually
Entry level
89K-184K Annually
Entry level
AdTech • Digital Media • Information Technology • Other
As a Software Engineer in the Tooling and Reliability Platforms team, you'll develop AI services, manage incident tools, and utilize Infrastructure as Code for high-availability systems. You'll focus on integrating AI workflows and improving operational resilience for Yahoo's brands.
Top Skills: AWSCloudFormationDockerGCPGoJavaKubernetesPythonTerraform
All Filters
JobType
New Jobs
Job Category
Experience
Industry
Company Name
Company Size

Sign up now Access later

Create Free Account