Top Remote Site Reliability Engineer Jobs in Los Angeles, CA
Blockchain • Fintech • Payments • Financial Services • Cryptocurrency • Web3
Design, maintain, and secure cloud infrastructure and CI/CD pipelines; automate operations with Go/Python; manage Kubernetes and blockchain nodes; implement disaster recovery; use AI tools for monitoring, anomaly detection, and capacity planning; participate in on-call rotations; mentor team members to improve reliability and performance.
Top Skills:
Go, Python, Shell, Terraform, Crossplane, AWS Lambda, Kubernetes, Helm, Ethereum, Solana, Arbitrum, Base, Avalanche, PostgreSQL, Redis, OpenSearch, Apache Airflow, AWS DMS, Snowflake, GitHub Copilot, Gemini, ChatGPT, LLMs, APM, RUM, Telemetry
Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics
Lead SRE responsible for architecting and automating fault-tolerant, scalable infrastructure across cloud and on-prem, driving deployment, monitoring, and performance tuning while mentoring engineers to improve reliability and SLAs.
Top Skills:
GCP, AWS, vSphere, Nutanix, Kubernetes, Docker, Go, Python, Linux, Terraform, Ansible, Chef, C#, .NET, Java, Elixir, Ruby, AWS Greengrass, GitOps
Big Data • Cloud • Software • Database
This role involves building and maintaining observability services, ensuring service reliability, and collaborating with other teams on best practices.
Top Skills:
AWS, Fluent Bit, GCP, Jaeger, Kubernetes, Azure, Quickwit, Splunk, Vector, VictoriaMetrics
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves improving software reliability, automating processes, collaborating with teams on system optimization, and mentoring engineers to establish reliability as a core value.
Top Skills:
AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kibana, Kubernetes, Ruby, Terraform
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, OpenTelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Site Reliability Engineer will enhance CI/CD frameworks, automate cloud infrastructure, manage Kubernetes and AWS services, and ensure operational excellence.
Top Skills:
Ansible, AWS, Bash, Chef, CI/CD, Docker, Git, Kubernetes, Puppet, Python, Ruby, Salt, Terraform
Artificial Intelligence • Other • Security • Software • Analytics • Big Data Analytics
The Lead Site Reliability Engineer will oversee the reliability and scalability of the infrastructure, lead a team in operational execution, ensure best practices in SRE, and mentor senior engineers.
Top Skills:
CI/CD, Docker, GitOps, Go, Kubernetes, Linux, Python, Terraform
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Big Data • Cloud • Software • Database
As a Staff Site Reliability Engineer, you will empower developers by optimizing MongoDB Atlas, ensuring seamless performance across multiple cloud platforms while fostering a supportive culture.
Top Skills:
AWS, GCP, Azure, MongoDB
Artificial Intelligence • Computer Vision • Greentech • Machine Learning • Robotics • Industrial • Automation
The Site Reliability Engineer at AMP will support the technology infrastructure, focusing on ticket management and software observability while developing tools to enhance operational efficiency in waste sortation facilities.
Top Skills:
Ansible, Docker, Grafana, Jenkins, Linux, Prometheus
Fintech • Software
The Principal Site Reliability Engineer is responsible for maintaining cloud infrastructure, ensuring application performance, and implementing automated solutions in a SaaS environment, while collaborating with security and software engineering teams.
Top Skills:
.NET, Ansible, AppDynamics, AWS, Azure, Azure DevOps, C#, Datadog, Dynatrace, Harness, Java, Jenkins, Kubernetes, New Relic, Terraform
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills:
Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform
Big Data • Cloud • Healthtech • Software • Big Data Analytics
The Senior Site Reliability Engineer will ensure the reliability and scalability of enterprise applications, lead incident management, develop automation tools, mentor team members, and collaborate with cross-functional teams.
Top Skills:
Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
Big Data • Cloud • Software • Database
As a Staff Engineer in the InfraSec team, you'll lead the design and deployment of security solutions for cloud platforms, automate monitoring, and manage security tooling while mentoring a small team of SREs.
Top Skills:
Ansible, AWS, Azure, CloudFormation, GCP, Go, Terraform
Information Technology • Legal Tech
The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.
Top Skills:
AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform
Big Data • Healthtech • HR Tech • Machine Learning • Software • Telehealth • Big Data Analytics
The Staff Site Reliability Engineer will architect, operate, and improve the platform while ensuring security compliance and enhancing development processes.
Top Skills:
AWS, Elasticsearch, Istio, Kubernetes, NATS, Node.js, Postgres, Python, React, Terraform, TypeScript
AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development
The Unified Communication Engineer manages and improves telecom systems, provides technical support, and integrates new UC technologies while ensuring stability of voice networks.
Top Skills:
AWS, Cisco, Microsoft, UCS Servers, vCenter, VMware, VoIP, Zoom
Security • Cybersecurity
Lead the design and implementation of observability, SLO/SLA frameworks, and AI-enabled infrastructure automation. Architect scalable AWS infrastructure, improve incident management and on-call practices, and drive organization-wide adoption of telemetry and reliability standards.
Top Skills:
Honeycomb, Grafana, AWS, Vercel, Supabase, Terraform, Pulumi, CI/CD, Observability, Telemetry, Infrastructure-as-Code, Cursor, Claude, Codex, AI-Assisted Tooling
Healthtech
The Site Reliability Engineer will ensure system reliability, collaborate with support teams, automate processes, and handle incident responses, with a strong focus on customer engagement and communication.
Top Skills:
Ansible, AWS, Azure, Bash, CI/CD, Docker, GCP, Go, Grafana, Kubernetes, Python
AdTech • Artificial Intelligence • Marketing Tech • Software • Analytics
The Senior Site Reliability Engineer will enhance system reliability, develop production-grade code, implement observability tools, conduct root cause analyses, and collaborate on system design for scalability.
Top Skills:
Argo CD, CI/CD, Docker, GitOps, Go, Grafana, Honeycomb, Jenkins, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform
Payments
As a Principal Site Reliability Engineer, you'll architect scalable infrastructure, drive reliability, mentor engineers, and lead AI enablement efforts, ensuring high performance across systems.
Top Skills:
AWS, CI/CD, Datadog, Elasticsearch, Go, Grafana, Kubernetes, New Relic, Prometheus, Python, RDS (MySQL/Postgres), SQL-Based RDBMS, TypeScript
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills:
Argo CD, AWS, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, OpenTelemetry, Prometheus, Python, Tempo, Terraform
Software • Consulting
Lead production support for external web applications: manage incidents, perform root cause analysis, expand observability (Splunk/OpenTelemetry), build dashboards, collaborate with dev and platform teams, and participate in 24x7 on-call rotations to improve availability and reliability.
Top Skills:
Splunk, OpenTelemetry, AppDynamics, Datadog, AWS, Kubernetes, Python, ServiceNow, MuleSoft, Postman, Linux, Shell Scripting, OpenShift, Azure, GCP, API Testing
Security • Software
Maintain, automate, and improve operational tools and customer deployment processes; monitor and ensure service SLOs, backup/restore, alerting, and incident response; drive GitOps/IaC practices, cost tracking, and automation of repetitive tasks while supporting outages and upgrades.
Top Skills:
Ansible, Terraform, Helm, Kubernetes, AWS, GCP, Azure, Prometheus, Grafana, Bash, Python, GitOps
Cloud • Software • Database
Lead the design, build, and operation of the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance, manage incidents and on-call rotations, implement security/encryption processes, and optimize reliability using SRE principles and observability.
Top Skills:
Kubernetes, GKE, EKS, AKS, Java, Bash, Shell, Python, Terraform, Ansible, Docker, Prometheus, Git, GitHub Actions, Linux, PostgreSQL, AWS, GCP, Azure