Top Remote Site Reliability Engineer Jobs in Los Angeles, CA
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills:
Blockchain, GitOps, Infrastructure-as-Code, Kubernetes, Programming Languages
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills:
AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
Artificial Intelligence • eCommerce • Retail
Lead the SRE and DevOps team, ensure infrastructure reliability, oversee cloud operations, drive automation, and collaborate cross-functionally.
Top Skills:
Azure, Bash, CI/CD, Datadog, Docker, ELK Stack, Go, Grafana, Kubernetes, PowerShell, Prometheus, Python, Terraform
Aerospace • Big Data • Greentech • Hardware • Social Impact
Design, deploy, and operate compute services for on-premises and cloud satellite imaging platforms. Build reproducible, scalable, highly available deployments, troubleshoot distributed systems, optimize constrained environments, document and automate operations, and participate in on-call rotations to ensure reliability for customer-facing and air-gapped deployments.
Top Skills:
Alloy, Ansible, Bash, CUDA, GitOps, Grafana, Helm, JIRA, K3s, Kubernetes, Kustomize, OpenTelemetry, Prometheus, Proxmox, Python, RKE2, Talos, Terraform
Software
Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.
Top Skills:
Ansible, Argo CD, AWS, AWS Lambda, Azure, Bash, Bitbucket, C++, Chef, Coralogix, Datadog, Docker, Gerrit, Git, GitLab, GCP, Helm, JavaScript, Kubernetes, Linux, MySQL, New Relic, Nginx, Node.js, Phabricator, Prometheus, Puppet, Python, Shell, Splunk
Real Estate • Financial Services • PropTech
As a Site Reliability Engineer, you will support AWS Cloud products, optimize processes, enhance automation, and ensure system reliability and performance.
Top Skills:
Argo CD, AWS, Azure DevOps, Bash, CI/CD, CloudWatch, Docker, EKS, Flux CD, Git, Kubernetes, PowerShell, Python, SQL, Terraform
Cloud • Software
In this role, you'll support large-scale applications, improve observability, mentor team members, and ensure reliability by collaborating on deployments and writing automation scripts while providing 24/7 support.
Top Skills:
Ansible, AWS, Bash, Confluence, Docker, ELK Stack, GCP, GitLab CI/CD, Grafana, Jenkins, JIRA, Kubernetes, Linux, MongoDB, MySQL, Nagios, OCI, Perl, Postgres, Prometheus, Puppet, Python, Terraform
Software
As Lead SRE, define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor the SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills:
AWS, AWS Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform
Computer Vision • Information Technology • Machine Learning • Natural Language Processing • Real Estate • Software
The SRE will maintain infrastructure for SaaS products on AWS, support developers, manage platform components, and handle IT tasks.
Top Skills:
AWS, Computer Vision, IaC, Large Language Models, NLP, Terraform
Artificial Intelligence • Information Technology • Machine Learning • Software • Cybersecurity • Generative AI • Data Privacy
Lead global SRE and infrastructure teams to ensure reliability, scalability, and cost-efficiency of production and developer platforms. Define cloud and Kubernetes architecture, IaC, CI/CD, SLOs/SLIs, incident management, and cloud cost optimization while partnering with Security, Product, Finance, and Engineering.
Top Skills:
AI, Automation, AWS, CI/CD, Cloud-Native Systems, GCP, Infrastructure as Code, Kubernetes, Terraform
Computer Vision • Machine Learning • Software
As a Site Reliability Engineer, ensure the reliability, performance, and scalability of Ditto's cloud infrastructure by developing observability solutions, leading incident management, and collaborating with product engineering teams.
Top Skills:
AWS, Azure, C, Datadog, GCP, Go, Grafana, Helm, Java, Kubernetes, Prometheus, Rust, Terraform
Artificial Intelligence • Healthtech • Software
The Staff Site Reliability Engineer will lead the reliability of production systems by defining SRE practices, improving observability, and ensuring fault-tolerance in cloud environments.
Top Skills:
AWS, Go, Kubernetes, Postgres, Python, Terraform, TypeScript
Digital Media • Social Media • Software • Sports
Lead the technical architecture and execution of migration to AWS, drive developer enablement, and automate infrastructure using code-first principles.
Top Skills:
AWS EKS, Datadog, GitHub Actions, Go, Istio, k6, Kubernetes, Node.js, Terraform
Software
As a Site Reliability Engineer, you'll enhance system reliability, collaborate on production readiness, define SLIs/SLOs, and improve incident response.
Top Skills:
AWS, Datadog, Grafana, Kubernetes, OpenTelemetry, Prometheus, TypeScript
Cloud • Security • Software • Cybersecurity
The Senior Lead Site Reliability Engineer will ensure performance and uptime of security products, develop automation pipelines, and improve monitoring systems, working closely with various teams.
Top Skills:
Azure, Databricks, Docker, Go, Jenkins, Kubernetes, Python, Terraform
AdTech • Artificial Intelligence • Marketing Tech • Software • Analytics
The Senior Site Reliability Engineer will enhance system reliability, develop production-grade code, implement observability tools, conduct root cause analyses, and collaborate on system design for scalability.
Top Skills:
Argo CD, CI/CD, Docker, GitOps, Go, Grafana, Honeycomb, Jenkins, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform
Edtech
The Lead Software Engineer will lead the SRE team, focusing on reliability, performance optimization, security, and mentoring developers, while improving overall platform resilience.
Top Skills:
ActiveJob, Ansible, AWS, AWS CloudWatch, EC2, ECS, Elasticsearch, Git, GCP, Google Cloud Stackdriver, Jenkins, JIRA, Kubernetes, Memcached, MongoDB, New Relic, Node.js, Postgres, Redis, Ruby on Rails, Sidekiq, Spinnaker, Terraform, Terragrunt
Information Technology • Legal Tech
The role involves maintaining and improving Azure infrastructure, managing Infrastructure as Code with Terraform, enhancing security measures, and operating CI/CD pipelines.
Top Skills:
Azure, Azure DevOps, Bash, CircleCI, Datadog, EFK, ELK, GitHub Actions, PowerShell, Python, Terraform
News + Entertainment
As an Ads Reliability Engineer, you will ensure the reliability of Netflix's Ad Suite by designing scalable infrastructure, collaborating with teams, and implementing automation for monitoring and incident response.
Top Skills:
AWS, Azure, GCP, Go, Java, Kubernetes, Python, Terraform
Artificial Intelligence • Fintech • Machine Learning • Natural Language Processing • Business Intelligence
The Senior Director of SRE leads and defines reliability and operational excellence across products, manages the SRE team, and scales reliability practices within the organization.
Top Skills:
AWS, Azure, Cloud-Native Networking, Distributed Systems, GCP, Kubernetes, Microservices, Site Reliability Engineering Principles
Big Data • Information Technology • Security • Software
The Senior Developer will drive observability roadmaps using SRE Golden Signals, establish monitoring strategies, enhance system reliability, and act as an expert in New Relic technology for performance management.
Top Skills:
Bash, CRI-O, Csh, Kubernetes, New Relic, Perl, Windows PowerShell
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves supporting network infrastructure, automating cloud infrastructure, managing CI/CD workflows, and ensuring operational excellence in IT support, including incident response and security practices.
Top Skills:
Ansible, AWS, Bash, Docker, Git, Kubernetes, Python, Ruby, Terraform
Software • Cryptocurrency
Manage and scale Kubernetes clusters, automate infrastructure, optimize performance, maintain blockchain nodes, and improve system reliability while collaborating with product teams.
Top Skills:
AWS (EC2), AWS EKS, Datadog, Docker, IAM, Kubernetes, OpenTelemetry, Pulumi, RDS, S3, Terraform
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
The Senior Site Reliability Engineer will manage system incidents, enhance monitoring and database infrastructure, and collaborate on scalable systems to maintain reliability as usage scales.
Top Skills:
AWS, ClickHouse, Kubernetes, MySQL, Postgres, Redis
Blockchain • Web3
As a Site Reliability Engineer, you'll enhance observability, logging, and tracing, collaborating with engineers to optimize performance and security of infrastructure.
Top Skills:
Ansible, AWS, AWS CDK, GCP, Git, Go, Grafana, Kubernetes, LGTM, Loki, Mimir, OpenTelemetry, Prometheus, Rust, Sentry, Tempo, Terraform, TypeScript, WebAssembly