Top Reliability Engineer Jobs in Los Angeles, CA
The Site Reliability Engineer II designs and maintains scalable systems, focusing on automation, monitoring, incident response, and collaboration with developers to enhance operational practices and efficiency.
Top Skills:
Bash, Cloud Service Operations, Containers, Continuous Delivery, Continuous Integration, Go, Infrastructure As Code, Orchestration Platforms, Python
Other • Social Impact
As a Senior Site Reliability Engineer, you will manage and improve Wikimedia's infrastructure, handle operational tasks, automate processes, and provide mentorship while participating in a 24/7 on-call rotation.
Top Skills:
Ansible, Bash, Debian, Go, Grafana, Hhvm, Kubernetes, Memcached, PHP, Prometheus, Puppet, Python, Redis, Ruby
Artificial Intelligence • Fintech • Software • Financial Services
The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.
Top Skills:
AWS, Clickhouse, Go, Kafka, Kubernetes, Pulumi, Python, Terraform
Cloud • Software
Design, implement, and support Kubernetes and compute platforms in a private cloud. Oversee architecture and standardization across hardware, OS, and cloud orchestration.
Top Skills:
Ansible, Bash, Ci/Cd, Helm, Kubernetes, Linux, Openstack, Python, Terraform, Ubuntu
Cloud • Information Technology
As a Sr. Site Reliability Engineer, you'll ensure service reliability, build automation, and collaborate on infrastructure improvements while mentoring others.
Top Skills:
Ansible, Catchpoint, Docker, Elk, Go, Grafana, Hashicorp Vault, Jenkins, Kubernetes, Linux, Prometheus, Python, Terraform
Cloud • Security • Software • Cybersecurity
Design and maintain reliable infrastructure solutions for a cloud data protection platform. Ensure application scalability and support through CI/CD and monitoring tools while collaborating in a global team.
Top Skills:
Appinsights, Aws Cloudformation, Azure Api Management, Azure Arm Templates, Azure Cosmos Db, Azure Devops, Azure Entra Id, Azure Functions, Azure Monitor, Azure Storage Services, Bash, Bitbucket, Elastic Stack, Git, Go, Microsoft Tfs, Powershell, Python, Serverless Framework, Terraform
Software
As a Senior Site Reliability Engineer at Regrello, you'll shape the developer platform, collaborate with customers, and ensure the reliability and security of infrastructure and applications.
Top Skills:
AWS, Azure, CircleCI, GCP, Github Actions, Gitlab Ci, Go, Kubernetes, Terraform
Aerospace • Other
The Sr. Site Reliability Engineer at SpaceX is responsible for enhancing distributed systems, managing large data clusters, and ensuring software reliability on the Starlink project, focusing on customer experience and operational efficiency.
Top Skills:
Apache Kafka, C#, Flink, Go, Hbase, Hdfs, Istio, Java, Kubernetes, Linux, Python, Scala, Spark
Artificial Intelligence • Machine Learning • Security • Software
The Senior Staff Site Reliability Engineer will be responsible for ensuring system reliability, debugging issues, mentoring the engineering team, and maintaining infrastructure and CI/CD pipelines.
Top Skills:
AWS, Datadog, Docker, Github Actions, Grafana, Helm, Kotlin, Kubernetes, Postgres, Prometheus, Python, Rust, Terraform, Terragrunt, Typescript
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills:
Argocd, AWS, Elk, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, Opentelemetry, Prometheus, Python, Tempo, Terraform
Automotive
Design and implement scalable cloud infrastructure, monitor performance, automate processes, ensure security and compliance, and lead a DevOps team.
Top Skills:
AWS, Bash, Ci/Cd, Docker, Elk Stack, GCP, Grafana, Kubernetes, Prometheus, Python, Terraform
Fintech
The Staff Site Reliability Engineer role involves leading architecture, automating the GCP environment, defining SLIs and SLOs, mentoring teammates, and enhancing system reliability and performance.
Top Skills:
Argocd, Datadog, GCP, Go, Helm, JavaScript, Kubernetes, Python, Terraform, Typescript
Digital Media • Software • Sports
Seeking a Senior Site Reliability Engineer to enhance system reliability, performance, and scalability. Focus on automation, observability, and improving CI/CD practices while collaborating with engineering teams for better incident response and metrics improvement.
Top Skills:
AWS, Azure, C++, Ci/Cd, Datadog, Docker, Elk, GCP, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Terraform
Software • Financial Services
Ensure platform reliability, performance, and availability by implementing observability, automating infrastructure, participating in on-call rotations and post-mortems, partnering with Product and Engineering, designing scalable architectures, mentoring teammates, and integrating Dynatrace with Azure DevOps and Jira while supporting compliance (SOC/FedRAMP).
Top Skills:
.Net, Aks, Alpine, Ansible, Appinsights, Arm Templates, AWS, Azure Devops, Bash, Bicep, C#, Chef, CloudFormation, Datadog, Debian, Dynatrace, Eks, GCP, Git, Gks, Grafana, Helm, JIRA, Kubernetes, Log Analytics, Azure, New Relic, Onestream Software, Openshift, Powershell, Powershell Dsc, Prometheus, Puppet, Python, Rest Apis, SQL, Terraform, Ubuntu
Aerospace • Hardware • Software • Defense • Manufacturing
As a Site Reliability Engineer, you'll ensure robotics system reliability, build telemetry integration, and develop tools for diagnostics and automation, collaborating with engineering teams for enhanced production reliability.
Top Skills:
C++, Datadog, Go, Kubernetes, Opentelemetry, Prometheus, Python, Ros2, Telegraf, Typescript
Artificial Intelligence • Blockchain • Information Technology • Consulting
Lead the design and build of production-grade Azure infrastructure using Terraform, ensuring scalable, secure, and repeatable deployments. Provide technical leadership, platform enhancements, observability and incident response improvements, and Tier 2 infrastructure support while collaborating with engineering, security, and product teams to meet enterprise readiness and feature parity goals.
Top Skills:
Argo, Azure, Go, Grafana, Kubernetes, Prometheus, Python, Spacelift, Terraform
Cloud • Security • Software • Generative AI
Design, build, and automate large-scale multi-cloud infrastructure and internal SRE tools. Improve host lifecycle, observability, alerting, and reliability; operate containerized workloads; participate in on-call rotations, incident response, runbooks, postmortems, code reviews, and mentoring.
Top Skills:
Ansible, Argo Cd, Argo Workflows, Cue, Docker, Elastic Stack, Go, Graphite, Influx, Kubernetes, Linux, Prometheus, Puppet, Terraform, Ubuntu, Ubuntu Live Patch
Legal Tech • Software
As a Senior Site Reliability Engineer, you will lead reliability initiatives, design and maintain systems, enhance CI/CD pipelines, and mentor junior engineers while ensuring system availability and performance.
Top Skills:
AWS, Bash, Cloudwatch, Ec2, Eks, Iam, Kubernetes, Lambda, Powershell, Python, S3
HR Tech • Software
Design, build, maintain, and operate Calendly's infrastructure platform with IaC and observability. Evaluate and deploy cloud-native tools, enable application teams on reliability practices, participate in on-call rotation, and mentor engineers while defining standards for incidents, capacity, and platform usage.
Top Skills:
APIs, Cloud Networking, Controllers And Operators, Datadog, Distributed Systems, GCP, Go, Infrastructure As Code, Kubernetes, Linux, Python
Real Estate • Financial Services • PropTech
Support and optimize products migrated to AWS, implement cloud best practices, maintain operational coverage, enhance automation, observability, CI/CD/GitOps, and security. Collaborate with development and platform teams to scale, troubleshoot, and ensure reliable SaaS operations.
Top Skills:
Amis, Argocd, AWS, Aws Elastic Beanstalk, Aws Transfer Family, Azure Devops, Bash, Cloudwatch, Curl, Docker, Ec2, Eks, Fluxcd, Git, Gitops, HTTP, Istio, Kubernetes, Linkerd, Load Balancer, Powershell, Python, Rds, SQL, Terraform, Wget
Cloud • Software • Database
Lead the design, build, and operation of the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance, manage incidents and on-call rotations, implement security/encryption processes, and optimize reliability using SRE principles and observability.
Top Skills:
Aks, Ansible, AWS, Azure, Bash, Docker, Eks, GCP, Git, Github Actions, Gke, Java, Kubernetes, Linux, Postgres, Prometheus, Python, Shell, Terraform
Cloud • Security • Software • Cybersecurity
The Site Reliability Engineer II - Database ensures the integrity, security, and performance of MySQL databases while collaborating with development and operations teams to address database issues and improve reliability.
Top Skills:
MySQL, SQL
Artificial Intelligence • Information Technology • Software • Database
As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure, ensure system reliability, automate processes, and collaborate with engineering teams.
Top Skills:
Docker, Elk Stack, Go, Grafana, Java, Kubernetes, Node.js, Prometheus, Pulumi, Python, Ruby, Terraform
Cloud • Security • Software • Generative AI
The role involves designing, building, and automating network infrastructure for Elastic's global services, focusing on reliability and operational excellence while enhancing customer experience through proactive problem management.
Top Skills:
Ansible, Bgp, Dns, Docker, Elastic Stack, Go, Kubernetes, Terraform
Artificial Intelligence • Machine Learning • Software • Analytics
The role involves end-to-end ownership of AWS infrastructure, managing Kubernetes platforms, and ensuring system reliability through observability and automation. Responsibilities include incident response and maintaining CI/CD systems.
Top Skills:
Argocd, AWS, Datadog, Git, Go, Kubernetes, Python, Terraform