Alembic

Senior Site Reliability Engineer

Posted 4 Days Ago
In-Office
San Francisco, CA
210K-240K Annually
Senior level
The Senior Site Reliability Engineer will design and maintain scalable infrastructure, improve system reliability, manage CI/CD pipelines, and collaborate across teams for operational excellence.
The summary above was generated by AI

About the Role

We’re looking for an experienced Site Reliability Engineer (SRE) to help us scale our platform with reliability, observability, and operational excellence at the core. You’ll partner with engineers and data scientists to build, automate, and maintain the infrastructure that powers our core platform—including data pipelines, ML workloads, and real-time analytics systems.

This is a hands-on, high-impact role with visibility across the stack and the opportunity to shape the future of our infrastructure and operations.

Key Responsibilities

  • Design, build, and maintain scalable infrastructure to support real-time analytics and machine learning workloads

  • Improve system reliability and performance through automation, observability, and proactive capacity planning

  • Own and evolve CI/CD pipelines, deployment automation, rollback mechanisms, and config management

  • Implement and maintain monitoring, alerting, and incident response processes (SLOs, runbooks, on-call rotations)

  • Collaborate across engineering and data science teams to drive a culture of performance and reliability

  • Ensure security, compliance, and operational readiness across our cloud infrastructure

  • Drive post-incident analysis and continuous improvement initiatives

Must-Have Qualifications

  • 8+ years of experience in SRE, DevOps, or infrastructure engineering roles

  • Deep experience with cloud environments (AWS preferred), containerization (Docker), and orchestration (Kubernetes)

  • Solid understanding of infrastructure-as-code (e.g., Terraform, Ansible)

  • Strong knowledge of Linux systems, networking, and systems performance tuning

  • Experience with monitoring and observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry)

  • Proficiency with CI/CD tools and pipelines (e.g., GitHub Actions, ArgoCD)

  • Ability to debug complex systems and automate solutions in scripting languages (Python, Bash, etc.)

  • Excellent communication skills and the ability to work cross-functionally

Nice-to-Have

  • Experience supporting data-intensive platforms (Spark, Airflow, Kafka, etc.)

  • Familiarity with security practices for cloud-native applications and infrastructure

  • Experience in high-compliance or SOC 2 environments

What You’ll Get

  • Ownership of mission-critical infrastructure in a company solving real-world enterprise problems

  • A front-row seat to a high-performance engineering culture

  • The ability to influence how our platform scales—from deployment to incident management

  • An environment that values curiosity, accountability, and impact

Top Skills

Ansible
ArgoCD
AWS
Bash
Datadog
Docker
ELK
GitHub Actions
Grafana
Kubernetes
Linux
OpenTelemetry
Prometheus
Python
Terraform


