Radicle Health

Senior Cloud Site Reliability Engineer

Posted 11 Days Ago

Remote

Hiring Remotely in USA

110K-140K

Senior level

As a Senior Cloud Site Reliability Engineer, you'll advance cloud infrastructure via IaC, enhance CI/CD pipelines, ensure reliability, and collaborate with engineering teams to improve overall platform performance and observability.

Radicle Health is a collection of human services software products designed to foster collaboration and innovation, helping organizations better serve their communities. We believe technology plays a crucial role in the success of the human services sector, but no single system can meet the diverse needs of every agency. That’s why we’ve built Radicle Health as a home for mission-driven products that support organizations in delivering essential services. Under one roof, our teams learn from each other, test ideas faster, and think holistically about the individuals and communities we serve.

About Radicle Health:

Radicle Health is a collection of human services software products created and designed to foster collaboration and experimentation across our teams so that we can collectively better serve our communities. We believe technology is at the root of success in the human services sector, but that no single system can meet the needs of every agency. So we’ve built Radicle Health around this guiding principle. Our products are 100% committed to their customers and the individuals their customers serve. But under one roof, our teams can learn from each other, can more quickly test ideas, and can think holistically about our communities and the people at the center of those communities. Learn more at www.radicle-health.com

About the Job:

Join the Radicle Health shared SRE team to help build and evolve a unified platform supporting the SaraWorks, Link2Feed, and Foothold Care Management applications. Our environment spans AWS, Azure, and GCP; this role will focus primarily on AWS (commercial and some GovCloud) and GCP initially, while contributing patterns and tooling usable across all clouds. You'll partner closely with individual product pods while shaping shared standards, automation, and platform capabilities.

Who you are:

5+ years in SRE / DevOps / Platform / Infrastructure Engineering.
Eligibility for (or prior possession of) a PIV credential (Tier 1 background investigation).
Strong AWS foundations (networking, IAM, compute, storage, managed databases, VPC design); exposure to multi-cloud concepts.
Linux systems administration and production troubleshooting proficiency.
Production container experience (Docker plus ECS, Fargate, EKS, or Kubernetes).
Hands-on IaC (Terraform, Pulumi, or CloudFormation) with willingness to adopt Pulumi.
Scripting or programming in at least one of: Python, Bash, TypeScript, Go, Ruby, or similar.
CI/CD pipeline design and maintenance (GitLab CI or equivalent).
Practical observability (metrics, logs, tracing, alert strategy design); we're invested in Datadog here.
Incident response/on‑call participation with follow‑through on remediation.
Clear written and verbal communication; able to tailor depth to audience.
Availability during core US Eastern collaboration hours.

Preferred Experience:

Pulumi (via Cloud or Self-Hosted).
AWS GovCloud experience; familiarity with compliance frameworks (HIPAA, FedRAMP, SOC 2).
GCP services (GKE, Cloud SQL, IAM, networking) and foundational Azure awareness.
Advanced container orchestration (autoscaling strategies, service mesh, workload isolation).
Performance tuning & optimization for PostgreSQL or other relational databases.
Application ecosystem familiarity: Ruby and/or .NET.
Disaster recovery strategy, resilience / chaos engineering practice.
AI-assisted DevOps / AIOps tooling (e.g., GitHub Copilot, incident automation, AI-driven runbook generation).
Experience applying LLMs or automation to infra workflows (e.g., generating IaC modules, intelligent alert tuning, predictive scaling).
Familiarity with AI transformation initiatives: governance, data sensitivity considerations, and secure integration of AI into engineering workflows.

What you’ll be responsible for:

1. Infrastructure as Code & Cloud Engineering

Design, build, and evolve AWS infrastructure (with GCP in the initial scope) using IaC (Pulumi preferred; Terraform/CloudFormation experience transferable).

2. Container & Runtime Platform

Advance containerization (ECS/Fargate, EKS/Kubernetes, or equivalent) and establish secure, observable runtime patterns.

3. CI/CD & Release Engineering

Enhance pipelines (GitLab CI or similar) for reliable builds, automated testing, artifact/version management, and progressive delivery.

4. Collaboration & Enablement

Partner with engineering pods on hands-on implementation, architecture, incident response readiness, and post‑incident improvement.

5. Observability & Operational Excellence

Implement actionable metrics, tracing, structured logging, and intelligent alerting; refine SLOs and reduce MTTR.

6. Reliability & Performance

Lead capacity planning, resilience reviews, failover / DR exercises, and performance tuning aligned to SLIs/SLOs.

7. Security & Compliance

Embed least‑privilege IAM, secrets management, and hardened configurations; support compliance needs (e.g., GovCloud, healthcare).

8. Automation & Tooling

Eliminate toil via scripting, reusable service templates, policy-as-code, and self‑service operational workflows.

9. Documentation & Runbooks

Maintain clear architecture diagrams, decision records, playbooks, and onboarding guides.

10. Incident & On‑Call

Participate in a humane rotation; drive blameless retros and ensure remediation actions are implemented.

What we offer:

Unlimited PTO policy

Competitive medical, dental, and vision healthcare coverage

401k matching

Paid holidays

Volunteer time off

Paid parental leave

Remote work stipend

Compensation: $110,000 - $140,000

Location: Remote

Salary ranges are dependent on a variety of factors, including qualifications, experience and geographic location. More information about the salary range specific to your working location and other factors will be shared during the hiring process.

Radicle Health is an Equal Employment Opportunity employer that proudly pursues and hires a diverse workforce. Radicle Health does not make hiring or employment decisions on the basis of race, color, religion or religious belief, ethnic or national origin, nationality, sex, gender, gender identity, sexual orientation, disability, age, military or veteran status, or any other basis protected by applicable local, state, or federal laws or prohibited by Company policy.

Top Skills

AWS

Azure

Bash

CloudFormation

Datadog

Docker

ECS

EKS

Fargate

GCP

GitLab CI

Kubernetes

Pulumi

Python

Ruby

Terraform

TypeScript
