athenahealth

Lead Site Reliability Engineer

Posted 7 Days Ago

Remote

Hiring Remotely in USA

119K-203K Annually

Senior level

The Lead Site Reliability Engineer is responsible for the reliability, automation, and performance of cloud services while mentoring a team and collaborating cross-functionally. The role also drives initiatives to improve incident management and enforce security compliance.

The summary above was generated by AI

Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

Lead Site Reliability Engineer — Cloud Engineering

Position Summary:
Design, build, and operate highly available cloud infrastructure that supports critical healthcare services and scales to meet growth. Lead reliability, automation, monitoring, and incident response efforts to ensure system performance and resilience across private and public cloud environments. This role works in a hybrid environment based in Boston, MA, partners closely with development, security, and operations teams, and reports to the Cloud Engineering Manager.

About the Team:
The Cloud Engineering team ensures continuous availability and scalability of the systems that power athenahealth’s products, managing compute, storage, and network services across hybrid cloud environments. The team partners with application engineering, security, product, and site reliability teams to design resilient architectures, automate operations, and reduce manual work through Infrastructure as Code and observability tooling. Key technologies include Terraform, Kubernetes, public cloud services, and monitoring/observability platforms.

Essential Job Responsibilities:

Define, measure, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for cloud services and infrastructure components.
Lead improvements to system availability, fault tolerance, and disaster recovery capabilities.
Manage incident detection, conduct root cause analysis, and oversee timely resolution of production incidents.
Drive automation and Infrastructure as Code (IaC) initiatives using tools such as Terraform, CloudFormation, and Ansible to provision and manage cloud resources.
Design and maintain monitoring, logging, and alerting solutions to provide continuous visibility into infrastructure health and performance.
Identify performance bottlenecks and implement capacity, cost, and performance optimizations for cloud services.
Ensure cloud infrastructure meets security and compliance requirements in collaboration with security and risk teams.
Lead and mentor Site Reliability Engineers, setting technical direction and promoting operational best practices.
Collaborate with development, DevOps, and operations teams to align infrastructure with application and business needs.
Evaluate and pilot AI-assisted tools that help detect anomalies, prioritize incidents, automate routine remediation, and forecast capacity needs; recommend safe, human-centered adoption practices and guide the team in using these tools responsibly.

Additional Job Responsibilities:

Own post-incident reviews and implement preventive measures to reduce recurrence and recovery time.
Support onboarding and go-live activities, including runbooks, playbooks, and run-time documentation.
Contribute to technical documentation and knowledge sharing to improve team effectiveness.
Participate in cross-team forums to align priorities and remove delivery blockers.
Support continuous improvement initiatives focused on reducing operational toil and improving system reliability.

Expected Education & Experience:

7+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or DevOps roles, with at least 3 years in a technical lead capacity.
10+ years of hands-on experience with cloud automation and configuration management tools (for example, Terraform, CloudFormation, Ansible, or Puppet) across hybrid cloud environments.
Strong practical experience with public cloud services (AWS, Google Cloud, Azure) and cloud-native technologies such as Kubernetes and other container orchestration platforms.
Proficiency in one or more scripting or programming languages (Python, Go, Bash, or similar).
Experience designing and operating monitoring, logging, and observability solutions (Prometheus, Grafana, Datadog, ELK, CloudWatch, or similar).
Demonstrated ability to build and operate highly available, scalable, and fault-tolerant systems in production.
Solid knowledge of networking, storage, compute, Linux administration, and cloud security best practices.
Experience with CI/CD pipelines and automating deployment and release processes.
Experience mentoring engineers and providing technical leadership across distributed teams.
Bachelor’s degree in Computer Science, Engineering, or related field preferred; equivalent practical experience accepted.
Certifications in cloud platforms (AWS, GCP, Azure) or relevant technologies are a plus.

Expected Compensation

$119,000 - $203,000

The base salary range shown reflects the full range for this role from minimum to maximum. At athenahealth, base pay depends on multiple factors, including job-related experience, relevant knowledge and skills, how your qualifications compare to others in similar roles, and geographical market rates. Base pay is only one part of our competitive Total Rewards package. Depending on role eligibility, we offer both short- and long-term incentives by way of an annual discretionary bonus plan, variable compensation plan, and equity plans.

About athenahealth

Our vision: In an industry that becomes more complex by the day, we stand for simplicity. We offer IT solutions and expert services that eliminate the daily hurdles preventing healthcare providers from focusing entirely on their patients — powered by our vision to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

Our company culture: Our talented employees (or athenistas, as we call ourselves) spark the innovation and passion needed to accomplish our vision. We are a diverse group of dreamers and do-ers with unique knowledge, expertise, backgrounds, and perspectives. We unite as mission-driven problem-solvers with a deep desire to achieve our vision and make our time here count. Our award-winning culture is built around shared values of inclusiveness, accountability, and support.

Our DEI commitment: Our vision of accessible, high-quality, and sustainable healthcare for all requires addressing the inequities that stand in the way. That's one reason we prioritize diversity, equity, and inclusion in every aspect of our business, from attracting and sustaining a diverse workforce to maintaining an inclusive environment for athenistas, our partners, customers and the communities where we work and serve.

What we can do for you:

Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces. Some offices even welcome dogs.

We also encourage a better work-life balance for athenistas with our flexibility. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done within an office environment, full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation.

In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. We provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued.

Learn more about our culture and benefits here: athenahealth.com/careers

https://www.athenahealth.com/careers/equal-opportunity

Top Skills

Ansible

AWS

AWS CloudFormation

Azure

Bash

Datadog

Docker

ELK Stack

GCP

Grafana

Kubernetes

Prometheus

Puppet

Python

Terraform
