Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
Lead Site Reliability Engineer — Cloud Engineering
Position Summary:
Design, build, and operate highly available cloud infrastructure that supports critical healthcare services and scales to meet growth. Lead reliability, automation, monitoring, and incident response efforts to ensure system performance and resilience across private and public cloud environments. This role works in a hybrid environment based in Boston, MA and partners closely with development, security, and operations teams. This role reports to the Cloud Engineering Manager.
About the Team:
The Cloud Engineering team ensures continuous availability and scalability of the systems that power athenahealth’s products, managing compute, storage, and network services across hybrid cloud environments. The team partners with application engineering, security, product, and site reliability teams to design resilient architectures, automate operations, and reduce manual work through Infrastructure as Code and observability tooling. Key technologies include Terraform, Kubernetes, public cloud services, and monitoring/observability platforms.
Essential Job Responsibilities:
- Define, measure, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for cloud services and infrastructure components.
- Lead improvements to system availability, fault tolerance, and disaster recovery capabilities.
- Manage incident detection, conduct root cause analysis, and oversee timely resolution of production incidents.
- Drive automation and Infrastructure as Code (IaC) initiatives using tools such as Terraform, CloudFormation, and Ansible to provision and manage cloud resources.
- Design and maintain monitoring, logging, and alerting solutions to provide continuous visibility into infrastructure health and performance.
- Identify performance bottlenecks and implement capacity, cost, and performance optimizations for cloud services.
- Ensure cloud infrastructure meets security and compliance requirements in collaboration with security and risk teams.
- Lead and mentor Site Reliability Engineers, setting technical direction and promoting operational best practices.
- Collaborate with development, DevOps, and operations teams to align infrastructure with application and business needs.
- Evaluate and pilot AI-assisted tools that help detect anomalies, prioritize incidents, automate routine remediation, and forecast capacity needs; recommend safe, human-centered adoption practices and guide the team in using these tools responsibly.
Additional Job Responsibilities:
- Own post-incident reviews and implement preventive measures to reduce recurrence and recovery time.
- Support onboarding and go-live activities, including runbooks, playbooks, and run-time documentation.
- Contribute to technical documentation and knowledge sharing to improve team effectiveness.
- Participate in cross-team forums to align priorities and remove delivery blockers.
- Support continuous improvement initiatives focused on reducing operational toil and improving system reliability.
Expected Education & Experience:
- 7+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or DevOps roles, with at least 3 years in a technical lead capacity.
- 10+ years of hands-on experience with cloud automation and configuration management tools (for example, Terraform, CloudFormation, Ansible, or Puppet) across hybrid cloud environments.
- Strong practical experience with public cloud services (AWS, Google Cloud, Azure) and cloud-native technologies such as Kubernetes and container orchestration.
- Proficiency in one or more scripting or programming languages (Python, Go, Bash, or similar).
- Experience designing and operating monitoring, logging, and observability solutions (Prometheus, Grafana, Datadog, ELK, CloudWatch, or similar).
- Demonstrated ability to build and operate highly available, scalable, and fault-tolerant systems in production.
- Solid knowledge of networking, storage, compute, Linux administration, and cloud security best practices.
- Experience with CI/CD pipelines and automating deployment and release processes.
- Experience mentoring engineers and providing technical leadership across distributed teams.
- Bachelor’s degree in Computer Science, Engineering, or related field preferred; equivalent practical experience accepted.
- Certifications in cloud platforms (AWS, GCP, Azure) or relevant technologies are a plus.
Expected Compensation
$119,000 - $203,000
The base salary range shown reflects the full range for this role from minimum to maximum. At athenahealth, base pay depends on multiple factors, including job-related experience, relevant knowledge and skills, how your qualifications compare to others in similar roles, and geographical market rates. Base pay is only one part of our competitive Total Rewards package. Depending on role eligibility, we offer both short- and long-term incentives by way of an annual discretionary bonus plan, variable compensation plan, and equity plans.
About athenahealth
Our vision: In an industry that becomes more complex by the day, we stand for simplicity. We offer IT solutions and expert services that eliminate the daily hurdles preventing healthcare providers from focusing entirely on their patients — powered by our vision to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
Our company culture: Our talented employees — or athenistas, as we call ourselves — spark the innovation and passion needed to accomplish our vision. We are a diverse group of dreamers and do-ers with unique knowledge, expertise, backgrounds, and perspectives. We unite as mission-driven problem-solvers with a deep desire to achieve our vision and make our time here count. Our award-winning culture is built around shared values of inclusiveness, accountability, and support.
Our DEI commitment: Our vision of accessible, high-quality, and sustainable healthcare for all requires addressing the inequities that stand in the way. That's one reason we prioritize diversity, equity, and inclusion in every aspect of our business, from attracting and sustaining a diverse workforce to maintaining an inclusive environment for athenistas, our partners, customers and the communities where we work and serve.
What we can do for you:
Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces — some offices even welcome dogs.
We also encourage a better work-life balance for athenistas with our flexibility. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done within an office environment, full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation.
In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. We provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued.
Learn more about our culture and benefits here: athenahealth.com/careers
https://www.athenahealth.com/careers/equal-opportunity