Mattermost

Lead Site Reliability Engineer

Posted 6 Hours Ago
Remote
Hiring Remotely in United States
170K-200K Annually
Senior level
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
The summary above was generated by AI
At Mattermost, we build the #1 collaborative workflow solution for defense, intelligence, security, and critical infrastructure organizations. Trusted by governments, financial institutions, and technology companies, our platform enables secure, efficient operations for the world’s most critical teams.
 
We’re dedicated to empowering organizations to operate with confidence, reducing risks, and accelerating productivity. Guided by our core values of Customer Obsession, Earn Trust, Self Awareness, Ownership and High Impact, we collaborate closely with our customers to deliver solutions that meet complex needs and drive success.
 
To learn more, visit www.mattermost.com

Mattermost is seeking an experienced and visionary Lead Site Reliability Engineer (SRE) to guide the architecture, reliability, and operational excellence of the infrastructure powering our secure, mission-critical collaboration platform. 

In this role, you will provide technical leadership across our SRE function, driving strategic initiatives for scalability, observability, performance, and automation across cloud and hybrid environments. You will mentor engineers, establish best practices, and collaborate closely with development, security, and operations teams to ensure our customers in defense, government, and critical infrastructure sectors experience exceptional reliability and performance. 

Responsibilities Include:

  • Define the strategy, architecture, and roadmap for Mattermost’s site reliability engineering function, aligning infrastructure initiatives with product and business goals. 
  • Lead the design, deployment, and optimization of production-grade containerized workloads, infrastructure-as-code, and compliant cloud environments for regulated domains (e.g., FedRAMP, DoD). 
  • Establish and evolve observability, monitoring, and alerting frameworks to ensure performance, reliability, and capacity planning at scale. 
  • Drive incident management processes, including on-call rotations, root cause analysis, and systemic reliability improvements. 
  • Partner with security and compliance teams to meet data sovereignty, security, and regulatory requirements. 
  • Champion automation and operational excellence to improve efficiency, reduce risk, and scale operations. 
  • Oversee cloud cost management and capacity planning to optimize infrastructure spending while meeting performance targets. 
  • Build and maintain a developer platform that enables fast, secure software delivery and improves application stability in production. 
  • Mentor and coach SRE team members, fostering a culture of learning, collaboration, and technical excellence. 

Requirements:

  • BS in Computer Science, Cybersecurity, Software Engineering, or a related technical field, or equivalent experience, with 5+ years of relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles. 
  • Proven expertise in container orchestration platforms, ideally Kubernetes. 
  • Extensive experience with infrastructure-as-code, ideally Terraform. 
  • Strong background in cloud platforms, ideally AWS. 
  • Demonstrated experience designing and implementing monitoring, alerting, and performance optimization strategies. 
  • Exceptional troubleshooting and incident management skills for distributed systems. 
  • Proficiency in at least one scripting or programming language for automation. 
  • Excellent communication skills with a track record of influencing cross-functional teams. 
  • Experience leading globally distributed teams in a remote-first environment. 
  • For candidates residing in the U.S.: This role may require the ability to obtain and maintain a U.S. government security clearance in the future. As such, U.S. applicants must be U.S. citizens and eligible under applicable clearance requirements.  
  • Applicants must meet eligibility requirements for access to export-controlled information as defined by U.S. export control laws, including EAR and ITAR. 

Preferences:

  • Familiarity with observability stacks such as Grafana and Prometheus. 
  • Experience designing high-availability, disaster recovery, and scaling architectures. 
  • Exposure to GCP and Azure cloud environments. 
  • Leadership experience in highly regulated industries such as defense, finance, or critical infrastructure. 
  • Experience with U.S. federal compliance frameworks and authorization processes, including FedRAMP, DoD ATO, NIST 800-53, and related government standards. 
  • Experience preparing, delivering, and maintaining software offerings through AWS Marketplace and other cloud provider marketplaces (e.g., Azure Marketplace, Google Cloud Marketplace), including packaging, compliance validation, and ongoing operational support. 
  • Open-source contributions in reliability, DevOps, or infrastructure tooling. 
  • Certifications in cloud infrastructure, reliability, or DevOps engineering (e.g., CKA, CKAD, AWS Certified Solutions Architect). 

Mattermost takes a market-based approach to pay, and pay may vary depending on your location. The successful candidate’s starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. These ranges may be modified in the future.

 

Posting Range
$170,000-$200,000 USD
Mattermost is an EEO Employer and a remote-first, open-source company.
 
We are continually working to expand our hiring in more countries and regions, ensuring compliance with local laws and regulations, which takes time.
 
Mattermost values your unique perspective—we welcome all applicants. We encourage individuals from all backgrounds to apply and are committed to assessing candidates based on their skills and qualifications. We do not tolerate discrimination against staff or applicants based on race, religion, national origin, age, disability, pregnancy status, veteran status, or other personal characteristics.
 
If you require accommodations during the interview process, please let us know—we’re happy to assist.

Top Skills

Kubernetes, Terraform, AWS, Grafana, Prometheus, GCP, Azure, AWS Marketplace, Azure Marketplace, Google Cloud Marketplace
