Cox Enterprises

Sr. Site Reliability Engineer - Incident Response

Posted 7 Days Ago
Hybrid
Irvine, CA
99K-165K Annually
Senior level
The Site Reliability Engineer - Incident Response is a critical enterprise-level role responsible for accelerating incident resolution and enhancing the overall incident management process. This individual partners with engineering teams during active incidents to troubleshoot issues using monitoring and logging tools and, after each incident, delivers executive-level summaries that clearly communicate impact, root cause, and resolution. The SRE - Incident Response also plays a key role in analyzing incident response effectiveness and identifying opportunities for systemic improvements.
Core Competencies and Qualifications:
  • Bachelor's degree in a related discipline and 4 years' experience in a related field. The right candidate could also have a different combination, such as a master's degree and 2 years' experience; a Ph.D. and up to 1 year of experience; or 16 years' experience in a related field.
  • Applicants must currently be authorized to work in the United States for any employer without current or future sponsorship. No OPT, CPT, STEM/OPT, or visa sponsorship is available now or in the future.
  • Engineering/Tooling: Demonstrates the ability to design, build, and maintain engineering solutions and tools that enhance reliability, automate incident response, and reduce operational toil.
  • Incident Troubleshooting: Skilled in interpreting logs, metrics, and traces to assist in identifying root causes during live incidents.
  • Monitoring & Observability: Proficient in tools such as Datadog, Splunk, New Relic, or similar platforms.
  • AI-Centric Engineering: Effectively leverages artificial intelligence (AI) and machine learning (ML) tools to automate, optimize, and enhance daily engineering and incident response tasks.
  • Executive Communication: Ability to distill complex technical issues into concise, business-relevant summaries for senior leadership.
  • Analytical Rigor: Strong attention to detail in validating incident data and identifying trends or gaps in response.
  • DevOps & Architecture Knowledge: Understanding of full-stack systems, CI/CD pipelines, caching, scaling, and cloud-native infrastructure.
  • Metrics & Reporting: Capable of calculating and interpreting key metrics like MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve).
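MTTA and MTTR, named in the bullet above, are plain averages taken over a set of incidents. The Python sketch below shows one way to compute them. It assumes a hypothetical `Incident` record with `triggered_at`, `acknowledged_at`, and `resolved_at` timestamps, and it starts both clocks when the incident is triggered. Teams differ on that starting point, so treat these definitions as illustrative assumptions rather than the employer's method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape; real field names depend on the incident tool.
    triggered_at: datetime     # alert fired / incident opened
    acknowledged_at: datetime  # a responder acknowledged it
    resolved_at: datetime      # service restored


def _mean_minutes(durations: list[timedelta]) -> float:
    """Average a non-empty list of durations, in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def mtta_minutes(incidents: list[Incident]) -> float:
    """Mean Time to Acknowledge: average of (acknowledged - triggered)."""
    return _mean_minutes([i.acknowledged_at - i.triggered_at for i in incidents])


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean Time to Resolve: average of (resolved - triggered).
    Assumption: the clock starts when the incident is triggered."""
    return _mean_minutes([i.resolved_at - i.triggered_at for i in incidents])


t0 = datetime(2024, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=60)),
    Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=120)),
]
print(mtta_minutes(incidents), mttr_minutes(incidents))  # 10.0 90.0
```

Because a single long outage can pull a mean far upward, medians or percentiles are often reported alongside these averages.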

Responsibilities of this role outside of active on-call duty:
Post-Incident Review Development
  • Draft and deliver executive summaries after incidents.
  • Develop blameless postmortem practices and coach teams on them.
  • Create templates, train facilitators, and help guide root cause analysis (e.g., 5 Whys, fishbone diagrams).
  • Maintain a central library of learnings and cross-cutting themes.

Incident Process Improvement
  • Actively support engineering teams during incidents by helping diagnose and resolve issues quickly.
  • Navigate and analyze data from observability platforms to make informed inferences about root causes.
  • Analyze the effectiveness of incident response to identify systemic reliability gaps.
  • Standardize incident response workflows (incident roles, comms, escalation paths).
  • Create or refine runbooks, incident command frameworks, and severity classification guides.

Metrics and Insights
  • Build dashboards around incident frequency, MTTR, MTTA, and recurrence rates.
  • Use incident data to drive reliability OKRs or engineering investments.
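Recurrence rate, one of the dashboard metrics listed above, has no single standard definition. The sketch below assumes that an incident "recurs" when another incident with the same cause key occurred within the preceding 30 days. The `(opened_at, cause_key)` input shape and the 30-day window are hypothetical choices for illustration only.

```python
from datetime import datetime, timedelta


def recurrence_rate(
    incidents: list[tuple[datetime, str]],
    window: timedelta = timedelta(days=30),  # hypothetical lookback window
) -> float:
    """Return the fraction of incidents whose cause key also appeared
    within `window` before them.

    Each incident is an (opened_at, cause_key) pair. cause_key is a
    hypothetical label, e.g. a service name or root-cause category.
    """
    if not incidents:
        return 0.0
    last_seen: dict[str, datetime] = {}  # most recent time each cause key was seen
    repeats = 0
    for opened_at, key in sorted(incidents):  # process in chronological order
        prev = last_seen.get(key)
        if prev is not None and opened_at - prev <= window:
            repeats += 1
        last_seen[key] = opened_at
    return repeats / len(incidents)


data = [
    (datetime(2024, 1, 1), "db-failover"),
    (datetime(2024, 1, 5), "cache-eviction"),
    (datetime(2024, 1, 10), "db-failover"),  # 9 days after the last db-failover: counted
    (datetime(2024, 3, 1), "db-failover"),   # 51 days later: outside the window
]
print(recurrence_rate(data))  # 0.25
```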

Tooling & AI Solutions
  • Partner with engineering teams to identify repetitive or high-impact tasks suitable for automation.
  • Develop, implement, and continuously improve custom scripts, bots, and AI-driven workflows for monitoring, alerting, and incident triage.
  • Evaluate and integrate emerging AI/ML technologies to optimize detection, root cause analysis, and reporting.
  • Ensure all tools and automations are secure, maintainable, and aligned with organizational standards and SRE best practices.
  • Document and socialize new tools and AI solutions, enabling adoption and knowledge sharing across teams.

Cross-Team Collaboration
  • Collaborate with Engineering Managers and Incident Commanders to gather and validate incident data.
  • Partner with product teams, infrastructure teams, and leadership to socialize reliability best practices.
  • Act as a reliability "consultant" to squads that have impactful incidents.
  • Recommend enhancements to monitoring, alerting, and response processes to reduce future incident impact.

Compensation:
Compensation includes a base salary of $99,000.00 - $165,000.00. The base salary may vary within the anticipated base pay range based on factors such as the ultimate location of the position and the selected candidate's knowledge, skills, and abilities. Position may be eligible for additional compensation that may include an incentive program.
Benefits:
The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company's needs, and its obligations; seven paid holidays throughout the calendar year; and up to 160 hours of paid wellness annually for their own wellness or that of family members. Employees are also eligible for additional paid time off in the form of bereavement leave, time off to vote, jury duty leave, volunteer time off, military leave, and parental leave.

Top Skills

AI
Datadog
ML
New Relic
Splunk

Cox Enterprises Foothill Ranch, California, USA Office

Foothill Ranch, CA, United States

Similar Jobs at Cox Enterprises

Yesterday
Remote or Hybrid
California, USA
105K-157K Annually
Mid level
Automotive • Cloud • Greentech • Information Technology • Other • Software • Cybersecurity
The Portfolio Manager oversees a portfolio of dealer clients, optimizing credit use, mitigating risks, collecting payments, and ensuring compliance while building strong client relationships.
Top Skills: Excel, Outlook, PowerPoint, Salesforce, Teams, Word
Yesterday
Hybrid
Irvine, CA, USA
109K-181K Annually
Senior level
Automotive • Cloud • Greentech • Information Technology • Other • Software • Cybersecurity
The Senior Manager oversees the contract lifecycle for a sales team, including contract reviews, negotiations, and process improvements. They collaborate with stakeholders, ensure compliance with company standards, and train new managers.
Yesterday
Hybrid
Rolling Hills Estates, CA, USA
25-37 Hourly
Mid level
Automotive • Cloud • Greentech • Information Technology • Other • Software • Cybersecurity
As an Assistant Store Manager, you'll lead a sales team, manage store performance, provide training, oversee inventory, and handle customer issues.

What you need to know about the Los Angeles Tech Scene

Los Angeles is a global leader in entertainment, so it’s no surprise that many of the biggest players in streaming, digital media and game development call the city home. But the city boasts plenty of non-entertainment innovation as well, with tech companies spanning verticals like AI, fintech, e-commerce and biotech. With major universities like Caltech, UCLA, USC and the nearby UC Irvine, the city has a steady supply of top-flight tech and engineering talent — not counting the graduates flocking to Los Angeles from across the world to enjoy its beaches, culture and year-round temperate climate.

Key Facts About Los Angeles Tech

  • Number of Tech Workers: 375,800; 5.5% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Snap, Netflix, SpaceX, Disney, Google
  • Key Industries: Artificial intelligence, adtech, media, software, game development
  • Funding Landscape: $11.6 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Strong Ventures, Fifth Wall, Upfront Ventures, Mucker Capital, Kittyhawk Ventures
  • Research Centers and Universities: California Institute of Technology, UCLA, University of Southern California, UC Irvine, Pepperdine, California Institute for Immunology and Immunotherapy, Center for Quantum Science and Engineering
