Site Reliability Engineering Manager

CCC Intelligent Solutions

Sorry, this job was removed at 3:47 p.m. (PST) on Wednesday, April 1, 2020

Job Description Summary

We are seeking a talented Site Reliability Engineering Manager to join the fast-moving, innovative CCC ONE product team. We build enterprise-class, hosted solutions that span multiple data centers and public clouds and serve hundreds of thousands of end users. This is a great opportunity for a highly motivated person interested in leading a team responsible for the overall health, performance and operational design of our systems and applications. In this position, you will work alongside DevOps, software developers, database administrators, network engineers, systems engineers and information security in an agile environment, building modern software solutions and infrastructure.

Job Duties

Responsibilities

Serve as a hands-on manager of a team of software/system engineers
Own end-to-end availability and performance of key systems and services
Triage potential application issues received through various channels and work with the appropriate teams to drive them to resolution
Lead by example, mentor the team and establish credibility through quality technical execution
Gain and disseminate knowledge of our complex applications
Develop and track application metrics and operational intelligence
Manage on-call rotations with support from development teams

Qualifications

2+ years management experience leading an engineering team with technical deep-dives into code, networking, operating systems and/or storage
5+ years working in an Agile/Scrum development methodology
5+ years work experience using Microsoft technologies, preferably focused on .NET and C#
Proven ability in designing and configuring monitoring and alerting solutions across multiple systems and services using tools such as Prometheus, Grafana, Kibana and/or Application Insights
Experience preparing and presenting operational artifacts to senior management
Experience with DevOps tools (Azure DevOps, Puppet, Chef or Ansible), processes and culture with a focus on automation
Working knowledge of databases including SQL, indexing and schema design
Familiarity with technical considerations involved in designing for complex systems at large scale
Production troubleshooting skills that span systems, networks and code
Desire to build, grow and improve a team
Ability to encourage and foster a culture of visibility and transparency across teams

Other Beneficial Skills

Experience with Microsoft Azure and/or AWS
Experience with Kubernetes and/or Azure Service Fabric
Technical knowledge of SQL Server internals with emphasis on query performance
Experience with queuing frameworks and message brokers
