Incident and Problem Manager
Sorry, this job was removed at 3:17 a.m. (PST) on Friday, August 20, 2021
Responsible for managing production incidents and outage events within the Information Technology division. Provides leadership and coordination across infrastructure, application and partner teams to quickly remediate production issues and reduce mean time to resolution. Ensures appropriate managerial relationships are established and maintained to build and strengthen trust in incident resolution, and serves as the focal point for escalating issues that need to be resolved. Facilitates adherence to ITIL standards.
Job Description
- Manage incidents and outages
- Manage the review, assignment and classifications of incidents, outages and problem cases
- Actively engage with operations teams and engineers, and manage the involvement of application development and other areas in the change and problem management process
- Create and review incident and problem management reports and identify action plans to improve key performance indicators as necessary
- Introduce key ITIL disciplines and practical project management techniques to programs
- Ensure proper usage of incident, outage, problem and change management systems and processes
- Perform quality assurance on completed incident, outage and problem investigations and on change management records
- Conduct Root Cause Analysis (RCA), Post Mortem and Problem Management meetings
- Define reporting requirements needed in the management of the incident, outage and problem management processes
- Review incident, outage and problem processes, identify trends and recommend improvements
- Make recommendations for resolution and improvements to mitigate risk and prevent the replication of problems across systems
- Perform other related duties as required and assigned
- Demonstrate behaviors which are aligned with the organization’s desired culture and values
The ideal candidate will have the following:
- ITIL framework certification
- Strong analytical and project management skills
- Ability to manage an incident/outage bridge with 50+ technical and business stakeholders
- Ability to manage competing priorities and operate under pressure
- Ability to adjust schedule based on business need
- Ability to be proactive, take action and anticipate opportunities
- Ability to guide and assist in technical troubleshooting during an incident/outage
- Excellent management, interpersonal, communication, presentation, and organizational skills
- Ability to lead cross-functional teams effectively at all levels of the organization
- Coordination skills: managing complex IT technical investigations
- Familiarity with New Relic, SumoLogic, Opsview, CloudWatch or other monitoring/logging tools used in the troubleshooting, identification and resolution of an issue
- Advanced knowledge of incident, outage, problem and change management
- Experience managing 24/7 Application, Infrastructure and/or Operation teams preferred
- Experience supporting Application and Infrastructure in AWS preferred
- Financial Services and, if possible, mortgage industry experience preferred
- Strong business acumen and ability to interface with executive management