Own and govern end-to-end Problem Management per ITIL. Lead root cause investigations and RCA sessions, maintain the Known Error Database, track corrective actions, analyze incident and change data, create KPIs and executive dashboards, and drive cross-functional remediation with Infrastructure, Network, Cloud, Security, EUC, and Application teams to improve service reliability.
ITIL Problem Management Analyst
Location: Irvine, CA 92614 (5 days onsite)
Duration: 6 Months
Position Summary
We are seeking an experienced Problem Management Lead Analyst to drive service stability and continuous improvement across IT Operations. This role will own the end-to-end Problem Management process, lead root cause investigations, identify systemic issues, and partner with technology teams to eliminate recurring incidents and improve overall service reliability.
Key Responsibilities
Problem Management Leadership
- Own and govern the end-to-end Problem Management lifecycle in alignment with ITIL best practices.
- Lead proactive and reactive problem investigations to identify underlying causes of recurring incidents and service disruptions.
- Facilitate Root Cause Analysis (RCA) sessions using structured methodologies such as 5 Whys, Fishbone Analysis, Fault Tree Analysis, and Kepner-Tregoe.
- Establish and maintain a Known Error Database (KEDB), ensuring accurate documentation of known errors and workarounds.
- Track corrective and preventive actions through resolution and verify effectiveness of implemented fixes.
- Drive accountability across Infrastructure, Network, Cloud, Security, End User Computing, and Application teams to resolve systemic issues.
- Analyze incident, change, and operational data to identify trends, recurring issues, and opportunities for service improvement.
- Develop and present actionable recommendations to improve platform stability, reduce incident volumes, and enhance service performance.
- Lead recurring service review meetings focused on problem trends, chronic issues, and risk mitigation.
- Identify automation opportunities and process improvements that reduce operational effort and prevent recurring incidents.
- Contribute to operational excellence initiatives, knowledge management, and runbook enhancements.
- Utilize ServiceNow Problem Management capabilities to manage problem records, known errors, corrective actions, and reporting.
- Establish KPIs and metrics related to problem management effectiveness, including recurring incident reduction, RCA completion, and corrective action closure.
- Create executive-level dashboards and reports highlighting service health trends, top recurring issues, and improvement initiatives.
- Ensure compliance with ITIL processes, documentation standards, and audit requirements.
- Partner with Major Incident Management teams to ensure high-priority incidents are transitioned into formal problem investigations when appropriate.
- Lead Post-Incident Reviews (PIRs) focused on identifying root causes and preventive actions.
- Collaborate with Change Management teams to ensure corrective actions are properly planned, tested, and implemented.
- Assess risks associated with recurring issues and provide recommendations for long-term remediation.
Qualifications
- 5+ years of experience in IT Operations with at least 3 years focused on Problem Management, Service Reliability, or IT Service Management.
- ITIL Foundation certification required; ITIL Managing Professional or Advanced certifications preferred.
- Strong hands-on experience with ServiceNow Problem Management, Incident Management, and reporting modules.
- Proven experience conducting complex Root Cause Analysis and facilitating cross-functional problem review sessions.
- Strong understanding of enterprise IT infrastructure including Servers, Cloud, Network, End User Computing, and Applications.
- Experience developing metrics, dashboards, and executive reporting.
- Excellent facilitation, communication, and stakeholder management skills.
- Ability to influence technical teams and drive resolution of long-standing operational issues.
- Experience implementing Problem Management programs or maturing ITSM processes.
- Familiarity with SRE, Reliability Engineering, or Operational Excellence frameworks.
- Experience with Power BI, Tableau, or ServiceNow Performance Analytics.
- Knowledge of automation platforms and operational process optimization.



