Graphcore

AI HW Systems Engineering and Debug Lead

Posted Yesterday
Hybrid
Austin, TX
Expert/Leader

About us 

Graphcore is one of the world’s leading innovators in Artificial Intelligence compute. It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry. 

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world’s most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone. 

Graphcore’s teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation. 

Job Summary 

We are seeking an experienced AI HW Systems Engineering and Debug Lead to drive system-level debug and bring-up activities for Graphcore’s next-generation AI data center platforms. 

The successful candidate will lead complex debug efforts across hardware, firmware, and software layers for blade and rack-level systems. This role focuses on developing scalable debug strategies, improving debug throughput, and ensuring timely resolution of system-level issues throughout the product lifecycle. 

The Team 

Graphcore is a globally recognized leader in Artificial Intelligence computing systems. The company designs advanced semiconductors and data center hardware that provide the specialized processing power needed to drive AI innovation, while delivering the efficiency required to support its broader adoption.

The Systems Engineering and Validation team ensures Graphcore’s AI compute platforms are fully validated, debugged, and ready for deployment in hyperscale data center environments. 

The team collaborates closely with silicon engineering, system architecture, firmware, operating system, and rack integration teams to identify and resolve system-level issues and drive improvements in validation and debug methodologies. 

Responsibilities and Duties 

  • Own and develop AI systems debug methodology and system bring-up strategies for next-generation AI data center platforms. 
  • Lead system-level debug and root cause analysis for issues identified during server rack validation, post-silicon validation, and production phases. 
  • Drive complex debug efforts across silicon, hardware platforms, firmware, operating systems, and software stacks. 
  • Manage and track technical issues, risks, and priorities to ensure program milestones are achieved. 
  • Publish debug program indicators and metrics to identify roadblocks and improve debug throughput. 
  • Coordinate cross-functional teams including system architecture, silicon, firmware, and validation teams to resolve system-level issues. 
  • Lead development and integration of debug tools, scripts, and methodologies to improve debug efficiency. 
  • Communicate program status, risks, and technical findings to engineering leadership and stakeholders. 

Candidate Profile 

Essential 

  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or related discipline. 
  • 15+ years of experience working on complex systems engineering challenges involving HW/FW/SW debug in server or data center environments. 
  • Proven experience leading validation and debug for board, blade, and rack-level hardware platforms. 
  • Strong experience debugging OS, firmware, silicon, and hardware issues. 
  • Understanding of industry-standard system buses such as PCIe and CXL and their software stacks. 
  • Strong knowledge of ARM or x86 CPU architectures, SoC design, memory systems, and power management. 
  • Experience with system architecture, validation strategies, and complex system debug methodologies. 
  • Strong collaboration, communication, and cross-team coordination skills. 

Desirable 

  • Experience designing or deploying AI/ML rack-scale systems. 
  • Experience developing at-scale debug methodologies for hyperscale data center systems. 
  • Familiarity with data center infrastructure and emerging AI hardware technologies. 
  • Experience with rack integration testing and hyperscale deployment readiness. 
  • Knowledge of automated validation frameworks, test analytics, and continuous validation practices. 

Top Skills

AI Systems
Arm
CXL
Firmware
Hardware
PCIe
SoC Design
Software
x86

Similar Jobs at Graphcore

Yesterday
Hybrid
Austin, TX, USA
Senior level
Artificial Intelligence • Semiconductor
Lead validation and quality assurance for firmware stacks on ARM-based servers, including security, functionality, and reliability testing.
Top Skills: Arm, EDK II, GDB, GPIO, I2C, I3C, IPMI, JTAG, Logic Analyzers, MCTP, OpenBMC, PCIe, PLDM, Protocol Analyzers, Redfish, SMBus, SPI, UART, UEFI
Yesterday
Hybrid
2 Locations
Expert/Leader
Artificial Intelligence • Semiconductor
Lead the architecture and development of OpenBMC firmware for AI server platforms, enabling hardware integration, developing security capabilities, and collaborating with teams for reliable firmware delivery.
Top Skills: Bash, BitBake, C, C++, CI/CD, D-Bus, GDB, I3C, I²C, JTAG, MCTP, OpenBMC, PCIe, PLDM, Python, Redfish, SPI, Yocto
Yesterday
Hybrid
2 Locations
Senior level
Artificial Intelligence • Semiconductor
Lead architecture and development of OpenBMC firmware for AI infrastructure, collaborating with partners on reliability, scalability, and serviceability.
Top Skills: Bash, C, C++, CI/CD, DCMI, I2C, I3C, IPMI, Linux, MCTP, NC-SI, OpenBMC, PCIe, PLDM, PMCI, Python, Redfish, SGPIO, SPI, UART, USB, Yocto
