High Performance Computing Engineering Manager at University of Southern California
The candidate for the position of HPC Engineering Manager must meet the following qualifications:
- Bachelor’s degree in computer science, information systems, or related field, or equivalent combination of education, training, and experience.
- Eight years of experience in information technology, high-performance computing, or other related fields.
- Strong scripting ability (Bash, Perl, Python, etc.) and experience with programming fundamentals.
- Experience presenting technical topics in a business-oriented fashion to non-technical audiences.
- Expertise with multivendor hardware/software management, security, and network/Internet protocols.
- Expertise in administering, monitoring, and maintaining secure Linux/Unix operating systems in an HPC environment.
- Expertise with HPC system software and cluster management/provisioning tools (Lmod/Module system, Ansible/Salt, Warewulf/xCAT) and job schedulers (Slurm).
- Proficiency with low-latency/high-bandwidth interconnect infrastructure (InfiniBand, 10/100GigE).
- Ability to troubleshoot network issues related to infrastructure.
- Proficiency with shared memory and distributed memory parallelism (OpenMP, MPI) and accelerators (GPUs).
- Familiarity with leading cloud computing services and container technologies.
- Ability to identify and resolve problems and manage performance.
- Demonstrated expertise in system design and configuration and HPC system acquisition planning.
- Ability to establish processes for maintaining system performance and managing best-in-class standards.
- Ability to develop positive working relationships and a strong rapport with team members.
- Ability to provide both detailed information and high-level summaries to management-level individuals and groups.
The ideal candidate for the position of HPC Engineering Manager has the following qualifications:
- Bachelor’s degree in computer science, computer information systems, or related field.
- Expert knowledge of HPC systems, parallel file systems, high-performance networks, and software stack management.
- Experience with cloud computing services (AWS, GCP, Azure).
- Demonstrated expertise with container technologies (Docker, Singularity, Mesos, etc.).
- More than ten years of experience in information technology or high-performance computing.
- Three or more years of experience in a management or leadership role.
- Ability to drive technical leadership and management of complex large-scale computing systems projects.
- Excellent organizational, verbal, and written communication skills.
- Persuasive and effective communicator with the ability to interact with a wide variety of stakeholders and with experience presenting the business side of technical topics to non-technical audiences.
THE WORK YOU WILL DO
The HPC Engineering Manager reports to the Director of the Center for High Performance Computing at USC. The incumbent leads and oversees the HPC engineering functions in design, development, installation, and maintenance of hardware and software for the HPC infrastructure according to stakeholders’ needs and the university’s strategic vision. The HPC Engineering Manager is responsible for system design and planning, implementation, performance improvement, security, and maintenance of high-performance computing infrastructure. In addition to technical duties, the HPC Engineering Manager builds, manages, and mentors an extremely talented engineering team to deliver innovative solutions in High-Performance Computing for the research community at USC. As a member of ITS, the HPC Engineering Manager is expected to model and cultivate ITS’s cultural values and behaviors within his or her team.
The HPC Engineering Manager:
- Architects, evaluates, designs, tests, and supports the deployment of high-performance technology in USC's HPC environment. Adopts appropriate leading practices and leverages next-generation advancements. Determines the reasonable balance between performance, reliability, and cost.
- Provides guidance and assistance to the HPC team for system performance optimization and troubleshooting. Mentors others based on experience and knowledge of computer organization, storage systems, networks, software stacks, and operating systems in an HPC environment.
- Supervises the HPC private network infrastructure, including InfiniBand interconnects, Fibre Channel and Ethernet switches, network protocol services, and access protection. Collaborates with supporting ITS teams to achieve optimal performance and connectivity for global access by HPC researchers.
- Understands researcher needs as well as current and future requirements by building and maintaining strong relationships with key researchers, vendors, stakeholders, and academic units across all USC campuses.
- Works closely with ITS leadership to identify, implement, and support cost-effective, leading solutions for HPC engineering by maintaining currency with industry standards, supporting process optimization, providing input to department budget planning, and monitoring and managing resources.
- Improves the integrity of systems, networks, and services by applying leading-edge technical and operational knowledge to configure and maintain high-performance computing server platforms. Drives server hardware and software life cycle management by helping to plan, develop, and deploy maintenance fixes to systems; develops test plans for implementing new high-performance computing software.
- Develops plans for new high-performance computing system and application implementations, custom scripts, and testing procedures to ensure effective support and operational reliability for the research enterprise at USC; trains technical ITS staff in the use of newly developed or acquired software and hardware.
- Manages compliance by defining standards of service and establishing policies and procedures to guide the engineering team in day-to-day operations and strategic planning initiatives.
- Manages the development of team members by helping them set and achieve goals for their career growth. Fosters an inclusive environment that values differences and creates a sense of belonging and appreciation for team members. Contributes to a culture of trust and transparency.
- Contributes to cross-functional coordination, architecture discussions, and priority planning in a highly collaborative environment.
- Performs other related duties as assigned or requested. The university reserves the right to add or change duties at any time.