Sully.ai

Senior AI Systems Engineer (LLM Inference & Infra Optimization)

Posted 2 Days Ago
Remote
Hiring Remotely in US
Senior level
Lead efforts in deploying and optimizing large language models on GPU hardware, optimizing inference pipelines and managing multi-cloud infrastructures.
The summary above was generated by AI
About Us

At Sully.ai, we’re building cutting-edge AI-native infrastructure to power real-time, intelligent healthcare applications. Our team operates at the intersection of high-performance computing, ML systems, and cloud infrastructure — optimizing inference pipelines to support next-generation multimodal AI agents. We're looking for a deeply technical engineer who thrives at the systems level and loves building performant, scalable infrastructure.

The Role

We’re looking for a senior-level engineer to lead efforts in deploying and optimizing large language models on high-end GPU hardware and building the infrastructure that supports them. You'll work across the stack — from C++ and CUDA kernels to Python APIs — while also shaping our DevOps practices for scalable, multi-cloud deployments. This role blends systems performance, ML inference, and infrastructure-as-code to deliver low-latency, production-grade AI services.

What You’ll Do
  • LLM Inference Optimization: Develop and optimize inference pipelines using quantization, attention caching, speculative decoding, and memory-efficient serving.

  • Systems Programming: Build and maintain low-level modules in C++/CUDA/NCCL to squeeze the most out of GPUs and high-throughput architectures.

  • DevOps & Infrastructure Engineering: Stand up and manage multi-cloud environments using modern IaC frameworks such as Pulumi or Terraform. Automate infrastructure provisioning, deployment pipelines, and GPU fleet orchestration.

  • Real-Time Architectures: Design low-latency streaming and decision-support systems leveraging embedding models, VRAM token caches, and fast interconnects.

  • Developer Enablement: Build robust tooling, interfaces, and sandbox environments so that other engineers can contribute safely to the ML systems layer.

What We’re Looking For
  • Proficiency in C++, CUDA, and Python with experience in systems or ML infrastructure engineering.

  • Deep understanding of GPU architectures, inference optimization, and large model serving techniques.

  • Hands-on experience with multi-cloud environments (GCP, AWS, etc.) and infrastructure-as-code tools such as Pulumi, Terraform, or similar.

  • Familiarity with ML deployment frameworks (TensorRT, vLLM, DeepSpeed, Hugging Face Transformers, etc.).

  • Comfort with DevOps workflows, containerization (Docker), CI/CD, and distributed-system debugging.

  • (Bonus) Experience with streaming embeddings, semantic search, or hybrid retrieval architectures.

  • (Bonus) Interest in building tools that democratize high-performance systems for broader engineering teams.

Why Join Us
  • Collaborate with a highly technical team solving hard problems at the edge of AI and healthcare.

  • Work with bleeding-edge GPU infrastructure and build systems that push what's possible.

  • Be a foundational part of shaping AI-native infrastructure for real-time, mission-critical applications.

  • Help accelerate a meaningful product that improves how clinicians work and patients are cared for.

Sully.ai is an equal opportunity employer. In addition to EEO being the law, it is a policy that is fully consistent with our principles. All qualified applicants will receive consideration for employment without regard to status as a protected veteran or a qualified individual with a disability, or other protected status such as race, religion, color, national origin, sex, sexual orientation, gender identity, genetic information, pregnancy or age. Sully.ai prohibits any form of workplace harassment. 

Top Skills

C++
CUDA
DeepSpeed
Docker
Hugging Face Transformers
Pulumi
Python
TensorRT
Terraform
vLLM
