MasterControl

Senior ML Ops Engineer

Posted 7 Days Ago

Remote

Hiring Remotely in USA

200K-250K

Senior level

As a Senior ML Ops Engineer, you'll automate, monitor, and scale machine learning workloads, manage infrastructure, and collaborate with teams to productionize AI services.

The summary above was generated by AI

About MasterControl

MasterControl Inc. is a leading provider of cloud-based quality and compliance software for life sciences and other regulated industries. Our mission is the same as that of our customers: to bring life-changing products to more people sooner. The MasterControl Platform helps organizations digitize, automate and connect quality and compliance processes across the regulated product development life cycle. Over 1,000 companies worldwide rely on MasterControl solutions to achieve new levels of operational excellence across product development, clinical trials, regulatory affairs, quality management, supply chain, manufacturing and postmarket surveillance. For more information, visit www.mastercontrol.com.

Summary

At MasterControl, we’re building our internal AI Platform to power intelligent, scalable, and compliant AI systems in regulated industries. We are seeking an experienced MLOps Engineer with deep infrastructure expertise to help us automate, monitor, and scale machine learning workloads across diverse environments.

This is not just about deploying models. You’ll help define the backbone of our AI pipeline: managing CI/CD, Kubernetes, observability, versioning, orchestration, inference workloads and performance. You’ll work closely with Machine Learning Researchers/Engineers, Data Engineers, and Platform teams to make our AI Services and Products production-ready, resilient, and fast.

What You’ll Do

Design and maintain infrastructure for training, evaluating, and deploying machine learning models at scale.

Manage GPU orchestration on Kubernetes (EKS), including node autoscaling, bin-packing, taints/tolerations, and cost-aware scheduling strategies (e.g., spot/preemptible GPUs).

Build and optimize CI/CD pipelines for ML code, data versioning, and model artifacts using tools like GitHub Actions, Argo Workflows, and Terraform.

Manage and optimize containerized ML workloads on Kubernetes (EKS), including node auto-scaling, GPU orchestration, and runtime scheduling.

Develop and maintain observability for model and pipeline health (e.g., using Prometheus, Grafana, OpenTelemetry).

Collaborate with Data Scientists and ML Engineers to productionize notebooks, pipelines, and models.

Work with security and compliance teams to implement best practices around model serving and data access.

Support inference backends, including vLLM, Hugging Face, NVIDIA Triton, and other runtime engines, and optimize GPU utilization.

Develop tools to simplify model deployment, rollback, and A/B testing for experimentation and reliability.

Lead incident response and debugging of performance issues in production AI systems (good to have).

What You’ll Bring

5+ years of experience in MLOps, infrastructure, or platform engineering.

Experience setting up and scaling training and fine-tuning pipelines for ML models in production environments.

Strong expertise in Kubernetes, container orchestration, and cloud-native architecture (AWS preferred), specifically with GPUs.

Hands-on experience with training frameworks like PyTorch Lightning, Hugging Face Accelerate, or DeepSpeed.

Proficiency in infrastructure-as-code (Terraform, Helm, Kustomize) and cloud platforms (AWS preferred).

Familiarity with artifact tracking, experiment management, and model registries (e.g., MLflow, W&B, SageMaker Experiments).

Strong Python engineering skills and experience debugging ML workflows at scale.

Experience deploying and scaling inference workloads using modern ML frameworks (candidates who mention scaling inference are of particular interest).

Deep understanding of CI/CD systems and their role in ML production (less focus on POCs; at least one production project).

Working knowledge of monitoring and alerting systems for ML workloads.

A strong sense of ownership and commitment to quality, security, and operational excellence.

Nice to Have

Experience with GPU scheduling and autoscaling in Kubernetes.

Familiarity with model versioning and drift monitoring tools.

Knowledge of low-latency inference optimization (e.g., quantization, FP8, TensorRT).

Experience working in compliance or regulated industries.

Why Work Here?

#WhyWorkAnywhereElse?

MasterControl is a place where Exceptional Teams come together to do their best work. In fact, hiring Exceptional Teams is a core value of ours. MasterControl employees are surrounded by intelligent, motivated, and collaborative individuals. We like to call it #TheBestTeamOnThePlanet.

We work hard to develop and challenge our employees' skillsets, recognize their contributions, encourage professional development, and offer a one-of-a-kind culture. This is why we say #WhyWorkAnywhereElse? MasterControl could be your next (and last) career move!

Here are some of the benefits MasterControl employees enjoy:

Competitive compensation
100% medical premium coverage (yes, you read that right!)
401(k) plan with company match
Generous PTO packages that increase with tenure
Schedule flexibility
Fitness clubs (you get paid to have fun and be active!)
Company parties and employee recognition programs
Wellness programs (free Fitbit, gym membership and athletic shoe reimbursements, etc.)
Onsite physician and massage therapist
Innovation center and gaming rooms at the office
Dental/vision plans
Employer paid life insurance policy
Much, much more!

Applicants must be currently authorized to work in the United States on a full-time basis.

The US base salary range for this full-time position is $200,000-$250,000 + equity + benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

MasterControl is an Equal Opportunity Employer. If you are an individual with a disability and require a reasonable accommodation to complete any part of the application process, or are limited in the ability or unable to access or use this online application process and need an alternative method for applying, you may contact [email protected] or call (801) 942-4000 and ask to speak with a member of Human Resources.
Equal Opportunity Employer, including disability and protected veteran status.

Top Skills

Argo Workflows

AWS

DeepSpeed

GitHub Actions

Grafana

Hugging Face

Kubernetes

MLflow

NVIDIA Triton

OpenTelemetry

Prometheus

PyTorch Lightning

SageMaker Experiments

Terraform

W&B
