Senior Manager, DevOps

TrueML

Reposted 5 Days Ago

In-Office or Remote

Hiring Remotely in San Francisco, CA

170K-220K Annually

Senior level

Lead infrastructure and platform engineering for cloud architecture, CI/CD standards, and scalability of machine learning products, while managing a team of DevOps engineers.

The summary above was generated by AI

Why TrueML?

TrueML is a mission-driven financial software company that aims to create better customer experiences for distressed borrowers. Consumers today want personal, digital-first experiences that align with their lifestyles, especially when it comes to managing finances. TrueML’s approach uses machine learning to engage each customer digitally and adjust strategies in real time in response to their interactions.

The TrueML team includes inspired data scientists, financial services industry experts, and customer experience fanatics. Together they build technology that serves people in a way that recognizes their unique needs and preferences as human beings, and they work to ensure nobody gets locked out of the financial system.

TrueML Products is seeking a highly experienced and strategic Sr. Manager, DevOps to lead our infrastructure and platform engineering efforts. This role is critical in driving our cloud architecture strategy, establishing elite CI/CD standards, and ensuring the scalability and reliability of our machine learning-driven products.

Reporting to the Sr. Director, Program & Operations, you will lead the evolution of our internal developer platform and infrastructure-as-code (IaC) architecture. The ideal candidate is a hands-on leader with a "systems-thinking" mindset. We are looking for a visionary who thrives on solving complex distributed systems challenges and for whom using GenAI and AIOps tooling to optimize system performance and automation is second nature.

What You'll Do (Technical Leadership & Strategy):

Define and execute the long-term strategic vision for Infrastructure as Code (IaC), CI/CD evolution, and cloud-native architecture to support TrueML’s scaling needs.
Lead the design and implementation of self-service internal platforms to reduce developer cognitive load, enabling feature teams to deploy and manage services with minimal friction at increased velocity.
Act as the primary stakeholder for cloud spend (AWS); drive cost-optimization initiatives and lead contract negotiations for the DevOps toolstack and third-party vendors.
Ensure the infrastructure architecture supports strict High Availability (HA) requirements and robust Disaster Recovery (DR) protocols, maintaining system integrity across multiple regions.
Oversee the implementation and evolution of comprehensive monitoring, logging, and distributed tracing systems, leveraging AIOps to move from reactive to predictive system maintenance.
Champion security by design by integrating automated vulnerability scanning, secret management, and compliance checks directly into the automated build pipelines.
Serve as the ultimate escalation point for major production outages, facilitating blameless post-mortem reviews that focus on systemic improvements rather than individual error.
Maintain deep technical currency in container orchestration (Kubernetes), serverless patterns, and modern automation frameworks to provide meaningful mentorship and architectural guidance to senior engineering staff.

What You'll Do (Hands-On Engineering & Technical Execution):

Maintain the ability to write and review high-quality code in languages like Python, Go, or Bash to automate complex operational tasks and system integrations.
Develop Terraform Infrastructure as Code hands-on for resource provisioning.
Directly architect and troubleshoot complex CI/CD workflows (GitHub Actions, ArgoCD, Atlantis), ensuring build-and-deploy cycles are optimized for speed and reliability.
Proactively manage and tune container orchestration environments, including hands-on configuration of Ingress controllers, declarative GitOps workflows, and cluster autoscaling.
Lead from the front during critical incidents by conducting deep-dive technical analysis across the EKS stack, troubleshooting node-level kernel panics, VPC CNI networking bottlenecks, and RDS performance constraints to minimize MTTR.
Conduct hands-on audits of cloud configurations and IAM policies, implementing "least privilege" access controls and automated remediation scripts.
Directly manage the integration and API configurations between various tools in the DevOps stack (e.g., connecting Jira, VictorOps, Slack, and Observe for seamless incident flow).

What You'll Do (People Leadership & Engineering Collaboration):

Recruit, hire, and develop a world-class team of DevOps Engineers; provide career pathing and technical mentorship to foster a culture of continuous learning.
Partner closely with Engineering Managers to align infrastructure deliverables with the product roadmap, ensuring DevOps is an accelerator rather than a bottleneck.
Collaborate with the Quality Engineering and Security leadership to define and enforce "Definition of Done" standards that include automated testing and security gates.
Set clear, measurable goals (KPIs and OKRs) for the team, conducting regular performance reviews and providing feedback to drive individual and collective excellence.
Lead internal Brunch & Learns to educate the broader engineering organization on modern cloud-native patterns and self-service capabilities.

Who You Are (Qualifications):

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
10+ years of experience in DevOps, Site Reliability Engineering (SRE), or Software Engineering; 5+ years of experience managing engineers.
Expert-level mastery of AWS and experience managing multi-region, high-availability deployments.
Advanced experience with Kubernetes (K8s) and Docker, including cluster management, networking, and scaling in a production environment.
Proficiency in Terraform to drive consistency and automation across all infrastructure layers. Experience with Atlantis is a plus.
Deep experience designing and maintaining complex pipelines (GitHub Actions, GitLab CI, or Jenkins) and mastery of scripting languages like Python, Go, or Bash.
Hands-on experience with modern monitoring, observability, and tracing stacks (Datadog, Observe) and a firm grasp of SRE principles (SLIs/SLOs/Error Budgets).
Experience acting as an Incident Commander for high-severity outages and fostering a "blameless" post-mortem culture.
Demonstrated ability to influence executive leadership and collaborate cross-functionally with Product, Engineering, and Security teams.
Experience integrating AI-assisted productivity tools (Cline, GitHub Copilot) into the engineering workflow to accelerate delivery.

Ways to "Stand Out":

Experience leading organizational platform migration, including the development of rollback strategies, stakeholder communication plans, and post-migration validation
Prior experience working with high-velocity, product-driven early-to-mid stage technology companies where reliability, extensibility, and availability were mission-critical to success
AWS or Kubernetes certifications are a plus, but not in lieu of hands-on experience with the same in production environments
Notable contributions to Open Source projects or communities

What We Offer (Perks & Benefits)

Flexible vacation
Medical/dental/vision insurance
Traditional/Roth retirement savings options
Company-paid disability and life insurance
Flexible Spending Account & Limited FSA
Family-friendly parental leave, volunteer and voting time off
On-demand wellness platform access for you and 5 friends or family members
PerkSpot discount program for 900+ merchants nationwide

Remote Work, Travel Expectations & Physical Requirements:

This role supports a global, cross-functional business and operates primarily in a Remote-First environment. However, flexibility outside of standard business hours and occasional local or international travel may be necessary for global operations support, company meetings, training, offsites, and collaborative projects.

This position primarily involves computer-based work, requiring extended periods at a computer, participation in virtual meetings, and use of standard office technology. We will consider reasonable accommodations to enable individuals to perform the essential functions of the role.

Maintaining a reliable internet connection and a professional work environment is expected. The ability to protect confidential company, employee, customer, and business information while working outside of a company office is also required.

Personally Identifying Information

We collect personal information for employment purposes. We do not sell personal information. Most of the information we have is provided to us by you and/or collected as part of the employment process. For more details on how we use, share, and delete personal information, see our Privacy Policy.

Dedication to Diversity & Inclusion

We are an equal opportunity employer. We promote, value, and thrive with a diverse and inclusive team. Different perspectives contribute to better solutions, and this makes us stronger every day. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, or other protected characteristics.
