MLabs

DevOps / Infrastructure Engineer

Posted 3 Hours Ago
Remote
Hiring Remotely in United States
100K-130K Annually
Senior level
Design, deploy, and operate a secure, highly available distributed containerized fleet of single-tenant AI trading agents across a hybrid Railway and AWS environment. Build zero-touch IaC and CI/CD provisioning, manage Tailscale-based private networks, implement monitoring/alerting and automated incident response, preserve in-flight transaction state, and participate in on-call incident management for production financial systems.
The summary above was generated by AI

Location: Remote - EST timezone

Remote | Full-time

Compensation: $100K - $130K

We are hiring on behalf of a client that is building infrastructure to launch a highly distributed fleet of managed, single-tenant personal artificial intelligence (AI) trading agents. Running non-stop, these isolated processes execute high-frequency, complex financial workflows natively on blockchain infrastructure, each dedicated exclusively to an individual user's portfolio. The client is seeking an exceptional, production-proven Infrastructure & DevOps Engineer to take full ownership of the deployment, secure networking, architectural lifecycle, and overall reliability of this distributed agent fleet from day one.

Key Responsibilities

  • Fleet Orchestration & Scaling: Architect, provision, and scale the core user agent fleet across a hybrid Railway and AWS ecosystem, ensuring each user retains an isolated, secure, and predictable containerized process with optimized cost tracking and precise lifecycle hooks.
  • Secure Network Engineering: Establish, manage, and continuously harden private overlay networks using Tailscale in production, linking disparate user agents securely with core Model Context Protocol (MCP) servers and the underlying live trading runtimes.
  • Automated User Provisioning: Design and construct an end-to-end, zero-touch deployment pipeline utilizing advanced infrastructure-as-code and CI/CD best practices, enabling seamless, single-click automated provisioning of containers, secrets management, and environmental configurations for new users.
  • Operational Resilience & SRE: Define, build, and maintain comprehensive monitoring, telemetry, alerting, and automated incident response frameworks to guarantee graceful state retention, preserving live in-flight transaction states across sudden host restarts, scheduled key rotations, or regional cloud outages.
  • Incident Management: Oversee system health and participate directly in live incident response and on-call rotations to maintain strict operational continuity for the live global fleet.

Requirements
  • Container PaaS Orchestration: Proven professional experience deploying, monitoring, and scaling complex architectures in production using Railway or an equivalent containerized platform-as-a-service (such as Fly.io, Render, or Northflank).
  • Advanced AWS Proficiency: In-depth technical mastery of Amazon Web Services (AWS), with practical expertise spanning Virtual Private Clouds (VPC), Identity & Access Management (IAM), Secrets Manager, and elastic scaling frameworks (ECS / AWS Lambda).
  • Production-Grade Tailscale Networking: Demonstrated experience implementing Tailscale within a high-security production environment, with distinct competence configuring Access Control Lists (ACLs), complex subnet routing, and ephemeral node lifecycles.
  • Modern Infrastructure & CI/CD: Mastery of Docker containerization, comprehensive CI/CD deployment pipelines, and modern Infrastructure-as-Code (IaC) paradigms.
  • Blockchain & Onchain Context: Technical familiarity with blockchain mechanics, smart contract interactions, or web3 infrastructure paradigms to support decentralized application layers.
  • High-Availability / Financial SRE Background: A proven professional history managing environments where system stability impacts critical financial outcomes, paired with total comfort managing on-call duties and live incident response.

Nice to Have

  • Direct experience deploying, managing, and monitoring Large Language Model (LLM) or autonomous AI agent fleets at multi-tenant scale.
  • Prior exposure to quantitative trading systems, high-frequency execution runtimes, or deep integrations with platforms such as Hyperliquid.

Benefits
  • Highly competitive compensation package.
  • The flexibility of a fully remote operating environment with an immediate start timeline.
  • The opportunity to shape the architectural foundation of a cutting-edge technical ecosystem intersecting Artificial Intelligence and decentralized financial infrastructure.
  • Access to top-tier modern tooling, modern infrastructure frameworks, and a highly streamlined, zero-red-tape development culture.

Due to the high volume of applications we anticipate, we regret that we are unable to provide individual feedback to all candidates. If you do not hear back from us within 4 weeks of your application, please assume that you have not been successful on this occasion. We genuinely appreciate your interest and wish you the best in your job search.

Commitment to Equality and Accessibility:

At MLabs, we are committed to offering equal opportunities to all candidates. We ensure that there is no discrimination, that job adverts are accessible, and that information is provided in accessible formats. Our goal is to foster a diverse, inclusive workplace with equal opportunities for all. If you need any reasonable adjustments during any part of the hiring process, or you would like to see the job advert in an accessible format, please let us know at the earliest opportunity by emailing [email protected].

MLabs Ltd collects and processes the personal information you provide such as your contact details, work history, resume, and other relevant data for recruitment purposes only. This information is managed securely in accordance with MLabs Ltd’s Privacy Policy and Information Security Policy, and in compliance with applicable data protection laws. Your data may be shared only with clients and trusted partners where necessary for recruitment purposes. You may request the deletion of your data or withdraw your consent at any time by contacting [email protected].


