Vast.ai

Data Engineer — Analytics Infrastructure (Foundational Hire)

Reposted 13 Days Ago

In-Office

Los Angeles, CA

140K-190K Annually

Mid level

About Us

Vision: To make life substrate independent through Vast Artificial Intelligence

Mission: To organize, optimize, and orient the world's computation

Vast.ai’s cloud powers AI projects and businesses all over the world. We are democratizing and decentralizing AI computing—reshaping our future for the benefit of humanity.

We are a growing and highly motivated team dedicated to an ambitious technical plan. Our structure is flat, our ambitions are outsized, and leadership is earned by shipping excellence.

We seek a data engineer with strong intrinsic drive, a true passion for uncovering insights from data, and a mix of analytical, programming, and communication skills.

LOCATION: On‑site at our office in Westwood, Los Angeles
TYPE: Full‑time • On‑site • Immediate start preferred
REPORTS TO: Operations (partnering closely with Engineering)

About the Role

We’re hiring a Data Engineer to build and own the end‑to‑end data platform at Vast.ai. This is a foundational role: you’ll own the 0→1 build, covering ingestion, modeling, governance, and self‑serve analytics in QuickSight for Marketing, Sales, Accounting, and leadership.

This is a hands‑on role for a builder who can move fast: designing schemas, implementing ELT/ETL, hardening data quality, and enabling secure, governed access to data across the company.

What You’ll Do

Own the data pipeline: design, build, and operate batch/streaming ingestion from product, billing, CRM, support, and marketing/ad platforms into a central warehouse.
Model the data: create clean, well‑documented staging and business marts (dimensional/star schemas) that map to the needs of Marketing, Sales, Accounting/Finance, and Operations.
Enable self‑serve analytics: publish certified datasets with row‑/column‑level security, manage refresh SLAs, and make it easy for teams to answer their own questions.
Collaborate cross‑functionally: intake requirements, translate them into data contracts and models, and partner with Engineering on event/telemetry capture.
Document & scale: maintain clear docs, lineage, and a pragmatic data catalog so others can discover and trust the data.

Tech Stack

Our current environment includes PostgreSQL, Python, SQL, and QuickSight. You’ll lead the next step change in maturity using a pragmatic, AWS‑centric stack such as:

AWS: S3, Glue/Athena or Redshift, Lambda/Step Functions, IAM/KMS
Orchestration & Modeling: Airflow or Dagster; dbt (or equivalent SQL modeling)
Data Quality & Observability: built‑in checks or tools like Great Expectations
Source Connectivity: APIs/webhooks; optionally Airbyte/Fivetran for managed connectors
Versioning/Infra: Git/GitHub Actions; Terraform (nice to have)
Marketing attribution: Segment (Segment.io), PostHog, and others

(We’re flexible on exact tools—strong fundamentals matter most.)

Qualifications

Must‑have

3+ years (typically 3–6) in a Data Engineering role building production ELT/ETL on a cloud platform (AWS strongly preferred).
Expert SQL and solid Python for data processing/automation.
Proven experience designing data models (staging, marts, star schemas) and standing up a warehouse/lakehouse.
Hands‑on experience with orchestration, scheduling, and operational ownership (SLAs, alerting, runbooks).
Experience enabling a BI layer (ideally QuickSight) with secure, governed datasets.
Strong collaboration and communication; able to gather requirements from non‑technical stakeholders and translate to data contracts.

Nice‑to‑have

Marketing/Sales/RevOps data (CRM, ads, attribution), Accounting/Finance integrations, or product telemetry/event pipelines.
Stream processing (Kafka/Kinesis), CDC, or near‑real‑time ingestion.
Data privacy/security best practices (e.g., CPRA), partitioning/performance tuning, and cost management on AWS.

90‑Day Outcomes

Inventory & architecture: clear map of sources, proposed target architecture, and a prioritized backlog aligned with Ops/Engineering.
First pipelines live: automated ingestion + core staging tables with data quality checks and alerts.
Business marts: at least two curated domains live (e.g., Marketing & Sales) powering certified QuickSight datasets for stakeholders.
Runbook & docs: onboarding‑ready documentation, lineage, and incident playbooks.

Interview Process (≈ 1 week)

15 min — Initial screening (virtual)
45 min — Architecture deep‑dive into our data environment and target platform (virtual)
2 hours — On‑site practical: build/modify a small ETL + modeling exercise; discuss trade‑offs, quality, and ops

Annual Salary Range

$140,000 – $190,000 + equity + benefits

Benefits

Comprehensive health, dental, vision, and life insurance
401(k) with company match
Meaningful early-stage equity
Onsite meals, snacks, and close collaboration with founders/tech leaders
Ambitious, fast-paced startup culture where initiative is rewarded

Top Skills

Airflow, APIs, Athena, AWS, Dagster, dbt, Git, GitHub Actions, Glue, Great Expectations, IAM, KMS, Lambda, Postgres, PostHog, Python, QuickSight, Redshift, Segment, SQL, Step Functions, Terraform, Webhooks
