The Databricks Architect will lead the implementation of data platforms, design architecture, ensure data governance, and optimize data pipelines while collaborating with stakeholders.
Allata is a global consulting and technology services firm with offices in the US, India, and Argentina. We help organizations accelerate growth, drive innovation, and solve complex challenges by combining strategy, design, and advanced technology. Our expertise covers defining business vision, optimizing processes, and creating engaging digital experiences. We architect and modernize secure, scalable solutions using cloud platforms and top engineering practices.
Allata also empowers clients to unlock data value through analytics and visualization and leverages artificial intelligence to automate processes and enhance decision-making. Our agile, cross-functional teams work closely with clients, either integrating with their teams or providing independent guidance, to deliver measurable results and build lasting partnerships.
We are looking for a Databricks Architect with strong experience in enterprise data platform architecture and governance to lead client-facing data platform implementations. This role blends high-impact architectural responsibilities (reference architectures, security, scalability, cost management, operational model) with technical leadership in designing, building, deploying, and optimizing data pipelines and data products on Lakehouse/EDW platforms (with an emphasis on Databricks). You will own the full lifecycle of data products and standardize patterns, tools, and practices across large-scale, vertical-specific implementations for our clients.
Role & Responsibilities:
- Define the overall data platform architecture (Lakehouse/EDW), including reference patterns (Medallion, Lambda, Kappa), technology selection, and integration blueprint.
- Design conceptual, logical, and physical data models to support multi-tenant and vertical-specific data products; standardize logical layers (ingest/raw, staged/curated, serving).
- Establish data governance, metadata, cataloging (e.g., Unity Catalog), lineage, data contracts, and classification practices to support analytics and ML use cases.
- Define security and compliance controls: access management (RBAC/IAM), data masking, encryption (in transit/at rest), network segmentation, and audit policies.
- Architect scalability, high availability, disaster recovery (RPO/RTO), and capacity & cost management strategies for cloud and hybrid deployments.
- Lead selection and integration of platform components (Databricks, Delta Lake, Delta Live Tables, Fivetran, Azure Data Factory / Data Fabric, orchestration, monitoring/observability).
- Design and enforce CI/CD patterns for data artifacts (notebooks, packages, infra-as-code), including testing, automated deployments, and rollback strategies.
- Define ingestion patterns (batch and streaming), file compaction strategies, partitioning schemes, and storage layout to optimize IO and costs (a minimal ingestion sketch follows this list).
- Specify observability practices: metrics, SLAs, health dashboards, structured logging, tracing, and alerting for pipelines and jobs.
- Act as technical authority and mentor for Data Engineering teams; perform architecture and code reviews for critical components.
- Collaborate with stakeholders (Data Product Owners, Security, Infrastructure, BI, ML) to translate business requirements into technical solutions and roadmap.
- Design, develop, test, and deploy processing modules using Spark (PySpark/Scala), Spark SQL, and database stored procedures where applicable.
- Build and optimize data pipelines on Databricks and complementary engines (SQL Server, Azure SQL, AWS RDS/Aurora, PostgreSQL, Oracle).
- Implement DevOps practices: infra-as-code, CI/CD pipelines (ingestion, transformation, tests, deployment), automated testing and version control.
- Troubleshoot and resolve complex data quality, performance, and availability issues; recommend and implement continuous improvements.
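To make the ingestion patterns above concrete, here is a minimal PySpark sketch of a Medallion-style bronze layer using Auto Loader. It assumes a Databricks notebook where `spark` is already defined; the paths and table names are hypothetical placeholders, not a prescribed implementation.

```python
# Minimal sketch: incremental bronze-layer ingestion with Auto Loader.
# Assumes a Databricks runtime where `spark` is predefined; paths and
# table names below are illustrative placeholders.
from pyspark.sql import functions as F

bronze_stream = (
    spark.readStream.format("cloudFiles")               # Auto Loader source
    .option("cloudFiles.format", "json")                # raw landing format
    .option("cloudFiles.schemaLocation", "/mnt/_schemas/orders")
    .load("/mnt/landing/orders")
    .withColumn("_ingested_at", F.current_timestamp())  # ingestion audit column
)

(
    bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/_checkpoints/orders_bronze")
    .trigger(availableNow=True)   # process all pending files, then stop
    .toTable("main.raw.orders_bronze")
)
```

Downstream silver and gold layers would read from the bronze table, apply quality rules and conformance, and publish curated serving tables following the same incremental pattern.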
Hard Skills - Must have:
- Previous experience as an architect or technical lead on enterprise data platforms.
- Hands-on experience with Databricks technologies (Delta Lake, Unity Catalog, Delta Live Tables, Auto Loader, Structured Streaming).
- Strong expertise in Spark (PySpark and/or Scala), Spark SQL and distributed job optimization.
- Solid background in data warehouse and lakehouse design; practical familiarity with Medallion/Lambda/Kappa patterns.
- Experience integrating SaaS/ETL/connectors (e.g., Fivetran), orchestration platforms (Airflow, Azure Data Factory, Data Fabric) and ELT/ETL tooling.
- Experience with relational and hybrid databases: MS SQL Server, PostgreSQL, Oracle, Azure SQL, AWS RDS/Aurora or equivalents.
- Proficiency in CI/CD for data pipelines (Azure DevOps, GitHub Actions, Jenkins, or similar) and packaging/deployment of artifacts (.whl, containers).
- Experience with batch and streaming processing, file compaction, partitioning strategies, and storage tuning (a compaction example follows this list).
- Good understanding of cloud security, IAM/RBAC, encryption, VPC/VNet concepts, and cloud networking.
- Familiarity with observability and monitoring tools (Prometheus, Grafana, Datadog, native cloud monitoring, or equivalent).
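As an example of the compaction and storage-tuning skill above, the following hedged sketch runs Delta Lake's OPTIMIZE and ZORDER commands on Databricks; the table and column names are illustrative assumptions.

```python
# Illustrative compaction pass on a Delta table (Databricks SQL via PySpark).
# `main.curated.orders_silver` and `customer_id` are hypothetical names.
spark.sql("""
    OPTIMIZE main.curated.orders_silver
    ZORDER BY (customer_id)  -- co-locate rows commonly filtered together
""")

# Inspect the resulting file layout to confirm small files were compacted.
(
    spark.sql("DESCRIBE DETAIL main.curated.orders_silver")
    .select("numFiles", "sizeInBytes")
    .show()
)
```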
Hard Skills - Nice to have/It's a plus:
- Experience automating CI/CD pipelines for deployment and integration workflows, including trunk-based development, using services such as Azure DevOps, Jenkins, or Octopus.
- Advanced proficiency in PySpark for complex data processing tasks.
- Advanced proficiency in Spark workflow optimization and orchestration using tools such as Databricks Asset Bundles or DAG (Directed Acyclic Graph) orchestrators (see the sketch after this list).
- Certifications: Databricks Certified Data Engineer / Databricks Certified Professional Architect, cloud architect/data certifications (AWS/Azure/GCP).
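For the DAG-based orchestration mentioned above, a minimal Airflow sketch might look like the following. It assumes the apache-airflow-providers-databricks package, a configured `databricks_default` connection, and a pre-existing Databricks job whose ID here is a placeholder.

```python
# Hypothetical Airflow DAG that triggers an existing Databricks job daily.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="orders_silver_refresh",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    DatabricksRunNowOperator(
        task_id="run_silver_job",
        databricks_conn_id="databricks_default",  # assumed connection name
        job_id=12345,                             # placeholder job ID
    )
```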
Soft Skills / Business Specific Skills:
- Ability to identify, troubleshoot, and resolve complex data issues effectively.
- Strong teamwork, communication skills and intellectual curiosity to work collaboratively and effectively with cross-functional teams.
- Commitment to delivering high-quality, accurate, and reliable data product solutions.
- Willingness to embrace new tools, technologies, and methodologies.
- Innovative thinker with a proactive approach to overcoming challenges.
At Allata, we value differences.
Allata is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Allata makes employment decisions without regard to race, color, creed, religion, age, ancestry, national origin, veteran status, sex, sexual orientation, gender, gender identity, gender expression, marital status, disability or any other legally protected category.
This policy applies to all terms and conditions of employment, including but not limited to, recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.
Top Skills
Auto Loader
AWS RDS
Azure Data Factory
Azure DevOps
Databricks
Delta Lake
Delta Live Tables
GitHub Actions
Jenkins
Oracle
PostgreSQL
PySpark
Scala
Spark
Spark SQL
SQL Server
Structured Streaming
Unity Catalog