Senior Machine Learning Data Engineer
We are looking for a Senior Machine Learning Data Engineer who is passionate about all things data. You will be working on enhancing top-flight datasets and innovative data products. Data quality and best practices are at the core of our team ethos as we support a fast-moving, highly cross-functional organization.
TECHNOLOGY:
- AWS, GCP, and On-prem Ecosystem
- Python, Spark, Snowflake, BigQuery, Postgres, Hive, HDFS, Parquet
- Kubernetes, Docker
- PyTorch, ONNX
- Airflow
RESPONSIBILITIES:
Lead
- Break down product initiative requirements, identify dependencies and create implementation plans
- Mentor individuals through detailed feedback during code reviews
Design
- Design and scale petabyte-level data flows
- Participate in design reviews and production reviews for new features, products, or pieces of infrastructure
Implement
- Automate manual tasks from data science and create tools for data scientists to simplify future automation
- Build and enhance current data warehousing architecture to provide insights and analytics to our internal and external clients
- Develop and release via CI/CD and agile methodologies
- Automate and maintain infrastructure builds in AWS/On-Prem/GCP to support applications running in Kubernetes (Terraform, Ansible, Chef)
- Build shared components and/or frameworks that improve engineering productivity across the organization
- Create and maintain documentation of services, tools, and frameworks
- Play a key role in building the ETL/ELT stack to cleanse, transform and load data from different sources using multiple technologies
- Ensure that data is easily discoverable and usable for data scientists and analysts across the company
- Identify root causes of instability in a large-scale distributed system, across stacks
QUALIFICATIONS:
2+ years production experience with:
- a scripting language like Python or JavaScript
- writing SQL statements
- building and optimizing workflows that cross columnar stores, row-level stores, and transactional databases (OLTP databases such as PostgreSQL, MySQL; OLAP databases such as Snowflake, BigQuery, Redshift; and NoSQL databases such as DynamoDB, MongoDB, Couchbase)
- developing data pipelines for at least terabyte volumes of data
- modeling, measuring, and analyzing complex data
- server-side concepts such as microservices, databases, caching, monitoring, and scalability
- schema design and dimensional data modeling
- distributed data technologies for building efficient & large-scale data pipelines like Hadoop, MapReduce, Spark, Flink, Kafka
Qualifications (Preferred but NOT required):
- deploying models, UDFs, or other custom algorithms into a batch process workflow
- working with workflow scheduling technologies, such as Airflow or Databricks Jobs
- data orchestration tools such as Airflow or dbt
- handling PII in compliance with regulations such as CCPA or GDPR
- container orchestration systems such as Kubernetes, Rancher, Nomad, ECS
- cloud-based solutions such as AWS, Google Cloud, or Microsoft Azure
- Experience in statistics, data mining, or machine learning
- BS/BA in Technical Field, Computer Science or Mathematics
- Experience specifically with PySpark and Scala working with large data sets
PERKS:
- Unlimited paid time off each year
- Company-sponsored health, dental and vision benefits for you and your dependents
- Childcare stipends
- Pet Funds + Pet Care Savings
- Work-from-home setup provided
- 401(k) Plan
- A progressive approach to paid parental leave
- 1:1 Nutrition/Food program options
- Partnership with DoorDash for meal deliveries
- Wellness (Financial Wellness, Modern Health)
- Employee Advisory Groups / Proactive Social Groups
- Short Term & Long Term Disability Insurance
- Referral Bonus
- Epic personal and professional growth opportunities
- Access to state-of-the-art fitness classes and personal trainers to promote your well-being. Live and recorded virtual workout classes every day + weekly yoga/pilates/wellness opportunities
- Quarterly fitness challenges
ABOUT
The proliferation of streaming and digital media services has created the golden age of video entertainment, with more premium TV content than ever before, and we believe these experiences should be free or affordable to every human on the planet.
Making this accessible to everyone usually requires subsidies from the advertising industry to power an effective three-way value exchange between a publisher producing content, a consumer viewing it and an advertiser paying the publisher for the chance to connect with its audience.
While the ability to watch premium TV content on any device at any time is great for the consumer, the systems powering these services are fragmented, riddled with complexity and struggling to evolve in a changing privacy landscape. These systems put the entire ecosystem and three-way value exchange at risk by providing an incomplete and siloed view of the audience. The result is advertising waste from inflated metrics, reduced revenue opportunity for publishers whose audiences are under-represented, and consumers stuck with suboptimal viewing experiences.
Today, industry-standard tools are usually designed so that connecting and applying consumer data sets comes with negative trade-offs for consumer privacy or the security risk of leaking private data. We envision a world where this doesn't have to be the case - a world where consumer privacy, security, and governance are incorporated into the fabric of the codebase that interfaces with these systems to enable the necessary business use cases that keep these viewing experiences free or affordable to everyone.
The mission of VideoAmp is to be the independent software and data company creating a more sophisticated data-driven advertising ecosystem that redefines how media is valued, bought and sold. Our platform provides measurement and optimization tools that unify audiences across the disparate systems of traditional TV, streaming video and digital media. We are unlocking new value for those that currently operate within a siloed view of their audiences, creating efficiencies for the entire industry. We are transforming a 100-year-old industry by powering a more effective three-way value exchange in which advertisers increase their return on investment, publishers increase their revenues and consumers enjoy a better viewing experience.
Come and Join Us!
#LI-Remote