Data Engineer at Tradesy
At Tradesy, we care deeply about the environmental issues caused by low-quality, disposable products that just end up polluting our environment. We are a peer-to-peer marketplace that is pioneering a sustainable future for commerce in the fashion industry. We are financially backed by top-tier venture capital firms, with millions of passionate members and a product that people love. Our mission is to make the resale of consumer goods as simple, safe, and stylish as retail, at scale, and we do this in an office with an ocean view in sunny Santa Monica, California!
Tradesy is seeking an exceptionally talented Data Engineer who will work closely with other engineers, data scientists, architects, and product owners to develop reliable, efficient, and scalable systems and data pipelines with a strong focus on engineering quality. The ideal candidate has a passion for building things “right” and creates leverage for the business by designing and implementing systems that are low-maintenance and stand the test of time.
We are also migrating our applications, data warehouse, and data pipelines to Google Cloud to take advantage of Google’s cutting-edge technologies and managed services. We’re already leveraging the power of PubSub, BigQuery, and Google’s data ecosystem in very interesting ways. If this sounds exciting to you and you believe you can impact our business with your passion and skills, then you are the right candidate for us.
- Interact and integrate with internal and external teams and systems to extract, transform, and load data from a wide variety of sources
- Support our BI, Data Science, and Engineering teams by resolving data-related technical issues and meeting their data and data infrastructure needs
- Migrate data pipelines from AWS to GCP
- Respond to alerts and investigate data issues
- Participate in architectural decisions
- Computer Science degree or equivalent experience
- 2+ years of experience with data pipeline development (ETL) and data cleansing with large-scale, complex data sets
- 2+ years of experience with batch processing technologies (e.g., Hadoop, Spark, Apache Beam, Flink)
- Coding proficiency in at least one of Java, Python, C++, Go, Scala
- Strong computer science skills with a focus on algorithms and data structures
- Mastery of data querying for relational and/or NoSQL data stores
- Experience with multiple data formats: CSV, JSON, Avro, Parquet
- Experience with stream processing technologies such as Apache Beam, Spark Streaming, Flink, Kafka Streams
- ETL experience on AWS using EMR, Firehose, Lambda
- ETL experience on Google Cloud using Composer, Dataproc, Cloud Functions, Dataflow
- Redshift or BigQuery experience
- Familiarity with messaging systems such as Kinesis, Kafka, PubSub
- Familiarity with automation tools such as Apache Airflow, Luigi, AWS Data Pipeline
- Competitive base salary plus meaningful equity
- Comprehensive benefits (Medical, Dental, Vision, 401k)
- Flexible Paid Time Off
- Daily catered lunches
- Dog friendly office
- Collaborative, fun team