- Build infrastructure and automation for the extraction, preparation, and loading of data from various sources
- Create unit and stress test components to monitor technical performance and ensure identified issues are resolved
- Build and maintain analytical tools to provide data insight and capture key metrics
- Automate and integrate new components into the data pipeline
- Follow best practices for data governance, data quality, data cleansing, and other ETL-related activities
- Maintain technical documentation
- 2+ years of development experience in data engineering
- 1+ years of professional experience working in big data ecosystems such as Spark, Kafka, and Hadoop
- 1+ years of professional experience working with dataflow management tools such as Pentaho, AWS Glue, and Apache NiFi
- Hands-on experience with Python, Scala, and shell scripting
- Development experience with highly scalable, distributed systems and cluster architectures (e.g., AWS, Azure, Google Cloud) preferred
- Familiarity with complex NoSQL databases (e.g., DynamoDB, Cassandra, Elasticsearch)
- Prior experience working with large datasets (1M+ records)
- B.S. in Computer Science, Information Systems, or a related field preferred (foreign education equivalent accepted)