Senior Data Engineer
Job Summary
BlackLine's new Cloud Engineering team is seeking an experienced Data Engineer with expertise in platforms built on asynchronous, event-driven architectures to help advance the platform so the organization can make data-driven decisions and improve the product experience. In this role, you will lead the architecture, design, build, and management of physical data structures designed for flexibility, scalability, and resiliency to support the current and future business needs of all BlackLine products and initiatives in an automated, repeatable way. You'll design, build, and maintain the processes and components of a data pipeline to support analytics, focusing on data quality and governance, pipeline performance, and best practices for democratized data access. You will also mentor and train new team members on design and development and make recommendations to improve our model.
The team is also responsible for building and operating data tools and platforms while promoting best practices around them, with a focus on user data privacy. If you're an expert in automation and tools development, containers, event-driven technology (Kafka), traditional data warehousing, ETL, and/or big data pipelines and processing, you're exactly who we're seeking. Technical capabilities aside, if you're a self-starter who's comfortable with ambiguity, able to think big without overlooking minute details, and who thrives in a fast-paced environment, you're a perfect fit for our new Cloud Engineering team.
Roles and Responsibilities (listed in order of importance)
- Ensure 99.99%+ availability of services and infrastructure that span multiple global data centers in private and public clouds.
- Design, build, and maintain production data pipelines (both batch and streaming) that deliver over a billion events with measurable quality within the bounds of defined SLAs
- Work with product and engineering managers to build necessary tools to help with their data-related needs
- Build tools and infrastructure that enable external teams (engineers and analysts) to effectively create efficient data pipelines
- Be a champion of the overall strategy for data governance, security, privacy, quality and retention that will satisfy business policies and requirements
- Monitor and maintain health, performance, and security of all infrastructure components.
- Automate everything possible
- Architect and create a reference architecture and implementation standards for Kafka
- Participate in the design and development of ETL and ELT processes for data integration using Google Cloud Platform services (e.g., BigQuery)
- Design, build, and maintain processes and components of a streaming data/ETL pipeline to support real-time analytics (from requirements to data transformation, data modeling, metric definition, reporting, etc.)
- Design, create and manage physical data structures designed for flexibility, scalability and resiliency to support future business needs.
- Build and maintain ETL jobs for centralizing data into our Google Cloud data lake
- Build and maintain tools to automate data modeling and data quality checks
- Put in place tooling and frameworks to facilitate data governance and shared analysis between internal organizations
Required Qualifications
Years of Experience in Related Field: Minimum 5 Years
Education: Master's degree preferred
Technical/Specialized Knowledge, Skills, and Abilities:
- We run in Google Cloud and rely heavily on BigQuery, Cloud Storage and our internal ETL frameworks to automate tasks
- Great communication and interpersonal skills
- Motivated by enabling and helping others within the company to be data-driven
- Experience with Windows or Linux system administration and/or enterprise applications
- Experience with container orchestration technologies (Kubernetes, Mesos, Swarm, etc.) and deployment methodologies and technologies (CI/CD, Chef, Puppet, Ansible, etc.)
- Ability to create reference architectures and implementation standards for Kafka
- Experience standing up and administering Kafka clusters; expertise in Kafka brokers, ZooKeeper, Kafka Connect, Schema Registry, KSQL, REST Proxy, Replicator, ADB, Operator, and Kafka Control Center.
- Ability to create topics, set up redundant clusters, and deploy monitoring tools and alerts, with good knowledge of best practices.
- Administration and operation of the Kafka platform, including provisioning, access lists, and Kerberos and SSL configurations.
- Experience with automation and provisioning tools such as Docker, Jenkins, and GitLab.
- Experience with Relational Databases, NoSQL Databases and/or Big Data technologies
- Working knowledge of IP and storage networking, including SDN, Linux, application networking, DNS, SAN, and hybrid technologies
- Proficient with Terraform
- Proficient in a modern scripting language (preferably Python) for automation of build tasks.
- Experience with big data on GCP: BigQuery, Pub/Sub, Dataproc, Dataflow (nice to have)
- Knowledge of networking principles and protocols such as IP subnetting, routing, firewall rules, Virtual Private Cloud, Load Balancer, Cloud DNS, Cloud CDN, etc.
- Prior experience working with container technology such as Docker, version control systems (GitHub), build management and CI/CD tools (Concourse, Jenkins), and monitoring tools (AppDynamics, etc.)
Preferred Qualifications
- Demonstrable cloud service provider experience (ideally GCP) - infrastructure build and configuration of a variety of services, including Compute, Storage, and SDN (VPC and XPN)
- Experience working with Continuous Integration (CI), Continuous Delivery (CD), and continuous testing tools
- Experience working within an Agile environment
- Programming experience in one or more of the following languages: Python, Ruby, Java, JavaScript, Go
- Automation scripting (using tools such as Terraform, Ansible, etc.)
- Server administration (either Linux or Windows)
- Ability to quickly acquire new skills and tools