Principal Data Processing Engineer - OSS
Mountain View, CA
About DataPelago:
DataPelago is at the forefront of revolutionizing data processing for traditional analytics and cutting-edge GenAI preprocessing. We are building an innovative data processing engine that is transforming how Apache Spark, Apache Flink, Ray, and others operate on diverse, large-scale data. Our team of engineers drives and adopts advances in hardware-accelerated computing, parallel processing of large-scale data, query optimization, distributed systems, compilers, machine learning, and cloud-native computing. We are looking for world-class engineers to join our team and shape the future of accelerated data processing.
The Role:
As a Principal Data Processing Engineer (OSS), you will be a key individual contributor in adopting and advancing the capabilities of open-source software (OSS) platforms such as Apache Gluten, Velox, Apache Spark, and Apache Flink in the context of DataPelago’s data processing engine. You will enhance the functional breadth, performance, scale, and reliability of the DataPelago engine through downstream and upstream contributions. You will have the opportunity to engage with the community working on these platforms. This is a unique opportunity to make a significant impact on a category-defining product and work with a talented team of engineers.
What You'll Do:
- Influence the architecture of how our data processing engine interfaces with open-source platforms and engines.
- Lead the design of functional and performance enhancements to open-source platforms such as Apache Gluten and Velox, and their integration with our data processing engine.
- Individually design, implement, test, optimize, and maintain components of the data processing engine.
- Analyze the technology roadmap of Apache Gluten, Velox, and equivalent platforms and identify opportunities for our engine to enhance technology and product leadership.
- Partner with engineering, product management, the open-source community, and customer success teams.
- Foster best practices in design and code reviews, testing, CI/CD, and issue resolution to maintain the highest product quality, security, efficiency, and productivity.
What You'll Bring:
- BS/MS in Computer Science (or a related field) with 6+ years of relevant experience.
- 3+ years of deep technical experience in instrumenting, analyzing, and optimizing the performance of data processing engine components on benchmark and customer workloads.
- Sound knowledge of the architecture and internal operation of one or more of Apache Spark, Apache Flink, and Presto/Trino.
- Demonstrated experience in the design, development, and successful release of high-performance data processing engines for large production deployments.
- Exceptional programming skills in C, C++, and Java.
- Extensive development experience in Linux environments.
- Excellent communication and collaboration skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences.
- Strong analytical and problem-solving skills with a passion for performance optimization.
Location Considerations:
We value face-to-face collaboration, but recognize that talent can be found anywhere. Our engineering team works at our headquarters in Mountain View, CA, at our India office in Hyderabad, and at remote locations.
Why Join DataPelago?
- Technical Leadership: Take a leadership role in shaping the architecture and development of how our core engine works with open-source data processing platforms.
- Cutting-Edge Innovation: Work on challenging problems at the forefront of accelerated computing and data processing.
- Significant Impact: Your contributions will directly impact the performance and scalability of our mission-critical platform.
- Mentorship and Growth: Mentor and guide other talented engineers while expanding your own technical expertise.
- Compensation and Benefits: Competitive compensation, stock options, a comprehensive benefits package, and leadership development opportunities.