About The Role
We are looking for a Senior Software Engineer with a passion for data systems to join our engineering team. You'll spend significant time architecting and building the core data infrastructure that powers our AI-driven platform. This role combines hands-on software development with deep involvement in data engineering initiatives, from designing scalable pipelines to implementing intelligent data processing systems that handle billions of documents and structured datasets.
As a member of our engineering team, you'll build robust, scalable systems that ingest, process, and serve data to our AI agents and end users. This is an opportunity to shape the technical foundation of a platform that's transforming how R&D teams discover and leverage global innovation data.
Responsibilities
- Design and build scalable data processing systems that handle ingestion and transformation of structured and unstructured data (APIs, SQL schemas, raw documents) into queryable, AI-ready formats
- Develop production-grade software with a focus on data-intensive applications, implementing robust error handling, monitoring, and performance optimization
- Architect and implement data pipelines using Apache Airflow and Python, ensuring reliability, scalability, and maintainability across distributed systems
- Build and optimize data storage solutions using both relational and document-based storage systems, with a focus on query performance and cost efficiency
- Create data access layers and APIs that enable seamless integration between data infrastructure and application services
- Own end-to-end feature development from data ingestion through to user-facing functionality, ensuring data quality and system reliability at every step
- Implement monitoring and observability for data pipelines and services, proactively identifying and resolving data quality issues
- Work directly with the CTO to establish data engineering best practices, maintain comprehensive documentation, and mentor team members on data-related initiatives
- Drive technical decisions around data architecture, storage mechanisms, and processing frameworks to support both current needs and future scale
Requirements
- 5-8 years of software engineering experience with significant exposure to data-intensive applications and distributed systems
- Strong programming expertise in Python with experience building production applications and data processing systems
- Hands-on experience with Google Cloud Platform (GCP) services, particularly for data engineering (BigQuery, Cloud Storage, Dataflow, Pub/Sub)
- Proficiency with Apache Airflow / Google Cloud Composer for building and orchestrating complex data workflows
- Solid understanding of data modeling principles for both SQL and NoSQL systems, with experience in PostgreSQL and Elasticsearch
- Experience with streaming and batch data processing frameworks and patterns at scale
- Familiarity with LangChain or similar frameworks for building LLM-powered applications
- Strong software engineering fundamentals: version control, code review, testing, CI/CD, and infrastructure as code
- Proven track record of owning production systems from initial design through deployment, monitoring, and iteration
- Excellent problem-solving skills with the ability to debug complex data and system issues
- Strong communication abilities and experience working effectively in cross-functional teams
- Ability to thrive in a remote startup environment, with the autonomy to drive projects forward and the collaborative spirit to build something meaningful
Nice to Have
- Experience with Java / Spring Boot or Angular for full-stack development capabilities
- Background in building data platforms that support GenAI or LLM applications, with understanding of vector databases and embedding pipelines
- Experience with real-time data processing and event-driven architectures
- Knowledge of data governance, lineage, and quality frameworks
- Familiarity with scientific/technical data sources (patents, research papers, regulatory filings)
- Experience with search and information retrieval systems at scale
- Background in R&D, scientific computing, or innovation intelligence domains
Why Join Us
- Build the backbone of a platform at the intersection of data and AI.
- Tackle complex challenges in data integration, scalability, and cost optimization.
- Work transparently and collaboratively in a team where ownership is valued.
- Be part of an early-stage startup where speed, impact, and growth with a positive attitude are the norm — not the exception.
Cypris Los Angeles Office
6060 Center Dr, 10th Floor, Los Angeles, California, United States, 90045