When David Ersek reflects on the two-year migration he and his fellow software engineers at System1 recently completed, he thinks back to a moment in early 2019 when many of the engineers on his team were gathered around a computer, waiting for someone to click a button.
At the time, the engineers at the Los Angeles adtech company had embarked on a big project — consolidating terabytes of data across several legacy data warehouses into Snowflake. Ersek and his team had just completed the first and most challenging warehouse of all, a database migration deemed so important that one of Ersek’s coworkers likened it to “replacing the parts of the airplane while it was still in the air.”
The engineers revealed the new database to the company, but just in case they had to roll things back or find other datasets, they left the old one on. After one week, the team decided it was finally safe to turn it off.
“I had this huge sigh of relief,” Ersek said. “For me, that was the moment where I could sit back and say, ‘We did it.’”
Multiply that experience by five and spread it out over two years. That’s what the System1 engineering team accomplished, migrating five data warehouses into Snowflake after years of exponential growth and multiple acquisitions. We spoke with Ersek, along with Vin Yam, the company’s VP of engineering, and Lydia Liang, a senior data engineering product manager, to learn more about what prompted this migration, the challenges they faced and what they learned.
What was the catalyst for this project?
Liang: We wanted to have one source of data for our stakeholders. As the company continued to grow, we were taking in larger volumes of data, and it became more of an issue. Just in the previous two years, we completed multiple acquisitions, which is why we accumulated five data warehouses. People were feeling the pain of not being able to retrieve data fast enough, so two years ago, we decided that we needed to get it all in one consolidated place, which is why we chose Snowflake.
Yam: This consolidation effort was spawned by a real need, and not only from the analytics and business side, but even within the data engineering team. It was very difficult to manage many different legacy data warehouses to support the business. We had multiple Redshift warehouses, a Postgres warehouse and a SQL Server data warehouse. Really, all the flavors of the rainbow as far as database engines. Each one is unique in that it requires someone to be trained up on that database engine. It caused the team to be focused more on maintenance tasks than supporting the business.
Ersek: For the company, we wanted to provide a common interface to the data. Snowflake has fewer limitations on viewing large amounts of data and allows us to keep a lot of the historical data that normally we would’ve rotated out to something that would be harder to retrieve.
What was your primary role during this migration?
Liang: From the product side, it was setting milestones and coordinating with all of the stakeholders. Everybody who needed data was touching the things that we were trying to migrate, so we were being very careful about what was risky and what was not risky.
Yam: For me, it was making sure that the other tech team saw this as an opportunity, rather than just something that the data engineering team was trying to push through. Prior to starting the migration, we had done some roadshows of what the end state would look like, just because engineers are generally suspicious about what a particular piece of software should do. It’s very important to demonstrate that yes, it can actually do that, and it can actually work really well with what we do. It was my job to keep everyone in the loop as far as our progress on the technical side, and how that fit in with the overall System1 technical architecture.
Ersek: As an engineer, a lot of it was focused around understanding how the different pieces of data connected together. When you’re looking to reconstruct a database, you have to understand how each table connects to the next table. In addition to porting the datasets, there was also a lot of reorganization of the datasets themselves to have a more streamlined process and make sure that we weren’t taking more time than we needed to do the migration.
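Ersek's point about knowing how each table connects to the next can be illustrated with a small dependency-ordering sketch. This is not System1's actual tooling; the table names and the `migration_order` helper are hypothetical, and the only assumption is that each table's upstream dependencies are known ahead of time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tables. Each key maps to the set of tables it reads from,
# so those upstream tables have to be ported first.
dependencies = {
    "customers": set(),
    "campaigns": {"customers"},
    "clicks": {"campaigns"},
    "revenue_daily": {"clicks", "campaigns"},
}

def migration_order(deps):
    """Return the tables in an order where every dependency comes first."""
    # static_order() raises graphlib.CycleError if tables depend on each other in a loop.
    return list(TopologicalSorter(deps).static_order())

order = migration_order(dependencies)
print(order)
```

Ordering the work this way means a table is only ported once everything it reads from already exists in the new warehouse.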
Each of you has done a data migration in the past. How did this one compare to your previous experiences?
Yam: I’ve been involved in data migrations throughout my career. Most of them can be described with a whole lot of Gantt charts, and a multi-year project that really drove everyone nuts. For this particular migration at System1, I think it was the smoothest one that I’ve ever seen. We didn’t spend a ton of time working through Gantt charts, but instead, we worked through what people actually needed. You have to understand that these data warehouses didn’t show up from the ether. They represented years upon years of a product existing and having users and things like that.
The migration process is very similar to moving houses. Of course, you could always put everything in your old house into boxes and ship it across to your new house. But everyone usually takes the time to figure out if they actually need to move it, or if it can go into external storage because it might be something that you're probably not going to use on a regular basis — but you still want it there just in case. So that was an opportunity for people to sort through what they wanted to move, if at all. In the end, once we completed the migration, everyone was much happier. They didn’t have to work with the old tech and all of its deficiencies anymore.
Liang: We had five warehouses consolidated into Snowflake in two years, which is actually very, very quick. I’ve done this before at a different company, and it took about a year and a half to do one warehouse.
Ersek: For me, it was a mix of excitement and anxiousness. The excitement was getting to read through the different code bits, the views, the tables and the processes. In a way, you’re putting a puzzle together in reverse and figuring out how things fit together. And of course, the anxiousness comes from the chance that something gets overlooked; we do a ton of preparation and testing, but from experience, there’s always some amount of risk.
How did the pandemic and remote work affect this project?
Liang: Our team is already divided between different locations. Some of our core engineers are in Washington state, so we’ve already worked remotely for our daily syncs and for our planning. For us, the transition to fully remote work wasn't as difficult as it might have been for other teams within the company, since they're more used to sitting side by side. Also, we had done a lot of preparation work on this migration. So when we switched to remote work, we were still on track and we knew exactly what we needed to do. Having that clear plan ahead of time was critical.
Ersek: The key to remote work is communication. Our communication structure was set up before the pandemic, from how teams would coordinate with each other to how the project managers would determine which tables were needed and what was next on the schedule.
What’s the next big project for the engineering team?
Yam: Our team is very good at identifying when we have holes in the services that we’re providing to the rest of the company. As we were drawing close to the end of our data warehouse consolidation, the next big question we asked was, “How do we provide a way for the company to find the data that they’re looking for?” I’d draw an analogy to the Library of Congress. It has pretty much every written work, but it’s completely useless without a way to actually find the book that you’re looking for.
The second big thing is, now that we’ve migrated to Snowflake, we have the capability to do things that were very challenging before. So how do we leverage our new data capabilities to drive meaningful business improvement? For starters, we can provide much clearer command and control to executives and managers to operate the business. Our data is better organized for our data science team to leverage more broadly and with increased impact, and it allows us to identify cross-company insights and trends, which will lead to more business opportunities.