How Automated Machine Learning Can Get Data Science Out of the Woods

It’s easy to get lost in the woods — even more so when the trees are made of data and you’re an analyst trying to muddle through information. When faced with the challenge of finding solutions in a time crunch, it’s imperative that data science teams adopt automated machine learning, or autoML, to alleviate some of the pressure.

At federally funded aerospace research and development center The Aerospace Corporation, Silvia Chavarin’s team is well-versed in autoML as a resource for experimenting with speed and developing solutions for customers. But it’s not just a matter of using off-the-shelf methods, the machine learning engineering manager explained. Rather, it’s the implementation of various open-source, Python-based packages, in particular the Tree-based Pipeline Optimization Tool (TPOT), that makes automation so practical.

“We leveraged autoML to create efficiencies in what otherwise would have been a time-intensive signal processing problem,” said Chavarin. With the use of TPOT, one of the first open-source autoML packages, the team was able to rely on machine learning software to identify the best possible pipeline for the process in question.

But their automation journey is still a work in progress. Built In LA sat down with Chavarin to learn how the data science and artificial intelligence department works around the limitations that come with automating processes and what methods her team hopes to explore in order to ensure autoML can do more heavy lifting in the future.

Silvia Chavarin

Manager, Machine Learning Engineering • The Aerospace Corporation

How is your team currently leveraging autoML?

Teams within the data science and artificial intelligence department at The Aerospace Corporation are leveraging autoML to help quickly arrive at machine learning solutions for time-series and signal-processing problems.

Our data scientists have implemented autoML to test a wide range of algorithms for time-series forecasting and anomaly detection. When forecasting time-series data, there may be numerous statistical methods available as baselines, but insufficient bandwidth to set up multiple experiments. Our data scientists have leveraged packages such as the H2O.ai open-source Python API to enable quick experimentation and particularly appreciate intuitive interfaces that facilitate incorporating supported algorithms with minimal setup.
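The idea of lining up several baselines against the same held-out window can be illustrated with a minimal sketch. It uses only the Python standard library and is not the H2O.ai API; the three baseline functions, the `rank_baselines` helper and the toy series are all hypothetical names chosen for illustration.

```python
from statistics import mean


def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, season=4):
    """Repeat the values observed one season earlier (assumed period: 4)."""
    start = len(history) - season
    return [history[start + (i % season)] for i in range(horizon)]


def moving_average(history, horizon, window=4):
    """Forecast the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def rank_baselines(series, horizon, candidates):
    """Hold out the last `horizon` points and rank candidates by MAE."""
    train, test = series[:-horizon], series[-horizon:]
    scores = {name: mae(test, fn(train, horizon)) for name, fn in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy series that repeats every four steps.
series = [10, 20, 30, 40] * 5
candidates = {
    "naive": naive,
    "seasonal_naive": seasonal_naive,
    "moving_average": moving_average,
}
leaderboard = rank_baselines(series, horizon=4, candidates=candidates)
for name, score in leaderboard:
    print(f"{name}: MAE={score}")
```

An autoML package automates this kind of loop across far more model families and settings, which is where the savings in setup time come from.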

Inside The Aerospace Corporation’s autoML Toolbox

H2O.ai, an open-source Python package
Merlion, an open-source Python library
Prophet, a time-series forecasting package
Tree-based Pipeline Optimization Tool (TPOT), an open-source autoML package
RAPIDS, a suite of GPU-accelerated data science software libraries

How is your team leveraging autoML to create efficiencies and move more quickly?

Our team uses open-source, off-the-shelf methods and preexisting models as a basis for the solutions we develop for our customers. To speed up this process, our data scientists leverage open-source, Python-based time-series intelligence packages like Merlion, coupled with time-series forecasting packages such as Prophet. This combination empowers data scientists to optimize and create ensemble methods without having to write hundreds of lines of code to achieve the end result.
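As a rough illustration of what an ensemble method does, the sketch below averages the forecasts of two simple models point by point. It is a standard-library toy, not the Merlion or Prophet API; `last_value`, `linear_trend` and `mean_ensemble` are hypothetical helpers, and averaging is only one of several possible ways to combine members.

```python
from statistics import mean


def last_value(history, horizon):
    """Member 1: repeat the last observed value."""
    return [history[-1]] * horizon


def linear_trend(history, horizon):
    """Member 2: extend the average step between first and last observation."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * (i + 1) for i in range(horizon)]


def mean_ensemble(models, history, horizon):
    """Average the member forecasts at each future step."""
    forecasts = [model(history, horizon) for model in models]
    return [mean(step_values) for step_values in zip(*forecasts)]


history = [2.0, 4.0, 6.0, 8.0]  # hypothetical observations
combined = mean_ensemble([last_value, linear_trend], history, horizon=2)
print(combined)
```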

As a specific example, we leveraged autoML to create efficiencies in what otherwise would have been a time-intensive signal-processing problem. The objective was to develop a classifier to recognize modulation and statistical properties of two or more superimposed digital signals. Given the multitude of available features, the number of possible statistical algorithms and the other parameters involved, the trade space was too large to explore efficiently by hand. To identify statistical models and establish baseline performance more quickly, we used TPOT, one of the first open-source autoML packages available, which identified the best-performing ML pipeline for this process.
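To show why the trade space grows so quickly, here is a minimal sketch of a pipeline search over feature choices and classifier settings. It swaps in an exhaustive grid for TPOT's genetic-programming search and does not use TPOT itself; the features, thresholds and synthetic signals are all hypothetical and far simpler than a real modulation-recognition problem.

```python
from itertools import product


def mean_value(signal):
    return sum(signal) / len(signal)


def peak(signal):
    return max(abs(x) for x in signal)


def energy(signal):
    return sum(x * x for x in signal) / len(signal)


# Two axes of the (hypothetical) trade space: which feature, which threshold.
FEATURES = {"mean": mean_value, "peak": peak, "energy": energy}
THRESHOLDS = [0.5, 1.5, 5.0]


def accuracy(feature_name, threshold, samples):
    """Score a one-feature threshold classifier: predict 1 if feature > threshold."""
    feature = FEATURES[feature_name]
    hits = sum((feature(signal) > threshold) == bool(label) for signal, label in samples)
    return hits / len(samples)


def make_signal(amplitude):
    """Synthetic alternating signal; its mean is always zero."""
    return [amplitude, -amplitude, amplitude, -amplitude]


# Label 0: low-amplitude signals. Label 1: high-amplitude signals.
samples = [(make_signal(a), 0) for a in (0.5, 0.8, 1.0)]
samples += [(make_signal(a), 1) for a in (2.0, 2.5, 3.0)]

# Every feature/threshold pair is one candidate pipeline.
search_space = list(product(FEATURES, THRESHOLDS))
results = {candidate: accuracy(*candidate, samples) for candidate in search_space}
best = max(results, key=results.get)  # first candidate with the top score
print(len(search_space), best, results[best])
```

Three features and three thresholds already give nine candidates. Each additional axis, such as scaling, feature selection or model family, multiplies that count, which is why a guided search is attractive once exhaustive enumeration stops being practical.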

AutoML is powerful, but it isn’t a silver bullet. What’s the biggest limitation you come across with this technology, and how do you maneuver around it?

While autoML is an excellent tool to quickly identify machine learning solutions, our team has encountered a few limitations in practice. In general, autoML algorithms need to run for an extended period of time to yield meaningful results. Our team plans to explore recent integrations between TPOT and RAPIDS to reduce these long run times.

From a time-series perspective, there is no consensus on how data should be transformed for ingestion into existing autoML packages. For example, certain libraries allow users to specify how data should be processed, while others may require data to follow a specific set of rules. Each library also requires a separate configuration and processing setup, which means code written for one autoML package may have to be refactored to work with another. Before adopting an autoML package, a thorough review of its documentation can help identify packages with useful functionality or similar setups.
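The refactoring cost described here can be pictured with a small sketch that reshapes the same observations into two input layouts. Prophet, for instance, documents a two-column input with a `ds` timestamp column and a `y` value column; the first helper mimics that naming with plain dictionaries rather than a real DataFrame. The second layout, the helper names and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def to_ds_y_rows(observations):
    """Long layout: one {"ds": timestamp, "y": value} record per observation."""
    return [{"ds": ts, "y": float(value)} for ts, value in sorted(observations)]


def to_indexed_columns(observations):
    """Column layout: a sorted timestamp index plus a parallel list of values."""
    ordered = sorted(observations)
    return {
        "index": [ts for ts, _ in ordered],
        "values": [float(value) for _, value in ordered],
    }


# Hypothetical raw observations, deliberately out of order.
start = datetime(2023, 1, 1)
observations = [(start + timedelta(days=d), v) for d, v in [(2, 7), (0, 5), (1, 6)]]

rows = to_ds_y_rows(observations)
columns = to_indexed_columns(observations)
print(rows[0])
print(columns["values"])
```

Keeping such adapters in one place, separate from the modeling code, is one way to limit how much has to change when a team switches packages.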
