From Automated Machine Learning in Action by Qingquan Song, Haifeng Jin, and Xia Hu
This article covers
• Defining and introducing the fundamental concepts of machine learning
• Describing the motivation for and high-level concepts of automated machine learning
Artificial intelligence (AI) has been extensively explored in recent years, reaching into many aspects of everyday life. It attempts to use computational devices to automate tasks by allowing them to perceive the environment as humans do. As a branch of AI, machine learning (ML) tries to enable a computer to perform a task through self-exploration of data. It allows the computer to learn, in order to do things that go beyond what we know how to order it to do, but the barriers to entry are high: the cost of learning the techniques involved and accumulating the necessary experience with applications means ML can’t easily be used by practitioners without much expertise. Taking ML techniques from their ivory tower and making them accessible to more people is becoming a key focus of research and industry. Toward this end, automated machine learning (AutoML) has emerged as a prevailing research field. Its aim is to simulate how human experts solve ML problems and discover the optimal ML solutions for a given problem automatically, thereby granting practitioners without extensive experience access to off-the-shelf ML techniques. As well as being beneficial for newcomers, this relieves experts and data scientists of the burden of designing and configuring ML models. Because this is a cutting-edge topic, it’s new to most people, and its current capabilities are often exaggerated by mass media. To give you a glimpse of what AutoML is, this article provides some background and an introduction to the fundamental concepts, and orients you to its research value and practical benefits. Let’s start with a toy example.
A glimpse of automated machine learning
Suppose you want to design an ML model to recognize handwritten digits in images. The ML model takes the images as inputs and outputs the corresponding digit in each image (figure 1).
Figure 1. Recognizing handwritten digits with an ML model
In case you’re not experienced with ML, let’s use a programmatic illustration with Pythonic style to show how we usually achieve this goal in practice. We take an ML model as an object instantiated from a class, as shown in listing 1. This class corresponds to a specific type of ML algorithm (a set of procedures) that we like to use in our model. To instantiate a model, besides selecting the algorithm class to be used, we also need to feed the algorithm some historical data and arguments (arg1 and arg2). The historical data used here consists of images of handwritten digits, whose labels (corresponding numbers) are already known. This helps the machine (or the ML algorithm) to conduct the learning process—to learn how to recognize the digits in images, similar to how a child is trained to recognize objects from pictures. The arguments here are used to control the algorithm, instructing it how to do this process. The resulting ML model is able to predict the digits in previously unseen images (figure 1) with the second line of code in listing 1.
Listing 1. A simplified ML process
ml_model = MachineLearningAlgorithm1(arg1=..., arg2=..., data=historical_images)  ❶
digits = [ml_model.predict_image_digit(image) for image in new_images]  ❷
❶ Create an ML model
❷ Make predictions with the ML model
As you can see from the code, besides the dataset, which we may need to prepare ourselves, there are two things that we need to provide based on our prior knowledge to address the task:
- The ML algorithm (or method) to be used; that is, MachineLearningAlgorithm1
- The arguments of the algorithm
Selecting the algorithm and configuring its arguments can be difficult in practice. Let’s use algorithm selection as an example. As a beginner, a typical approach is to collect some learning materials, explore the code for some related tasks, and identify a pool of ML algorithms you might be able to use for the task at hand. You can then try them out one-by-one on your historical data (as we do in listing 1) and pick the best one from them based on their performance at recognizing the digits in the images. This repetitive process is summarized in listing 2.
Listing 2. A naive way of selecting ML algorithms
ml_algorithm_pool = [MachineLearningAlgorithm1, ..., MachineLearningAlgorithmN]  ❶
result_pool, model_pool = [], []
for ml_algorithm in ml_algorithm_pool:  ❷
    model = ml_algorithm(arg1=..., arg2=..., data=historical_images)  ❸
    result = evaluate(model)  ❸
    result_pool.append(result)
    model_pool.append(model)
best_ml_model = pick_the_best(result_pool, model_pool)  ❹
return best_ml_model
❶ A pool of ML algorithms to be tested
❷ Loop over all the candidate ML algorithms
❸ Instantiate and evaluate the ML model based on each ML algorithm
❹ Select the best ML model based on the performance
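To make the loop in listing 2 concrete, here is a minimal runnable sketch with toy stand-ins. Everything in it is an assumption for illustration rather than code from the book: the two "algorithms" (MajorityClass and NearestNeighbor), the one-dimensional data, and the choice to score each model on a few held-out examples instead of images of digits.

```python
# Toy stand-ins for "ML algorithms": each class learns from labeled data
# when instantiated and exposes a predict() method. All names are hypothetical.

class MajorityClass:
    """Always predicts the most common label seen in the training data."""
    def __init__(self, data):
        labels = [label for _, label in data]
        self.label = max(set(labels), key=labels.count)

    def predict(self, x):
        return self.label


class NearestNeighbor:
    """Predicts the label of the closest training example (1-D inputs)."""
    def __init__(self, data):
        self.data = list(data)

    def predict(self, x):
        return min(self.data, key=lambda pair: abs(pair[0] - x))[1]


def evaluate(model, data):
    """Fraction of examples in `data` that the model labels correctly."""
    return sum(model.predict(x) == y for x, y in data) / len(data)


def select_best(algorithm_pool, train_data, eval_data):
    """Try every algorithm in the pool and keep the best-scoring model."""
    results = []
    for algorithm in algorithm_pool:
        model = algorithm(data=train_data)
        results.append((evaluate(model, eval_data), model))
    return max(results, key=lambda pair: pair[0])


# (input, label) pairs: small inputs are labeled 0, large inputs are labeled 1.
historical = [(0.1, 0), (0.4, 0), (0.5, 0), (2.0, 1), (2.2, 1)]
held_out = [(0.3, 0), (1.9, 1), (2.5, 1)]

best_score, best_model = select_best(
    [MajorityClass, NearestNeighbor], historical, held_out
)
print(type(best_model).__name__, best_score)  # NearestNeighbor 1.0
```

Even with only two candidates, the cost is visible: every algorithm added to the pool means one more model to train and evaluate.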
The process looks intuitive but may take you hours or days if you don’t have much ML knowledge or experience. There are a few reasons for this. First, collecting a pool of feasible ML algorithms is challenging. You may need to explore the literature, identify the state-of-the-art algorithms, and learn how to implement them. Second, the number of feasible ML algorithms could be huge. Trying them out one-by-one may not be a good choice and may even be prohibitive. Third, each algorithm has its own arguments. Configuring them correctly requires expertise, experience, and even some luck.
Might there be a better way of doing this? Is it possible to let the machine perform this process automatically for you? If you’ve faced similar problems and want to adopt ML in a more labor-saving way, AutoML could be the tool you are looking for. Loosely speaking, AutoML mimics the manual process described in the preceding pseudocode. It tries to automate the repetitive and tedious process of selecting and configuring ML algorithms, and allows you access to many advanced algorithms without even knowing they exist. The following two lines of pseudocode illustrate how to use an AutoML algorithm to generate the ML solution.
automl_model = AutoMLAlgorithm()
best_ml_model = automl_model.generate_model(data=historical_images)
Creating an AutoML model object from an AutoML algorithm means you don’t even need to provide the pool of ML algorithms to test, and you can generate the desired model by feeding data into it.
How do you select an AutoML algorithm? What are the ML algorithms it chooses from? How does it evaluate and choose a model? Before going any further, I’ll give you some background on ML to help you better understand what AutoML automates and how to use it in practice to save yourself time and effort. The focus here is on what you need to know to learn and use AutoML. If you want to learn more about these algorithms, I recommend referring to ML books, such as Machine Learning in Action by Peter Harrington (Manning, 2012) and Deep Learning with Python, 2nd ed., by François Chollet (Manning, 2021). For readers who are already familiar with the basics of ML, this next section serves as a recap, makes sure we’re all on the same page with some terminology, and better motivates the following introduction to AutoML.
Getting started with machine learning
This section provides a brief introduction to ML—what it is, the critical components in an ML algorithm, and how an ML model is created based on a selected algorithm and data input. Learning these basics is essential to understanding the concepts of AutoML introduced in the next sections.
What is machine learning?
Before the appearance of ML the dominant paradigm in AI research was symbolic AI, where the computer could only process data based on predefined rules explicitly input by humans. The advent of ML revolutionized the programming paradigm by enabling knowledge to be learned from the data implicitly. For example, suppose you want a machine to recognize images of apples and bananas automatically. With symbolic AI, you need to provide human-readable rules associated with the reasoning process, perhaps specifying features like color and shape, to the AI method. By contrast, an ML algorithm takes a bunch of images and their corresponding labels (“banana” or “apple”) and outputs the learned rules, which can be used to predict unlabeled images (figure 2).
Figure 2. Comparison of symbolic AI and ML
The essential goals of ML are automation and generalization. Automation means an ML algorithm is trained on the data provided to automatically extract rules (or patterns) from the data. It mimics human thinking and allows the machine to improve itself by interacting with the historical data fed to it, which we call training or learning. The rules are then used to perform repetitive predictions on new data without human intervention. For example, in figure 2 the ML algorithm interacts with the apple and banana images provided and extracts a color rule that enables it to recognize them through the training process. These rules can help the machine classify new images without human supervision, which we call generalizing to new data. The ability to generalize is an important criterion in evaluating whether an ML algorithm is good. In this case, suppose an image of a yellow apple is fed to the ML algorithm—the color rule won’t enable it to correctly discern whether it’s an apple or a banana. An ML algorithm that learns and applies a shape feature for prediction may provide better predictions.
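The yellow-apple example can be sketched in a few lines. This is a minimal illustration, not a real image classifier: the two features (color and shape), the four training examples, and the `learn_rule` helper are all made up for this sketch, and each "image" is assumed to have already been reduced to those two features.

```python
# Each fruit is described by two hypothetical features: (color, shape).
# The data is made up for illustration.
training_set = [
    (("red", "round"), "apple"),
    (("red", "round"), "apple"),
    (("yellow", "long"), "banana"),
    (("yellow", "long"), "banana"),
]


def learn_rule(data, feature_index):
    """Learn a lookup rule: feature value -> most common label for that value."""
    seen = {}
    for features, label in data:
        seen.setdefault(features[feature_index], []).append(label)
    return {value: max(set(labels), key=labels.count) for value, labels in seen.items()}


COLOR, SHAPE = 0, 1
color_rule = learn_rule(training_set, COLOR)  # {'red': 'apple', 'yellow': 'banana'}
shape_rule = learn_rule(training_set, SHAPE)  # {'round': 'apple', 'long': 'banana'}

# Both rules are perfect on the training examples...
for features, label in training_set:
    assert color_rule[features[COLOR]] == label
    assert shape_rule[features[SHAPE]] == label

# ...but only the shape rule generalizes to a yellow apple.
yellow_apple = ("yellow", "round")
print(color_rule[yellow_apple[COLOR]])  # banana (wrong)
print(shape_rule[yellow_apple[SHAPE]])  # apple (correct)
```

Training accuracy alone cannot tell the two rules apart here; only data the algorithm has not seen reveals which one generalizes.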
The machine learning process
An ML algorithm learns rules through exposure to examples with known outputs. The rules are expected to enable it to transform inputs into meaningful outputs, such as transforming images of handwritten digits to the corresponding numbers. The goal of learning can also be thought of as enabling data transformation. The learning process generally requires two components:
- Data inputs–Data instances of the target task to be fed into the ML algorithm. For example, in the image recognition problem (figure 2), the data inputs are a set of apple and banana images and their corresponding labels.
- Learning algorithm–A mathematical procedure to derive a model based on the data inputs. It contains four elements:
- An ML model with a set of parameters to be learned from the data
- A measurement of the model’s performance (such as prediction accuracy) with the current parameters
- A way to update the model, which we call an optimization method
- A stop criterion to determine when the learning process should stop
After the model parameters are initialized, the learning algorithm updates the model iteratively by modifying the parameters based on the measurement until the stop criterion is reached. This measurement is called a loss function (or objective function) in the training phase; it measures the difference between the model’s predictions and the ground-truth targets. This process is illustrated in figure 3.
Figure 3. The process of training an ML model
Let’s look at an example to help you better understand the learning process. Imagine we have a bunch of data points in two-dimensional space (figure 4). Each point is either black or white. We want to build an ML model that, whenever a new point arrives, can decide whether this is a black point or a white point based on the point’s position. A straightforward way to achieve this goal is to draw a horizontal line to separate the two-dimensional space into two parts based on the data points in hand. This line could be regarded as an ML model. Its parameter is the horizontal position, which can be updated and learned from the provided data points. Coupled with the learning process introduced in figure 3, the required components could be chosen and summarized as follows:
- The data inputs are a bunch of white and black points described by their location in the two-dimensional space.
- The learning algorithm consists of four selected components:
- ML model–A horizontal line which can be formulated as y=a, where a is the parameter that can be updated by the algorithm.
- Accuracy measurement–The percentage of points which are labeled correctly based on the model.
- Optimization method–Move the line up or down by a certain distance. The distance can be related to the value of the measurement in each iteration. It won’t stop until the stop criterion is satisfied.
- Stop criterion–Stop when the measurement is one, which means all the points in hand are labeled correctly based on the current line.
Figure 4. An example of the learning process: learning a horizontal line to split white and black points
In this example (figure 4) the learning algorithm takes two iterations to achieve the desired line, which separates all the input points correctly, but in practice, this criterion may not always be satisfied. It depends on the distribution of the input data, the selected model type, and how the model is measured and updated. We often need to choose different components and try different combinations to adjust the learning process to get the expected ML solution. Also, even if the learned model is able to label all the training inputs correctly, it isn’t guaranteed to work well on unseen data. The model’s ability to generalize may not be good (we’ll discuss this further in the next section). It’s important to select the components and adjust the learning process carefully.
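The four components of the horizontal-line example can be written out as a short sketch. The point coordinates, the starting position, the `learning_rate` value, and the `max_iterations` safeguard are assumptions chosen for illustration; they are not the values behind figure 4, so the number of iterations differs from the figure.

```python
def predict(a, point):
    """ML model y = a: points above the line are white, the rest are black."""
    _, y = point
    return "white" if y > a else "black"


def accuracy(a, data):
    """Measurement: fraction of points labeled correctly by the line y = a."""
    return sum(predict(a, p) == color for p, color in data) / len(data)


def train(data, a=0.0, learning_rate=2.0, max_iterations=100):
    """Move the line until every point is labeled correctly (or we give up)."""
    for _ in range(max_iterations):
        score = accuracy(a, data)
        if score == 1.0:  # stop criterion: all points labeled correctly
            break
        # Black points wrongly above the line push it up;
        # white points wrongly below the line push it down.
        black_errors = sum(c == "black" and predict(a, p) == "white" for p, c in data)
        white_errors = sum(c == "white" and predict(a, p) == "black" for p, c in data)
        direction = 1 if black_errors >= white_errors else -1
        # Optimization method: the worse the accuracy, the bigger the move.
        a += direction * learning_rate * (1.0 - score)
    return a


# ((x, y), color) pairs: white points sit above the black ones.
points = [
    ((1.0, 3.0), "white"), ((2.0, 3.5), "white"), ((3.0, 4.0), "white"),
    ((1.5, 0.5), "black"), ((2.5, 1.0), "black"), ((3.5, 1.5), "black"),
]

learned_a = train(points)
print(round(learned_a, 2), accuracy(learned_a, points))  # 1.67 1.0
```

If the two colors overlapped vertically, no horizontal line could reach an accuracy of one, and the loop would only end because of the iteration cap. That is the situation the paragraph above warns about.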
How do we select the proper components to adjust the learning process to derive the expected model? To answer this question, we need to introduce a concept called hyperparameters and clarify the relationship between these and the parameters we’ve been discussing:
- Parameters are variables that can be updated by the ML algorithm during the learning process. They’re used to capture the rules from the data. For example, the position of the horizontal line is the only parameter in our previous example (figure 4) to help classify the points. It’s adjusted during the training process by the optimization method to capture the position rule for splitting the points with different colors. By adjusting the parameters, we can derive an ML model which is capable of accurately predicting the outputs of the given input data.
- Hyperparameters are also parameters, but they’re ones we predefine for the algorithm before the learning process begins, and their values remain fixed during the learning process. These include the measurement, the optimization method, the speed of learning, the stop criterion, and so on. An ML algorithm usually has multiple hyperparameters. Different combinations of them have different effects on the learning process, resulting in ML models with different performances. We can also consider the algorithm type (or the ML model type) as a hyperparameter, because we select it ourselves and it’s fixed during the learning process.
The selection of an optimal combination of hyperparameters for an ML algorithm is called hyperparameter tuning. It’s often done through an iterative process. In each iteration, we select a set of hyperparameters to use to learn an ML model with the training dataset. The ML algorithm block in figure 5 denotes the learning process described in figure 3. By evaluating each learned model on a separate dataset called the validation set, we can then pick the best one as the final model. We can evaluate the generalizability of that model using another dataset called the test set, which concludes the whole ML workflow.
Figure 5. The classical ML workflow
In general, we have three datasets in the ML workflow. Each dataset is distinct from the other two:
- The training set is used during the learning process to train a model given a fixed combination of hyperparameters.
- The validation set is used during the tuning process to evaluate the trained models in order to select the best hyperparameters.
- The test set is used for the final testing, after the tuning process. It’s used only once, after the final model is selected, and shouldn’t be used for training or tuning the ML algorithm.
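The workflow with three datasets can be sketched as follows. This is a minimal illustration under stated assumptions: the data is synthetic, the model is a simple k-nearest-neighbors classifier (for which "training" amounts to storing the training set), and the only hyperparameter being tuned is `k`. None of these choices come from the book.

```python
import random


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)


def accuracy(train, data, k):
    """Fraction of points in `data` labeled correctly using the training set."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


rng = random.Random(0)


def make_point():
    """Synthetic example: label is 1 when x > 5, with 10% of labels flipped."""
    x = rng.uniform(0, 10)
    label = int(x > 5)
    if rng.random() < 0.1:
        label = 1 - label
    return (x, label)


points = [make_point() for _ in range(300)]
train_set, validation_set, test_set = points[:200], points[200:250], points[250:]

# Tuning: fit with each candidate hyperparameter, score on the validation set.
scores = {k: accuracy(train_set, validation_set, k) for k in (1, 5, 15)}
best_k = max(scores, key=scores.get)

# Final testing: the test set is touched exactly once, after tuning is done.
test_accuracy = accuracy(train_set, test_set, best_k)
print(scores, best_k, test_accuracy)
```

The test set never influences the choice of `k`, which is what keeps the final number an honest estimate of performance on unseen data.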
The training and test sets are straightforward to understand. The reason we want to have an additional validation dataset is to avoid exposing the algorithm to all the training data during the tuning stages; this enhances the generalizability of the final model to unseen data. If we don’t have a validation set, the best model selected in the tuning stage is the one that focuses on extracting any subtle features in the training data to ceaselessly increase the training accuracy without caring about any unseen dataset. This likely leads to bad performance on the final test set, which contains different data. When the model performs worse on the test set (or validation set) than the training set, this is called overfitting. It’s a well-known problem in ML and often happens when the model’s learning capacity is too strong and the size of the training dataset is limited. For example, suppose you want to predict the fourth number of a series, given the first three numbers as training data: $a_1 = 1$, $a_2 = 2$, $a_3 = 3$, $a_4 = ?$. If the right solution is $a_4 = 4$, a naive model, $a_i = i$, provides you the correct answer. If you use a third-degree polynomial to fit the series, a perfect solution for the training data is $a_i = i^3 - 6i^2 + 12i - 6$, which predicts $a_4$ as 10. The validation process lets evaluation better reflect a model’s generalization ability, which in turn leads to better model selection.
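The arithmetic behind the series example is easy to check. The sketch below evaluates the naive model and the third-degree polynomial i^3 - 6i^2 + 12i - 6 on the three training points and on the fourth position; the function names are ours, chosen for illustration.

```python
def naive_model(i):
    """The simple rule a_i = i."""
    return i


def cubic_model(i):
    """The third-degree polynomial a_i = i^3 - 6i^2 + 12i - 6."""
    return i**3 - 6 * i**2 + 12 * i - 6


training_data = {1: 1, 2: 2, 3: 3}

for i, target in training_data.items():
    # Both models fit the training data perfectly.
    assert naive_model(i) == target
    assert cubic_model(i) == target
    # The cubic equals i + (i - 1)(i - 2)(i - 3); the extra term
    # vanishes at i = 1, 2, 3, which is why it fits the training points.
    assert cubic_model(i) == i + (i - 1) * (i - 2) * (i - 3)

print(naive_model(4))  # 4
print(cubic_model(4))  # 10
```

At i = 4 the extra term no longer vanishes: it contributes 3 × 2 × 1 = 6, which is exactly the gap between the two predictions.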
Overfitting is one of the most important problems studied in ML. Besides doing validation during the tuning process, there are many other ways to address the problem, such as augmenting the dataset or adding regularization to the model to constrain its learning capacity during training. We won’t go into this in more depth here. To learn more about this topic, see Chollet’s Deep Learning with Python, 2nd Edition.
The obstacles to applying machine learning
At this point, you should have a basic understanding of what ML is and how it proceeds. Although there are many mature ML toolkits that you can make use of, you may still face many difficulties in practice. This section describes some of these challenges—the aim isn’t to scare you off, but to provide context for the AutoML techniques which are described afterward. Obstacles you may meet include:
- The cost of learning ML techniques–We’ve covered the basics, but more knowledge is required when applying ML to a real problem. For example, you’ll need to think about how to formulate your problem as an ML problem, which ML algorithms you could use for your problem and how they work, how to clean and preprocess the data into the expected format to input into your ML algorithm, and which evaluation criteria should be selected for model training and hyperparameter tuning. All these questions need to be answered in advance, and doing this may require a large time commitment.
- Implementation complexity–Even with the necessary knowledge and experience, implementing the workflow after selecting an ML algorithm is a complex task. The time required for implementation and debugging grows as more advanced algorithms are adopted.
- The gap between theory and practice— The learning process can be hard to interpret, and the performance is highly data-driven. Furthermore, the datasets used in ML are often complex and noisy, and can be difficult to interpret, clean, and control. This means the tuning process is often more empirical than analytical. Even ML experts sometimes can’t achieve the desired results.
These difficulties significantly impede the democratization of ML to people with limited experience, and correspondingly increase the burden on ML experts. This has motivated ML researchers and practitioners to pursue a solution to lower the barriers, circumvent the unnecessary procedures, and alleviate the burden of manual algorithm design and tuning—AutoML.
AutoML: The automation of automation
The goal of AutoML is to allow a machine to mimic how humans design, tune, and apply ML algorithms to adopt ML more easily (figure 6). Because a key property of ML is automation, AutoML can be regarded as automating automation.
Figure 6. The main goal of AutoML: taking humans out of the loop of ML algorithm design and tuning
To help you understand how AutoML works, let’s first go over the key components.
Three key components of AutoML
ml_algorithm_pool = [MachineLearningAlgorithm1, ..., MachineLearningAlgorithmN]  ❶
result_pool, model_pool = [], []
for ml_algorithm in ml_algorithm_pool:  ❷
    model = ml_algorithm(arg1=..., arg2=..., data=historical_images)  ❸
    result = evaluate(model)  ❸
    result_pool.append(result)
    model_pool.append(model)
best_ml_model = pick_the_best(result_pool, model_pool)  ❹
return best_ml_model
❶ A pool of ML algorithms to be tested
❷ Loop over all the candidate ML algorithms
❸ Instantiate and evaluate the ML model based on each ML algorithm
❹ Select the best ML model based on the performance
This pseudocode can be regarded as a simple AutoML algorithm that takes a pool of ML algorithms as input, evaluates them one-by-one, and outputs a model learned from the best algorithm. Each AutoML algorithm consists of three core components (figure 7):
- Search space–A set of hyperparameters, and the ranges of each hyperparameter to be selected from. The range of each hyperparameter can be defined based on the user’s requirements and knowledge. For example, the search space can be a pool of ML algorithms, as shown in the pseudocode. In this case, we treat the type of ML algorithm as a hyperparameter to be selected. The search space can also be the hyperparameters of a specific ML algorithm, such as the structure of the ML model. The design of the search space is highly task-dependent, because we may need to adopt different ML algorithms for various tasks. It’s also quite personalized and ad hoc, depending on the user’s interests, expertise, and level of experience. There is a trade-off between the convenience you’ll enjoy by defining a large search space and the time you’ll spend identifying a good model (or the performance of the model you can achieve in a limited amount of time). For beginners, it can be tempting to define a broad search space which is general enough to apply to any task or situation, such as a search space containing all the ML algorithms, but the time and computational cost involved make this a poor solution.
- Search strategy–A strategy to select the optimal set of hyperparameters from the search space. Because AutoML is often an iterative trial-and-error process, the strategy often sequentially selects the hyperparameters in the search space and evaluates their performance. It may loop through all the hyperparameters in the search space (as in the pseudocode), or the search strategy may be adapted based on the hyperparameters that have been evaluated thus far in order to increase the efficiency of the later trials. A better search strategy can help you achieve a better ML solution within the same amount of time. It may also allow you to use a larger search space by reducing the search time and computational cost.
- Performance evaluation strategy–A way to evaluate the performance of a specific ML algorithm instantiated by the selected hyperparameters. The evaluation criteria are often the same as the ones used in manual tuning; for example, the validation performance of the model learned from the selected ML algorithm. In this article, different evaluation strategies are discussed in the context of adopting AutoML to solve different types of ML tasks.
Figure 7. The AutoML process
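The three components can be put together in a small sketch. It is only one possible instance, built on assumptions: the search space holds two toy algorithms and one extra hyperparameter `k`, the search strategy is plain random sampling (swapped in here instead of the exhaustive loop in the pseudocode), and the evaluation strategy is accuracy on a validation set. All function names and data are hypothetical.

```python
import random


def build_model(config, train):
    """Turn a hyperparameter configuration into a predict(x) function."""
    labels = [label for _, label in train]
    if config["algorithm"] == "majority":
        majority = max(set(labels), key=labels.count)
        return lambda x: majority  # "k" is ignored for this algorithm

    def knn(x):
        nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[: config["k"]]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)

    return knn


def evaluate(config, train, validation):
    """Performance evaluation strategy: accuracy on the validation set."""
    predict = build_model(config, train)
    return sum(predict(x) == y for x, y in validation) / len(validation)


def random_search(search_space, train, validation, n_trials=10, seed=0):
    """Search strategy: sample configurations at random and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, -1.0
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = evaluate(config, train, validation)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Search space: the algorithm type is itself a hyperparameter.
search_space = {"algorithm": ["majority", "knn"], "k": [1, 3, 5]}

train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
validation = [(0.8, 0), (1.8, 0), (6.2, 1), (6.8, 1)]

best_config, best_score = random_search(search_space, train, validation)
print(best_config, best_score)
```

Replacing `random_search` with a smarter strategy, or widening `search_space`, changes one component without touching the other two, which is the separation figure 7 describes.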
To facilitate the adoption of AutoML algorithms, an AutoML toolkit often wraps up these three components and provides some general application programming interfaces (APIs) with a default search space and search algorithm to keep you from worrying about selecting them yourself. For end users, in the simplest case, all you need to do to obtain the final model is provide the data, as shown here—you don’t even need to split the data into training and validation sets:
automl_model = AutoMLAlgorithm()
best_ml_model = automl_model.generate_model(data=...)
Because different users may have different use cases and levels of ML expertise, they may need to design their own search spaces, evaluation strategies, and even search strategies. Existing AutoML systems therefore often also provide APIs with configurable arguments to allow you to customize different components. A broad spectrum of solutions is available, from the simplest to the most configurable (figure 8).
Figure 8. The spectrum of AutoML APIs
The range of APIs available allows you to pick the most suitable one for your use case. This article teaches you how to select the right API in an advanced AutoML toolkit, AutoKeras, for different AutoML applications. You’ll also learn how to create your own AutoML algorithm with the help of AutoKeras.
Can we achieve full automation?
The field of AutoML has been evolving for three decades, with the involvement of industry and the open source community. Many successful implementations and promising developments have been seen:
- Many internal tools and open source platforms have been developed to help with hyperparameter tuning of ML models and model selection (Google Vizier, Facebook Ax).
- AutoML solutions performing at near human levels have been observed in many Kaggle data science competitions.
- A vast number of open source ML packages for improved hyperparameter tuning and ML pipeline creation have been developed, such as Auto-Sklearn and AutoKeras.
- Commercial AutoML products are helping many companies, big and small, to adopt ML in production. For example, Disney has successfully used Google Cloud AutoML to develop ML solutions for its online store without hiring a team of ML engineers (https://blog.google/products/google-cloud/cloud-automl-making-ai-accessible-every-business/).
- Researchers in fields other than computer science, such as medicine, neurobiology, and economics, are also using the power of AutoML. They can now bring new ML solutions to domain-specific problems such as medical image segmentation, genomic research, and animal recognition and protection, without going through the long learning curve of ML and programming.
We’re still exploring the full capabilities of AutoML to democratize ML techniques and make them accessible to more people in different domains. Despite the many successful applications of AutoML seen so far, there are still a lot of challenges and limitations to be further explored and addressed. These include:
- The difficulty of building AutoML systems. Compared to building an ML system, building an AutoML system from scratch is a more complex and involved process.
- The automation of collecting and cleaning data. AutoML still requires people to collect, clean, and label data. These processes are often more complicated in practice than the design of ML algorithms, and, for now at least, they can’t be automated by AutoML. For AutoML to work today, it has to be given a clear task and objective with a high-quality dataset.
- The costs of selecting and tuning the AutoML algorithm. The “no free lunch” theorem tells us that there’s no omnipotent AutoML algorithm that fits any hyperparameter tuning problem. The effort you save on selecting and tuning an ML algorithm may be amortized or even outweighed by the effort you need to put into selecting and tuning the AutoML algorithm.
- Resource costs. AutoML is a relatively costly process, in terms of both time and computational resources. Existing AutoML systems often need to try more hyperparameters than human experts to achieve comparable results.
- The cost of human-computer interaction. Interpreting the solution and the tuning process of AutoML may not be easy. As these systems become more complex, it becomes harder and harder for humans to get involved in the tuning process and understand how the final model is achieved.
AutoML is still in its early stages of development, and its continuing progress relies heavily on the participation of researchers, developers, and practitioners from different domains. Although you may contribute to that effort one day, the goal of this article is more modest. It mainly targets practitioners who have limited expertise in machine learning, or who have some experience but want to save themselves some effort in creating ML solutions. We teach you how to address an ML problem automatically with as few as five lines of code. It gradually approaches more sophisticated AutoML solutions for more complicated scenarios and data types, such as images, text, and so on.
Summary
- Machine learning refers to the capacity of a computer to modify its processing by interacting with data automatically, without being explicitly programmed.
- The ML process can be described as an iterative algorithmic process to adjust the parameters of an ML model based on the data inputs and certain measurements. It stops when the model is able to provide the expected outputs, or when some particular criterion defined by the user is reached.
- Tuning the hyperparameters in an ML algorithm allows you to adjust the learning process and select components tailored to the ML problem at hand.
- AutoML aims to learn from the experience of designing and applying ML models and automate the tuning process, relieving data scientists of this burden and making off-the-shelf ML techniques accessible to practitioners without extensive experience.
- An AutoML algorithm consists of three key components: the search space, search strategy, and evaluation strategy. Different AutoML systems provide different levels of APIs that either configure these for you or allow you to customize them based on your use case.
- Many unaddressed challenges are found in AutoML, preventing it from living up to the highest expectations. Achieving true automatic machine learning is difficult. We should be optimistic, but also take care to avoid exaggerating AutoML’s current capabilities.
If you want to learn more, check out the book on Manning’s liveBook platform here.