From Deep Learning for Vision Systems by Mohamed Elgendy

In this part, we will delve into image preprocessing for computer vision systems.

Take 37% off Deep Learning for Vision Systems. Just enter fccelgendy into the discount code box at checkout.

Check out part 1 for an intro to the computer vision pipeline and part 2 for an overview of input images.

Image preprocessing

What is image processing?

In machine learning projects in general, you usually go through a data preprocessing or cleaning step. As a machine learning engineer, you’ll spend a good amount of your time cleaning up and preparing the data before you build your learning model. The goal of this step is to make your data easier to analyze and process computationally, and the same is true for images. Depending on the problem you’re solving and the dataset in hand, some data massaging is required before you feed your images to the ML model.

Image preprocessing can be as simple as resizing: to feed a dataset of images to a convolutional network, they must all be the same size. Other processing tasks include geometric and color transformations, converting color images to grayscale, and many more.
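As a concrete illustration of resizing, here’s a minimal NumPy sketch. The nearest-neighbor mapping and the 224×224 target size are illustrative choices (224×224 is a common CNN input size), not something prescribed by the book, and the random array simply stands in for a loaded photo:

```python
import numpy as np

# Synthetic stand-in for a loaded photo: height 300, width 400, 3 color channels.
img = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbor resize: map each output pixel back to a source pixel."""
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return image[rows[:, None], cols]

resized = resize_nearest(img, 224, 224)
print(resized.shape)  # (224, 224, 3)
```

In practice you’d use a library routine (e.g. Pillow’s `Image.resize` or OpenCV’s `cv2.resize`, which also offer smoother interpolation), but the idea is the same: every image in the dataset is scaled to one unified dimension before training.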

Why image preprocessing?

The acquired data are usually messy and come from different sources. To feed them to the ML model (or neural network), they need to be standardized and cleaned up. More often than not, preprocessing is used to conduct steps that reduce the complexity and increase the accuracy of the applied algorithm. We can’t write a unique algorithm for every condition in which an image is taken; instead, when we acquire an image, we convert it into a form that allows a general algorithm to solve it.

Data preprocessing techniques might include:

  1. Convert color images to grayscale to reduce computational complexity: in certain problems you’ll find it useful to discard unnecessary information from your images to reduce memory usage or computational complexity.

    For example, converting your color images to grayscale. In many problems, color isn’t necessary to recognize and interpret an image; grayscale can be good enough for recognizing certain objects. Because color images contain more information than grayscale images, they can add unnecessary complexity and take up more space in memory. (Remember that color images are represented in three channels, so converting them to grayscale reduces the number of pixel values that need to be processed.)

    Figure 1

    In the example above, you can see how patterns in the brightness and darkness of an object (intensity) can be used to define its shape and characteristics. In other applications, color is important to define certain objects, such as skin cancer detection, which relies heavily on skin color (red rashes).

    When is color important?

    Converting an image to grayscale might not be a good decision for some problems. There are a number of applications for which color is very important. For example, building a diagnostic system to identify red skin rashes in medical images. This system relies heavily on the intensity of the red color in the skin. Removing colors from the image will make it harder to solve this problem. In general, color images provide very helpful information in many medical applications.

    Another example of the importance of color in images is lane detection in self-driving cars, where the car has to distinguish between yellow and white lanes because they are treated differently. Grayscale images do not provide enough information to make that distinction.

    Figure 2

    The rule of thumb for deciding whether color matters in your problem is to look at the image with the human eye: if you can identify the object you’re looking for in a grayscale image, then you probably have enough information to feed to your model. If not, then you definitely need the extra information (color) in your images. The same rule can be applied to most of the other preprocessing techniques discussed next.

  2. Standardize images: one important constraint in some machine learning algorithms, such as CNNs, is that all images in your dataset must be resized to a unified dimension. This means our images must be preprocessed and scaled to identical widths and heights before being fed to the learning algorithm.
  3. Data augmentation: another common preprocessing technique involves augmenting the existing dataset with perturbed versions of the existing images. Scaling, rotations, and other affine transformations are typical. This enlarges your dataset and exposes the neural network to a wide variety of variations of your images, making it more likely that your model will recognize objects in any form or shape. Here’s an example of image augmentation applied to a butterfly image:

    Figure 3

  4. Other techniques: many other preprocessing techniques can be used to get your images ready to train the machine learning model. In some projects you might need to remove the background color from your images to reduce noise; other projects might require that you brighten or darken your images. In short, any adjustment you need to apply to your dataset is a form of preprocessing, and you’ll select the appropriate techniques based on the dataset at hand and the problem you’re solving. This is how you build intuition for which ones you need when working on your own projects.
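The grayscale conversion (technique 1) and data augmentation (technique 3) described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book’s implementation: the luma weights are the standard ITU-R BT.601 values, and the flip/rotation are just two of many possible augmentations (libraries like Keras’s `ImageDataGenerator` automate this):

```python
import numpy as np

# Synthetic RGB image standing in for a dataset sample (e.g. the butterfly photo).
rgb = np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8)

# Grayscale conversion: collapse the three color channels into a single
# intensity channel using the standard luma weights (ITU-R BT.601).
gray = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]).astype(np.uint8)

# Data augmentation: cheap perturbed copies of the same image.
flipped = np.flip(rgb, axis=1)   # horizontal flip (mirror)
rotated = np.rot90(rgb)          # 90-degree rotation
augmented = [flipped, rotated]

print(gray.shape, len(augmented))  # (128, 128) 2
```

Note how the grayscale result drops from three channels to one, which is exactly where the memory and computation savings mentioned above come from.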

No free lunch theorem

This is a term introduced by David Wolpert and William Macready in their paper “No Free Lunch Theorems for Optimization”. You’ll often hear this term when working on ML projects. It means that there’s no single prescribed recipe that fits all models. When working on ML projects, you need to make many choices, like building your neural network architecture, tuning hyperparameters, and applying the appropriate data preprocessing techniques. No free lunch means that, although there are some rules of thumb for tackling certain problems, there’s no one recipe that is a priori guaranteed to work well in all situations. We must make certain assumptions about the dataset and the problem we’re trying to solve. For some datasets it’s best to convert the color images to grayscale, yet for other datasets you might need to keep or adjust the colors.

The good news is that, unlike traditional machine learning, deep learning algorithms require minimal data preprocessing because, as you’ll see in the next few pages, neural networks do most of the heavy lifting in processing an image and extracting features.

That’s all for now. Keep a look out for part 4. If you’re interested in learning more about the book, check it out on liveBook here and see this slide deck.