From Deep Learning for Vision Systems by Mohamed Elgendy

In this part, we’ll discuss using classifier learning algorithms and wrap up all we’ve learned in the series.

 


Take 37% off Deep Learning for Vision Systems. Just enter fccelgendy into the discount code box at checkout at manning.com.


Check out part 1 for an intro to the computer vision pipeline, part 2 for an overview of input images, part 3 to learn about image preprocessing, and part 4 for info on feature extraction.

Classifier learning algorithm

Okay, here’s what we’ve discussed so far in the classifier pipeline. If you want to start at the beginning, part 1 provides an intro to computer vision.

  1. Input image: We’ve seen how images are represented as functions and learned that computers see a grayscale image as a 2D matrix and a color image as a 3D matrix (three channels).
  2. Image preprocessing: We discussed some image preprocessing techniques to clean up our dataset and make it ready to be fed to the ML algorithm.
  3. Feature extraction: We converted the images in our large dataset into vectors of useful features that uniquely describe the objects in each image.
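To make these three steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the tiny hand-made images, the helper names `to_grayscale` and `histogram_features`, and the choice of an intensity histogram as the feature vector. The grayscale weights are the commonly used ITU-R BT.601 luma coefficients; real projects would typically rely on an image library instead.

```python
# Illustrative sketch only: tiny hand-made images and hypothetical helper names.

# Step 1. A grayscale image is a 2D matrix: rows x columns of intensities (0-255).
gray_image = [
    [0, 64, 128],
    [64, 128, 255],
]

# A color image is a 3D matrix: rows x columns x 3 channels (R, G, B).
color_image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[255, 255, 255], [0, 0, 0], [128, 128, 128]],
]


def to_grayscale(image):
    """Step 2 (preprocessing): collapse three color channels into one intensity."""
    # Weights are the common ITU-R BT.601 luma coefficients.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


def histogram_features(image, bins=4):
    """Step 3 (feature extraction): fraction of pixels in each intensity bin."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map an intensity in 0-255 to a bin index in 0..bins-1.
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [count / total for count in counts]


gray_from_color = to_grayscale(color_image)
features = histogram_features(gray_from_color)
print(gray_from_color)
print(features)
```

The resulting `features` list is the kind of fixed-length vector that a classifier can consume, regardless of how the image looked originally.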

Now it’s time to feed the extracted feature vector to the classifier, which outputs a class label for each image (e.g. motorcycle or not).
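As a rough illustration of "feature vector in, class label out", the sketch below uses a nearest-centroid classifier. This is a deliberately simple stand-in chosen for brevity, not one of the algorithms named in this article, and the two-number feature vectors and the function names `fit` and `predict` are made up for the example.

```python
import math

# Hypothetical 2-number feature vectors, labeled by hand for illustration.
training_data = {
    "motorcycle": [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]],
    "not motorcycle": [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3]],
}


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def fit(data):
    """'Training': remember one average feature vector per class."""
    return {label: centroid(vectors) for label, vectors in data.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = fit(training_data)
print(predict(centroids, [0.85, 0.75]))
print(predict(centroids, [0.15, 0.25]))
```

Whatever algorithm sits in this slot, the contract is the same: it learns from labeled feature vectors and then maps a new feature vector to a label.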

As we discussed, the classification task is done by one of two types of algorithms: 1) traditional machine learning algorithms like SVMs and Random Forest, or 2) deep neural network algorithms like CNNs. Although traditional ML algorithms might get decent results for some problems, convolutional neural networks (CNNs) truly shine in processing and classifying images in the most complex problems.

For now, I want you to focus on the fact that neural networks automatically extract useful features from your dataset and also act as a classifier to output class labels for the images. Input images pass through the layers of the neural network, which learns their features layer by layer. The deeper your network is (the more layers it has), the more complex the features it can learn; hence the name deep learning. More layers come with some tradeoffs, such as more computation and longer training times. The last layer of the neural network usually acts as the classifier that outputs the class label.
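The idea that earlier layers transform the input and the last layer acts as the classifier can be sketched with a tiny forward pass. This is a plain fully connected network rather than a CNN, and the weights, biases, and class names are hand-picked hypothetical values; a real network would learn its weights from data.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer; `weights` has one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Non-linearity applied after the hidden layer."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hand-picked, hypothetical parameters (a trained network learns these).
hidden_weights = [[1.0, -1.0], [-1.0, 1.0]]
hidden_biases = [0.0, 0.0]
output_weights = [[2.0, -2.0], [-2.0, 2.0]]
output_biases = [0.0, 0.0]
class_names = ["motorcycle", "not motorcycle"]


def classify(features):
    # Hidden layer: transforms the input into intermediate features.
    hidden = relu(dense(features, hidden_weights, hidden_biases))
    # Last layer: acts as the classifier, one score per class.
    probabilities = softmax(dense(hidden, output_weights, output_biases))
    best = max(range(len(class_names)), key=lambda i: probabilities[i])
    return class_names[best], probabilities


label, probabilities = classify([0.9, 0.2])
print(label, probabilities)
```

A deeper network simply stacks more hidden layers between the input and this final classifying layer.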


Figure 1


Summary and takeaways

This article series was designed to give you a 30,000-foot overview of computer vision systems and their applications. I don’t expect you to have a deep understanding of the pipeline components yet. What I want you to take away from this series is the following:

  • Human vs. machine vision systems: both contain two basic components: 1) a sensing device and 2) an interpreting device.

Figure 2

  • Zooming in on the second component, the interpreting device, we saw the pipeline used to process images coming from the sensing device and produce an output prediction of the image content.

Figure 3

  • Input data: we learned that an image can be represented as a function of x and y. We also saw how computers see images as matrices of pixel values: one channel for grayscale images and three channels for color images.

Figure 4

  • Preprocessing: image processing techniques vary for each problem and dataset. Some of these techniques are converting images to grayscale to reduce complexity, resizing images to a uniform size to fit your neural network, and data augmentation. As an ML engineer, you’ll spend a good amount of your time cleaning up and preparing the data before you build your learning model.
  • Feature extraction: features are unique properties in the image that are used to classify its objects. Traditional machine learning algorithms use several feature extraction methods. Neural networks, on the other hand, don’t require a separate extraction algorithm; they do most of the heavy lifting in processing an image and extracting useful features.

Figure 5

  • Classifier algorithm: although traditional ML algorithms might get decent results for some problems, convolutional neural networks (CNNs) truly shine in processing and classifying images in the most complex problems. Neural networks automatically extract useful features from your dataset and act as a classifier to output class labels for the images. Input images pass through the layers of the neural network, which learns their features layer by layer. The deeper your network is (the more layers it has), the more complex the features it can learn; hence the name deep learning. More layers come with some tradeoffs. The last layer of the neural network usually acts as the classifier that outputs the class label.

Figure 6
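Two of the preprocessing techniques named in the list above, resizing to a uniform size and data augmentation, can be sketched in a few lines of plain Python. The helper names `resize_nearest` and `flip_horizontal` are hypothetical, nearest-neighbor sampling is just one simple way to resize, and a horizontal flip is just one common augmentation; an image library would normally handle both.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2D image by nearest-neighbor sampling (one simple approach)."""
    old_height, old_width = len(image), len(image[0])
    return [
        [
            image[r * old_height // new_height][c * old_width // new_width]
            for c in range(new_width)
        ]
        for r in range(new_height)
    ]


def flip_horizontal(image):
    """Data augmentation: mirror the image left to right."""
    return [row[::-1] for row in image]


# A tiny hand-made grayscale image used only for illustration.
tiny = [
    [1, 2],
    [3, 4],
]

resized = resize_nearest(tiny, 4, 4)
flipped = flip_horizontal(tiny)
print(resized)
print(flipped)
```

Resizing gives every image the same shape, which a network with a fixed input size needs, while flipping produces an extra training example from an image you already have.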

That’s all for this article, and the series. We hope that you found it informative and enjoyable. If you’re interested in learning more about the book, check it out on liveBook here and see this slide deck.