
From Deep Learning with PyTorch by Eli Stevens and Luca Antiga

This article introduces you to PyTorch and discusses why you might want to use it in your deep learning projects.

Take 37% off Deep Learning with PyTorch. Just enter code fccstevens into the promotional discount code box at checkout at manning.com.

In this 3-part series you’re going to get to know the PyTorch deep learning framework. Let’s start with an overview of PyTorch itself and how it fits into the deep learning landscape.

Why PyTorch

Why should you choose PyTorch, given the growing number of capable deep learning tools (e.g. Keras, TensorFlow, Theano) on the market today?

A design driver for PyTorch is expressivity: allowing a developer to implement complicated models without extra complexity imposed by the framework. When a new paper comes out and a practitioner sets out to implement it, the most desirable quality in a tool is to stay out of the way. The less overhead in the process, the faster and more successful the implementation, and the experimentation that eventually follows, will be. PyTorch arguably offers one of the most seamless translations of ideas into Python code available in the deep learning landscape, and it does so without sacrificing performance. Although it features an expressive and user-friendly high-level layer, PyTorch isn’t a high-level wrapper on top of a lower-level library, and it doesn’t require the beginner to learn another tool, like Theano or TensorFlow, when models become complicated. Even when new low-level kernels need to be introduced, say convolutions on hexagonal lattices, PyTorch offers a low-overhead pathway to achieve that goal.

Directly linked to the previous point is the ability to debug PyTorch code. Debugging is currently one of the main pain points of frameworks that rely on static computation graphs. In those frameworks, execution happens only after the model has been defined in its entirety and the code has been compiled by the symbolic graph engine. This creates a disconnect between a bug in the code and its effect on the execution of the entire graph. In PyTorch, execution is eager: statements are executed at the time they’re invoked in Python, and the data a statement generates is immediately available for inspection. This makes debugging far more direct.
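A minimal sketch of what eager execution means in practice: every intermediate result is a concrete tensor you can print or step through with an ordinary Python debugger.

```python
import torch

# Each statement runs as soon as it's invoked; intermediate
# results are materialized immediately and can be inspected
# with print() or a debugger like pdb.
x = torch.ones(2, 3)
y = x * 2 + 1
print(y)        # values are already available here
print(y.shape)  # torch.Size([2, 3])
```

There is no separate compile-then-run phase: if `x * 2 + 1` raised an error, the traceback would point at exactly that line.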

Its eager execution model makes PyTorch behave like any other Python library, such as NumPy, only with GPU acceleration, neural network kernels, and automatic differentiation. This applies to debugging as well as to integrating PyTorch with other libraries, like writing a neural network operation using SciPy.
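As a toy illustration of that kind of integration, here is a sketch of a custom autograd operation whose forward pass runs in SciPy; the class name and the choice of the sigmoid are my own for the example (`torch.sigmoid` exists natively, of course), but the `torch.autograd.Function` pattern is the standard extension point.

```python
import torch
from scipy.special import expit  # SciPy's numerically stable sigmoid

class ScipySigmoid(torch.autograd.Function):
    """Toy autograd op whose forward pass is computed by SciPy."""

    @staticmethod
    def forward(ctx, input):
        # Leave autograd, compute with SciPy, come back to a tensor.
        result = torch.from_numpy(expit(input.detach().numpy()))
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        # sigmoid'(x) = s * (1 - s), applied via the chain rule
        return grad_output * result * (1 - result)

x = torch.tensor([0.0, 1.0], requires_grad=True)
ScipySigmoid.apply(x).sum().backward()  # x.grad now holds the gradient
```

Because execution is eager, the SciPy call is just an ordinary function call inside `forward`; no graph compiler needs to know about it.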

From an ecosystem perspective, PyTorch embraces Python, the emergent programming language of data science. PyTorch compensates for the impact of the Python interpreter on performance through an advanced execution engine, but it does so in a way that is fully transparent to the user, both during development and during debugging. PyTorch also features seamless interoperation with NumPy: on the CPU, NumPy arrays and Torch tensors can even share the same underlying memory and be converted back and forth at no cost.
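The memory-sharing point can be seen directly: converting in either direction with `torch.from_numpy` and `Tensor.numpy` copies nothing, so a write on one side is visible on the other.

```python
import numpy as np
import torch

a = np.ones(3)
t = torch.from_numpy(a)  # no copy: tensor shares `a`'s memory
a[0] = 5.0               # the change is visible through the tensor
print(t)                 # tensor([5., 1., 1.], dtype=torch.float64)

b = t.numpy()            # back to NumPy, again without copying
```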

An important requirement for a deep learning model is that it can be deployed in production on a range of architectures, from GPU clusters down to low-footprint and mobile devices. PyTorch can be deployed on clusters thanks to its distributed computing capabilities, but it isn’t designed to run on a phone directly. Instead, computation graphs can be exported to a neural network interoperability representation, the Open Neural Network Exchange (ONNX, github.com/onnx/onnx). This allows a model defined and trained with PyTorch to be deployed to ONNX-compliant frameworks optimized for inference, like Caffe2 (caffe2.ai), which runs on iOS and Android as well as a host of other architectures, provided that the model satisfies a few basic requirements.

These and many other advantages make PyTorch one of the most interesting deep learning frameworks available, and possibly one of the leading tools for deep learning in the near future.

Before we finally set out for our journey with PyTorch, we’ll spend the last section of this article mapping out its general structure, in terms of components and how they interoperate. This mental map helps us understand what happens and where it’s happening when we run our first lines of PyTorch.

The Anatomy of PyTorch

We’ve already hinted at a few components of PyTorch. Let’s now take some time to formalize a high-level map of the main architectural components. This article mostly deals with the top-most, user-facing layer: the Python module.


Figure 1. Anatomy of PyTorch, showing a high-level Python API (top), the C++ autograd/JIT engine (mid), and the C/CUDA low-level libraries (bottom). Each level is exposed to the upper levels through automatic wrapping. The result is a loosely-coupled system, with stateless low-level building blocks, a high performance engine, and an expressive high-level API.


We mentioned that at the top-most level PyTorch is a Python library. It exposes a convenient API for dealing with tensors and performing operations on them, as well as for building neural networks and training them via optimizers. In the Torch tradition, the Python layer is thin: it’s designed to prescribe computations, not to compose or execute them. That work is delegated to the lower layers for performance reasons.
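A minimal sketch of that user-facing layer in action, on made-up random data: a tiny model, a loss, and one optimizer step. The Python code only prescribes what to compute; the numerical work happens in the layers below.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)                              # a one-layer "network"
optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

inputs = torch.randn(8, 3)                           # toy random batch
targets = torch.randn(8, 1)

optimizer.zero_grad()                                # clear old gradients
loss = loss_fn(model(inputs), targets)
loss.backward()                                      # autograd fills in gradients
optimizer.step()                                     # parameters are updated
```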

Right under the Python layer we find an execution engine written in C++. The engine includes autograd, which manages the dynamic computation graph and provides automatic differentiation, and a jit (just-in-time) compiler that traces computation steps as they’re performed and optimizes them for repeated execution. It’s worth mentioning that many of the features that make PyTorch unique, such as fast automatic differentiation, come from this layer.
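Tracing can be sketched from Python through `torch.jit.trace`, which records the operations performed on an example input and turns them into a reusable compiled graph; the function below is a made-up example.

```python
import torch

def scale_and_shift(x):
    return x * 2 + 1

# Record the operations performed on the example input and compile
# them into an optimized graph that can be re-run on new inputs.
traced = torch.jit.trace(scale_and_shift, torch.randn(3))
print(traced(torch.ones(3)))  # behaves like the original function
```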

At the lowest layers, we find all the core libraries doing the computing. A series of plain C libraries provide efficient data structures, the tensors (a.k.a. multi-dimensional arrays), for CPU and GPU (TH and THC, respectively), as well as stateless functions that implement neural network operations and kernels (THNN and THCUNN) or wrap optimized libraries such as NVIDIA’s cuDNN. Other libraries deal with distributed (multi-machine) and sparse (multi-dimensional arrays where most of the entries are zero) tensor implementations. A lot of the code in this layer comes from Torch7 and Torch5 before it.

A library named ATen automatically wraps the low-level C functions in a convenient C++ API. ATen provides its tensor classes to the engine, and they’re automatically wrapped and exposed to Python. Similarly, the neural network function libraries are automatically wrapped and exposed to the engine and the Python API. Such automatic wrapping of low-level code keeps the system loosely coupled, decreasing its overall complexity and encouraging further development.

Despite this layered structure, the Python API is all a practitioner needs to use PyTorch proficiently. Still, awareness of the anatomy of the whole system helps us understand its API design and error messages to a greater extent.

Wrapping up

In this article we introduced where the world stands with deep learning and what tools you can use to be part of the revolution. We took a peek at what PyTorch has to offer and why it’s worth investing time and energy in it. Prior to that, we looked at its origins, with the intent of explaining the underlying motivations and design decisions behind Torch first and PyTorch now. Last, we described what PyTorch looks like from a bird’s-eye view.

As with any good story, wouldn’t it be great to take a peek at the amazing things PyTorch enables us to do once we’ve completed our journey? Stay tuned for part two, where you’ll see how to use a pre-trained model for image classification in PyTorch.

For more information about the book, read the first chapter for free here.