By François Chollet
This article introduces Keras, a deep learning library for Python that can be used with Theano and TensorFlow to build almost any sort of deep learning model.
Introduction to Keras
In this article, our code examples use Keras. Keras is a deep learning framework for Python that provides a convenient way to define and train almost any kind of deep learning model. Keras was initially developed for researchers, with the aim of enabling fast experimentation.
Keras has the following key features:
- It allows the same code to run on CPU or on GPU, seamlessly.
- It has a user-friendly API which makes it easy to quickly prototype deep learning models.
- It has built-in support for convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of both.
- It supports arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, etc. This means that Keras is appropriate for building any deep learning model, from a memory network to a neural Turing machine.
Keras is distributed under the permissive MIT license, which means it can be freely used in commercial projects. It’s compatible with any version of Python from 2.7 to 3.5. Its documentation is available at keras.io.
Keras has over 50,000 users, ranging from academic researchers and engineers at both start-ups and large companies to graduate students and hobbyists. Keras is used at Google, Netflix, Yelp, and CERN, as well as at dozens of start-ups working on a wide range of problems (including a self-driving car start-up, Comma.ai).
Keras, TensorFlow, and Theano
Keras is a model-level library, providing high-level building blocks for developing deep learning models. It doesn’t handle low-level operations such as tensor manipulation and differentiation. Instead, it relies on a specialized, well-optimized tensor library to do that, which serves as the “backend engine” of Keras. Rather than picking a single tensor library and tying the implementation of Keras to it, Keras handles the problem in a modular way, so that several different backend engines can be plugged seamlessly into Keras. Currently, the two existing backend implementations are the TensorFlow backend and the Theano backend. In the future, Keras may be extended to work with even more engines, if new ones come out that offer advantages over TensorFlow and Theano.
TensorFlow and Theano are two of the fundamental platforms for deep learning today. Theano is developed by the MILA lab at Université de Montréal, and TensorFlow is developed by Google. Any piece of code written with Keras can be run with TensorFlow or with Theano without changing anything: you can seamlessly switch between the two during development, which often proves useful, for instance if one of the two engines turns out to be faster for a specific task. Via TensorFlow (or Theano), Keras runs seamlessly on both CPU and GPU.
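To make the backend idea concrete, here is a toy sketch in plain Python and NumPy, not Keras code. The model-level function `dense_relu` only calls methods of a backend object, so the same function runs unchanged on either of two hypothetical backends, `NumpyBackend` and `PurePythonBackend`, both invented for illustration. In Keras itself, the `keras.backend` module plays this role, and the active engine is selected through the `backend` field of `~/.keras/keras.json` or the `KERAS_BACKEND` environment variable.

```python
import numpy as np


class NumpyBackend:
    """Hypothetical backend that delegates tensor ops to NumPy."""
    name = "numpy"

    def dot(self, x, w):
        return np.dot(x, w)

    def relu(self, x):
        return np.maximum(x, 0.0)


class PurePythonBackend:
    """Hypothetical backend that works on nested Python lists."""
    name = "python"

    def dot(self, x, w):
        # Matrix product of a list of rows by a list of rows.
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

    def relu(self, x):
        return [[max(v, 0.0) for v in row] for row in x]


def dense_relu(backend, x, w):
    # "Model-level" code: it never touches NumPy or lists directly,
    # only the backend interface, so the backend can be swapped freely.
    return backend.relu(backend.dot(x, w))


x = [[1.0, -2.0], [0.5, 3.0]]
w = [[1.0, -1.0], [2.0, 0.5]]
for backend in (NumpyBackend(), PurePythonBackend()):
    print(backend.name, dense_relu(backend, x, w))
```

Swapping the backend object is the only change needed to move the computation from one engine to the other, which is the property described above for TensorFlow and Theano.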
When running on CPU, TensorFlow wraps a low-level library for tensor operations called Eigen. On GPU, TensorFlow wraps a library of well-optimized deep learning operations called cuDNN, developed by NVIDIA.
Developing with Keras: a quick overview
The typical Keras workflow consists of four steps:
- Define your training data: input tensors and target tensors.
- Define a network of layers (a “model”) that maps your inputs to your targets.
- Configure the learning process by picking a loss function, an optimizer, and some metrics to monitor.
- Iterate on your training data by calling the fit() method of your model.
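As an illustration of the first step, the sketch below prepares input and target tensors in the shapes expected by the two-layer model defined later in this section. The data is random stand-in data, and the 28×28 image size (flattened to 784 features) and the 10 classes are assumptions modeled on an MNIST-like task. The one-hot encoding does the same job as Keras’s `keras.utils.to_categorical` helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 1,000 grayscale 28x28 "images" with pixel
# values 0-255 and integer class labels 0-9. Real data would come from a dataset.
images = rng.integers(0, 256, size=(1000, 28, 28))
labels = rng.integers(0, 10, size=(1000,))

# Input tensor: flatten each image to a 784-vector and scale pixels to [0, 1].
input_data = images.reshape((1000, 28 * 28)).astype("float32") / 255.0

# Target tensor: one-hot encode the labels (row i of the identity matrix
# is the one-hot vector for class i).
target_data = np.eye(10, dtype="float32")[labels]

print(input_data.shape, target_data.shape)
```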
You can define a model in two ways: using the Sequential class (only for linear stacks of layers, which is by far the most common network architecture) or using the “functional API” (for directed acyclic graphs of layers, which lets you build completely arbitrary architectures).
As an example, here’s a two-layer model defined using the Sequential class (note that we’re passing the expected shape of the input data to the first layer):
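```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# First layer: 32 units, expecting input vectors of 784 features
model.add(Dense(32, activation='relu', input_shape=(784,)))
# Output layer: 10 units, one softmax probability per class
model.add(Dense(10, activation='softmax'))
```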
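To show what these two Dense layers compute, here is a NumPy-only sketch of the model’s forward pass. Each Keras Dense layer computes activation(dot(input, kernel) + bias). The random weights below are stand-ins (Keras initializes Dense kernels with Glorot-uniform by default and biases with zeros), so only the shapes and the parameter count carry over to the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weights with the shapes of Dense(32, input_shape=(784,)) and Dense(10).
W1, b1 = rng.normal(scale=0.05, size=(784, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.05, size=(32, 10)), np.zeros(10)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row max for numerical stability; each row then sums to 1.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def forward(x):
    h = relu(x @ W1 + b1)         # shape (batch, 32)
    return softmax(h @ W2 + b2)   # shape (batch, 10)


n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 25450

probs = forward(rng.random((5, 784)))
print(probs.shape)  # (5, 10)
```

The total of 25,450 parameters is 784 × 32 + 32 for the first layer plus 32 × 10 + 10 for the second, which is the count model.summary() should report for this model.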
And here’s the same model defined using the functional API. With this API, you’re manipulating the data tensor that the model processes, and applying layers to this tensor as if they were functions. A detailed guide to what you can do with the functional API can be found in the book itself.
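```python
from keras.models import Model
from keras.layers import Dense, Input

# Symbolic input tensor describing samples of 784 features
input_tensor = Input(shape=(784,))
# Each layer is called on a tensor and returns a new tensor
x = Dense(32, activation='relu')(input_tensor)
output_tensor = Dense(10, activation='softmax')(x)

# Build the model from its input and output tensors
model = Model(inputs=input_tensor, outputs=output_tensor)
```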
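The idea of applying layers “as if they were functions” can be mimicked with callable Python objects. The `ToyDense` class below is hypothetical and runs on concrete NumPy arrays, whereas in Keras `Input` creates a symbolic tensor and `Model` records the graph of layer calls. Like a Keras layer, though, it creates its weights on its first call, once the size of the input is known, and reuses them on every later call.

```python
import numpy as np


class ToyDense:
    """Hypothetical callable layer computing activation(x @ kernel + bias)."""

    def __init__(self, units, activation=None, seed=0):
        self.units = units
        self.activation = activation
        self.rng = np.random.default_rng(seed)
        self.kernel = None
        self.bias = None

    def __call__(self, x):
        # Create weights lazily, on the first call, from the input's last axis.
        if self.kernel is None:
            self.kernel = self.rng.normal(scale=0.05, size=(x.shape[-1], self.units))
            self.bias = np.zeros(self.units)
        z = x @ self.kernel + self.bias
        return self.activation(z) if self.activation is not None else z


def relu(z):
    return np.maximum(z, 0.0)


# Functional style: each layer is applied to the output of the previous one.
input_batch = np.random.default_rng(1).random((4, 784))
hidden = ToyDense(32, activation=relu)
x = hidden(input_batch)
output = ToyDense(10)(x)
print(output.shape)  # (4, 10)
```

Calling `hidden` a second time, on the same or on a different input, reuses its existing kernel instead of creating a new one; that reuse is the mechanism behind the layer sharing listed among Keras’s key features.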
Once your model architecture is defined, it doesn’t matter whether you used a Sequential model or the functional API: all subsequent steps are the same. The learning process is configured in the “compilation” step, where you specify the optimizer and loss function(s) that the model should use, as well as the metrics you want to monitor during training. Here’s an example with a single loss function, by far the most common case:
```python
from keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(lr=0.001),
              loss='mse',
              metrics=['accuracy'])
```
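For intuition about what this compile() call configures, the NumPy sketch below spells out the loss, the metric, and a single optimizer update. Two assumptions apply: for a 10-unit softmax output trained with 'mse', we assume the 'accuracy' metric resolves to categorical accuracy (argmax match against one-hot targets), and the RMSprop step uses rho=0.9 and eps=1e-7 as assumed defaults alongside lr=0.001. This is a sketch of the standard RMSprop rule, not Keras’s source code.

```python
import numpy as np


def mse(y_true, y_pred):
    # Mean squared error: average over the output axis, then over the batch.
    return np.mean(np.mean((y_true - y_pred) ** 2, axis=-1))


def categorical_accuracy(y_true, y_pred):
    # Fraction of samples whose highest-scoring class matches the one-hot target.
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))


def rmsprop_step(param, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    # Keep a moving average of squared gradients...
    cache = rho * cache + (1.0 - rho) * grad ** 2
    # ...and scale each parameter's step by the inverse of its root.
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache


y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_pred = np.array([[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]])
print(mse(y_true, y_pred))
print(categorical_accuracy(y_true, y_pred))  # 0.5
```

Dividing by the running root-mean-square of the gradient gives each parameter its own effective step size, which is why RMSprop is often less sensitive to the scale of the gradients than plain gradient descent.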
Lastly, the learning process itself consists of passing NumPy arrays of input data (and the corresponding target data) to the model via the fit() method, similar to what you’d do in scikit-learn and several other machine learning libraries:
```python
# input_data and target_data are NumPy arrays of samples and their targets
model.fit(input_data, target_data, batch_size=128, epochs=10)
```
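Conceptually, fit() walks through the training data in mini-batches for the requested number of epochs. The generator below is a hypothetical sketch of that iteration only (no gradients are computed); it reshuffles the sample order every epoch, mirroring the default shuffle=True behavior of Keras’s fit(). The sample count of 1,000 is an assumption for illustration.

```python
import numpy as np


def iterate_minibatches(n_samples, batch_size, epochs, seed=0):
    """Yield (epoch, batch_indices): reshuffle each epoch, then slice."""
    rng = np.random.default_rng(seed)
    for epoch in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            yield epoch, order[start:start + batch_size]


# 1,000 samples with batch_size=128: seven full batches of 128 and one of 104
# per epoch, so 8 batches per epoch and 80 weight updates over 10 epochs.
batches = list(iterate_minibatches(1000, batch_size=128, epochs=10))
print(len(batches))  # 80
```

In the real fit() call, each of these batches triggers one forward pass, one loss computation, and one optimizer update.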