From Machine Learning Bookcamp by Alexey Grigorev

In this series, we cover model deployment: the process of putting models to use. In particular, we’ll see how to package a model inside a web service, allowing other services to use it. We also show how to deploy the web service to a production-ready environment.

Take 40% off Machine Learning Bookcamp by entering fccgrigorev into the discount code box at checkout at manning.com.

Check out part 1 if you missed it.

We already know how to load a trained model in a different process. Now we need to serve this model — make it available for others to use.

In practice, it usually means that a model is deployed as a web service, and other services can communicate with it, ask for predictions and use the results to make their own decisions.

In this article, we’ll see how to do it in Python with Flask — a Python framework for creating web services. First, we’ll take a look at why we need to use a web service for it.

Web services

We already know how to use a model to make a prediction, but so far, we have hardcoded the features of a customer as a Python dictionary.

Let’s try to imagine how our model is used in practice.

Suppose we have a service for running marketing campaigns. For each customer, it needs to determine the probability of churn, and if it’s high enough, it sends a promotional email with discounts. This service needs to use our model to decide whether it should send an email.

One possible way of achieving this is to modify the code of the campaign service: load the model and score the customers right in the service. This is a good approach, but it requires the campaign service to be written in Python, and we need full control over its code.

Unfortunately, this isn’t always the case: it may be written in some other language, or a different team might be in charge of this project, which means we won’t have the control we need.

The typical solution for this problem is putting a model inside a web service — a small service (a microservice) that takes care only of scoring customers.

We need to create a “churn service” — a service in Python that serves the churn model. Given the features of a customer, it responds with the probability of churn for this customer. For each customer, the campaign service asks the churn service for the probability of churn, and if it’s high enough, we send a promotional email (figure 1).

Figure 1. The churn service takes care of serving the churn prediction model, making it possible for other services to use it

This gives another advantage: separation of concerns. If the model is created by data scientists, they can take ownership of the service and maintain it, while the other team takes care of the campaign service.

One of the most popular frameworks for creating web services in Python is Flask, which we’ll cover next.


Flask
The easiest way to implement a web service in Python is to use Flask. It’s quite lightweight, requires little code to get started and hides most of the complexity of dealing with HTTP requests and responses.

Before we put our model inside a web service, let’s cover the basics of using Flask. For that, we’ll create a simple function and make it available as a web service — and after covering the basics, we’ll take care of the model.

Suppose we have a simple Python function called ping:

 def ping():
     return 'PONG'

It doesn’t do much: when invoked, it responds with “PONG”. Let’s use Flask to turn this function into a web service.

Anaconda comes with Flask pre-installed, but if you use a different Python distribution, you’ll need to install it:

 pip install flask

We put this code in a Python file — let’s call it “flask_test.py”.

To be able to use Flask, we first need to import it:

 from flask import Flask

Now we create a Flask app — the central object for registering functions that need to be exposed in the web service. We’ll call our app “test”:

 app = Flask('test')

Next, we need to specify how to reach the function by assigning it to an address, or a route in Flask terms. In our case, we want to use the “/ping” address:

 @app.route('/ping', methods=['GET']) #A
 def ping():
     return 'PONG'

#A Register the /ping route and assign it to the ping function

This code uses decorators — an advanced Python feature that we don’t cover in this book. We don’t need to understand how it works in detail; it’s enough to know that by putting @app.route('/ping', methods=['GET']) on top of the function definition, we assign the /ping address of the web service to the ping function.
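To get a feel for what happens, here’s a tiny standalone sketch of the registration idea. The names routes and route are made up purely for illustration — this isn’t how Flask is implemented:

```python
# A minimal illustrative decorator: it records functions in a registry,
# similar in spirit to how @app.route associates handlers with addresses.
routes = {}

def route(path):
    def register(func):
        routes[path] = func  # remember which function handles this path
        return func          # return the function unchanged
    return register

@route('/ping')
def ping():
    return 'PONG'

print(routes['/ping']())  # prints PONG
```

The decorator receives the function, stores it under the given path, and returns it unchanged — which is why ping still works as an ordinary function afterwards.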

To run it, we only need one last bit:

 if __name__ == '__main__':
     app.run(debug=True, host='0.0.0.0', port=9696)

The run method of app starts the service. We specify three parameters:

  • debug=True — restarts our application automatically when there are changes in the code
  • host='0.0.0.0' — makes the web service public; otherwise it isn’t possible to reach it when it’s hosted on a remote machine (e.g. in AWS)
  • port=9696 — the port that we use to access the application
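Putting all the pieces together, the entire file is only a few lines (the file name flask_test.py is our choice):

```python
from flask import Flask

app = Flask('test')

@app.route('/ping', methods=['GET'])
def ping():
    return 'PONG'

if __name__ == '__main__':
    # debug=True restarts the app on code changes,
    # host='0.0.0.0' makes it reachable from other machines,
    # port=9696 is the port we use to access it
    app.run(debug=True, host='0.0.0.0', port=9696)
```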

We’re ready to start our service now. Let’s do it:

 python flask_test.py
When we run it, we should see the following:

  * Serving Flask app "test" (lazy loading)
  * Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  * Debug mode: on
  * Running on http://0.0.0.0:9696/ (Press CTRL+C to quit)
  * Restarting with stat
  * Debugger is active!
  * Debugger PIN: 162-129-136

This means that our Flask app is now running and ready to get requests. To test it, we can use our browser: open it and put “localhost:9696/ping” in the address bar. If you run it on a remote server, you should replace “localhost” with the address of the server. The browser should respond with “PONG” (figure 2).

Figure 2. The easiest way to check if our application works is to use a web browser

Flask logs all the requests it receives, and we should see a line indicating that there was a GET request on the /ping route:

 127.0.0.1 - - [02/Apr/2020 21:59:09] "GET /ping HTTP/1.1" 200 -

As we see, Flask is quite simple: with less than ten lines of code, we created a web service.

Next, we’ll see how to adjust our script for churn prediction and also turn it into a web service.

Serving churn model with Flask

We’ve learned a bit of Flask, and now we can come back to our script and convert it to a Flask application.

To score a customer, our model needs to get the features. It means that we need a way of transferring some data from one service (the campaign service) to another (the churn service).

As a data exchange format, web services typically use JSON (JavaScript Object Notation). It’s similar to the way we define dictionaries in Python:

 {
     "customerid": "8879-zkjof",
     "gender": "female",
     "seniorcitizen": 0,
     "partner": "no",
     "dependents": "no",
     ...
 }
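The resemblance is close enough that Python’s built-in json module converts between dictionaries and JSON strings in a single call:

```python
import json

customer = {
    "customerid": "8879-zkjof",
    "gender": "female",
    "seniorcitizen": 0,
}

text = json.dumps(customer)    # dict -> JSON string
restored = json.loads(text)    # JSON string -> dict
print(restored == customer)    # True: the round trip preserves the data
```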

To send data, we use POST requests, not GET: POST requests can include the data in the request, but GET can’t.

To make it possible for the campaign service to get predictions from the churn service, we need to create a /predict route that accepts POST requests. The churn service parses the JSON data about a customer and responds in JSON as well (figure 3).

Figure 3. To get predictions, we POST the data about a customer in JSON to the /predict route, and get the probability of churn in response

Now we know what we want to do; let’s start modifying the churn_serving.py file.

First, we add a few more imports at the top of the file:

 from flask import Flask, request, jsonify

Although previously we imported only Flask, now we need to import two more things:

  • request — to get the content of a POST request
  • jsonify — to respond with JSON

Next, create the Flask app. Let’s call it “churn”:

 app = Flask('churn')

Now we need to create a function that:

  • gets the customer data in a request
  • invokes predict_single to score the customer
  • responds with the probability of churn in JSON

We’ll call this function predict and assign it to the /predict route:

 @app.route('/predict', methods=['POST']) #A
 def predict():
     customer = request.get_json() #B
     prediction = predict_single(customer, dv, model) #C
     churn = prediction >= 0.5 #D
     result = { #D
         'churn_probability': float(prediction), #D
         'churn': bool(churn), #D
     } #D
     return jsonify(result) #E

#A Assign the /predict route to the predict function

#B Get the content of the request in JSON

#C Score the customer

#D Prepare the response

#E Convert the response to JSON

To assign the route to the function, we use the @app.route decorator, where we also tell Flask to expect POST requests only.

The core content of the predict function is similar to what we did in the script previously: it takes a customer, passes it to predict_single, and does some work with the result.
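As a reminder, predict_single comes from the previous part. A sketch of it, assuming a fitted DictVectorizer and a scikit-learn model with predict_proba as in part 1, looks like this:

```python
def predict_single(customer, dv, model):
    # customer is a dict of features; dv is a fitted DictVectorizer
    X = dv.transform([customer])           # turn the dict into a feature matrix
    y_pred = model.predict_proba(X)[:, 1]  # probability of the positive class (churn)
    return y_pred[0]
```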

Finally, let’s add the last two lines for running the Flask app:

 if __name__ == '__main__':
     app.run(debug=True, host='0.0.0.0', port=9696)

We’re ready to run it:

 python churn_serving.py
After running it, we should see a message saying that the app started and now waits for incoming requests:

  * Serving Flask app "churn" (lazy loading)
  * Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  * Debug mode: on
  * Running on http://0.0.0.0:9696/ (Press CTRL+C to quit)
  * Restarting with stat
  * Debugger is active!

Testing this code is a bit more difficult than previously: this time, we need to use POST requests and include the customer we want to score in the body of the request.

The simplest way of doing this is to use the requests library in Python. It also comes pre-installed in Anaconda, but if you use a different distribution, you can install it with pip:

 pip install requests

We can open the same Jupyter notebook that we used previously, and test the web service from there.

First, import requests:

 import requests

Now, make a POST request to our service:

 url = 'http://localhost:9696/predict' #A
 response = requests.post(url, json=customer) #B
 result = response.json() #C

#A The URL where the service lives

#B Send the customer (as JSON) in the POST request 

#C Parse the response as JSON

The result variable contains the response from the churn service:

 {'churn': False, 'churn_probability': 0.061875678218396776}

This is the same information we previously saw in the terminal, but now we got it as a response from a web service.

NOTE:  Tools like Postman (https://www.postman.com) make it easier to test web services. We don’t cover Postman in this article, but you’re free to give it a try.

If the campaign service used Python, this is exactly how it could communicate with the churn service and decide who should get promotional emails.
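For illustration, the campaign side could wrap this in a small helper. Everything here is hypothetical glue code — send_promo_email stands in for the real campaign logic:

```python
import requests

def should_send_promo(result):
    # result is the JSON response from the churn service
    return bool(result['churn'])

def send_promo_email(customer):
    # placeholder for the real campaign action
    print('sending promo email to', customer['customerid'])

def process_customer(customer, url='http://localhost:9696/predict'):
    response = requests.post(url, json=customer)  # ask the churn service to score
    result = response.json()
    if should_send_promo(result):
        send_promo_email(customer)
```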

With a few lines of code, we created a working web service that runs on our laptop. In part 3, we’ll see how to manage dependencies in our service and prepare it for deployment.

That’s all for this article.

If you want to learn more about the book, check it out on our browser-based liveBook platform here.