From Machine Learning Bookcamp by Alexey Grigorev. In this series, we cover model deployment: the process of putting models to use. In particular, we'll see how to package a model inside a web service, allowing other services to use it. We also show how to deploy the web service to a production-ready environment.
Take 40% off Machine Learning Bookcamp by entering fccgrigorev into the discount code box at checkout at manning.com.
Churn prediction model
To get started with deployment, we'll use the model trained in the book; you can download the source code here (the model and data are in the CH 03 folder, and the other relevant code is in the Ch05 folder). First, in this article, we'll review how we can use the model for making predictions, and then we'll see how to save it with Pickle.
Using the model
Let’s use this model to calculate the probability of churning for the following customer:
```python
customer = {
    'customerid': '8879-zkjof',
    'gender': 'female',
    'seniorcitizen': 0,
    'partner': 'no',
    'dependents': 'no',
    'tenure': 41,
    'phoneservice': 'yes',
    'multiplelines': 'no',
    'internetservice': 'dsl',
    'onlinesecurity': 'yes',
    'onlinebackup': 'no',
    'deviceprotection': 'yes',
    'techsupport': 'yes',
    'streamingtv': 'yes',
    'streamingmovies': 'yes',
    'contract': 'one_year',
    'paperlessbilling': 'yes',
    'paymentmethod': 'bank_transfer_(automatic)',
    'monthlycharges': 79.85,
    'totalcharges': 3320.75,
}
```
To predict whether this customer is going to churn, we can use a predict function:
```python
df = pd.DataFrame([customer])
y_pred = predict(df, dv, model)
y_pred[0]
```
This function expects a dataframe, so we first create a dataframe with one row: our customer. Then we pass it to the predict function. The result is a NumPy array with a single element, the predicted probability of churn for this customer:
```
0.061875
```
This means that this customer has a six percent probability of churning.
Now let's take a look at the predict function. We wrote it previously to apply the model to the customers in the validation set. This is how it looks:
```python
def predict(df, dv, model):
    cat = df[categorical + numerical].to_dict(orient='records')
    X = dv.transform(cat)
    y_pred = model.predict_proba(X)[:, 1]
    return y_pred
```
Using it for one customer is inefficient and unnecessary: we create a dataframe from a single customer only to convert it back to a list of dictionaries inside predict.
To avoid this unnecessary conversion, we can create a separate function that predicts the probability of churn for a single customer only. Let's call this function predict_single:
```python
def predict_single(customer, dv, model):  #A
    X = dv.transform([customer])  #B
    y_pred = model.predict_proba(X)[:, 1]  #C
    return y_pred[0]  #D
```
#A Instead of passing a dataframe, pass a single customer
#B Vectorize the customer: create the matrix X
#C Apply the model to this matrix
#D Because we have only one customer, we need only the first element of the result
Using it becomes simpler: we invoke it with our customer (a dictionary):
```python
predict_single(customer, dv, model)
```
The result is the same: this customer has a six percent probability of churning.
This model lives in the Jupyter notebook, and once we stop the notebook, the trained model disappears. This means we can use it only inside the notebook and nowhere else. Next, we'll see how to address this.
Using Pickle to save and load the model
To use the model outside of our notebook, we need to save it so that another process can later load and use it (figure 1).
Figure 1. We train a model in a Jupyter notebook. To use it, we first need to save it and then load it in a different process.
Pickle is a serialization/deserialization module built into Python: with it, we can save an arbitrary Python object (with a few exceptions) to a file. Once we have the file, we can load the model from it in a different process.
NOTE: “Pickle” can also be used as a verb: pickling an object in Python means saving it using the Pickle module.
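As a minimal sketch of that workflow (the file name here is just for illustration), pickling and unpickling an ordinary Python object looks like this:

```python
import pickle

# any picklable object will do; here, a plain dictionary
obj = {'name': 'example', 'values': [1, 2, 3]}

# serialize the object to a file
with open('example.bin', 'wb') as f_out:
    pickle.dump(obj, f_out)

# deserialize it again -- normally this happens in a different process
with open('example.bin', 'rb') as f_in:
    restored = pickle.load(f_in)
```

The restored object is a new object that compares equal to the original; this is exactly what we'll do with the model below.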
Saving the model
To save the model, we first import the pickle module and then use the dump function:
```python
import pickle

with open('churn-model.bin', 'wb') as f_out:  #A
    pickle.dump(model, f_out)  #B
```
#A Specify the file where we want to save
#B Save the model to file with pickle
To save the model, we use the open function. It takes two arguments:

- The name of the file that we want to open. For us, it's churn-model.bin.
- The mode in which we open the file. For us, it's wb, which means we want to write to the file (w), and the file should be binary (b).
The open function returns f_out, the file object we can use to write to the file.
Next, we use the dump function from Pickle. It also takes two arguments:

- The object we want to save. For us, it's model.
- The file object pointing to the output file, which is f_out for us.
Finally, we use the with statement in this code. When we open a file with open, we need to close it after we finish writing. With with, this happens automatically. Without with, our code looks like this:
```python
f_out = open('churn-model.bin', 'wb')
pickle.dump(model, f_out)
f_out.close()
```
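Strictly speaking, this manual version isn't fully equivalent: if dump raises an exception, the file is never closed. To get the same guarantee that with provides, the manual version needs a try/finally. A sketch with a stand-in object and a hypothetical file name, since it only illustrates the pattern:

```python
import pickle

model = {'stand-in': 'for the real model'}  # placeholder object for illustration

f_out = open('example-model.bin', 'wb')
try:
    pickle.dump(model, f_out)
finally:
    f_out.close()  # runs even if dump raises, just like with does
```

In practice, the with form is shorter and harder to get wrong, which is why the book uses it.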
In our case, saving the model isn't enough: we also have a DictVectorizer that we "trained" together with the model, and we need to save both. The simplest way to do this is to put both objects in a tuple when pickling:
```python
with open('churn-model.bin', 'wb') as f_out:
    pickle.dump((dv, model), f_out)  #A
```
#A The object we save is a tuple with two elements
Loading the model
To load the model, we use the load function from Pickle. We can test it in the same Jupyter notebook:
```python
with open('churn-model.bin', 'rb') as f_in:  #A
    dv, model = pickle.load(f_in)  #B
```
#A Open the file in read mode
#B Load the tuple and unpack it
We again use the open function, but this time with a different mode: rb, which means we open the file for reading (r), and the file is binary (b).
WARNING: Be careful when specifying the mode. Accidentally specifying an incorrect mode may result in data loss: if you open an existing file with the w mode instead of r, it overwrites the content.
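One defensive option, not used in the book but built into Python's open, is the exclusive-creation mode x: xb behaves like wb except that it raises FileExistsError instead of overwriting an existing file. A small sketch with a hypothetical file name:

```python
import os

filename = 'precious-model.bin'  # hypothetical file we don't want to clobber
if os.path.exists(filename):
    os.remove(filename)  # start clean for the demonstration

with open(filename, 'xb') as f_out:  # 'x': create the file, fail if it exists
    f_out.write(b'first version')

try:
    with open(filename, 'xb') as f_out:  # second attempt: the file now exists
        f_out.write(b'would have clobbered it')
    overwritten = True
except FileExistsError:
    overwritten = False  # open refused, so the original bytes survive

with open(filename, 'rb') as f_in:
    content = f_in.read()
os.remove(filename)  # clean up the demonstration file
```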
Because we saved a tuple, we unpack it when loading, and we get both the vectorizer and the model at the same time.
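Tuple unpacking relies on remembering the order in which the objects were saved. A common alternative, not used in the book, is to pickle a dictionary instead, so each object is loaded by name. A sketch with stand-in objects and a hypothetical file name:

```python
import pickle

dv = {'stand-in': 'vectorizer'}  # placeholders for illustration
model = {'stand-in': 'model'}

with open('churn-model-dict.bin', 'wb') as f_out:
    pickle.dump({'dv': dv, 'model': model}, f_out)  # save by name, not position

with open('churn-model-dict.bin', 'rb') as f_in:
    artifacts = pickle.load(f_in)

dv_loaded = artifacts['dv']  # the order of keys no longer matters
model_loaded = artifacts['model']
```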
WARNING: Unpickling objects found on the internet isn't secure: unpickling a file can execute arbitrary code on your machine. Only unpickle files you trust or files you saved yourself.
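To see why this warning matters, here is a harmless illustration of the mechanism: a class can define __reduce__ to tell Pickle to call an arbitrary callable at load time. Here it only calls print, but a malicious file could name os.system instead:

```python
import pickle

class Surprise:
    def __reduce__(self):
        # pickle stores this callable and its arguments in the serialized data;
        # pickle.loads will call it -- nothing stops it from being os.system
        return (print, ('this code ran during unpickling!',))

payload = pickle.dumps(Surprise())
result = pickle.loads(payload)  # prints the message as a side effect
```

The load succeeds and prints the message, which is exactly the point: the person who wrote the file, not the person loading it, decides what runs.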
Let’s create a simple Python script that loads the model and applies it to a customer.
We call this file churn_serving.py. It contains:

- The predict_single function that we wrote earlier
- The code for loading the model
- The code for applying the model to a customer
First, we start with imports. For this script, we need to import Pickle and NumPy:
```python
import pickle

import numpy as np
```
Next, let's put the predict_single function there:
```python
def predict_single(customer, dv, model):
    X = dv.transform([customer])
    y_pred = model.predict_proba(X)[:, 1]
    return y_pred[0]
```
Now we can load our model:
```python
with open('churn-model.bin', 'rb') as f_in:
    dv, model = pickle.load(f_in)
```
And apply it:
```python
customer = {
    'customerid': '8879-zkjof',
    'gender': 'female',
    'seniorcitizen': 0,
    'partner': 'no',
    'dependents': 'no',
    'tenure': 41,
    'phoneservice': 'yes',
    'multiplelines': 'no',
    'internetservice': 'dsl',
    'onlinesecurity': 'yes',
    'onlinebackup': 'no',
    'deviceprotection': 'yes',
    'techsupport': 'yes',
    'streamingtv': 'yes',
    'streamingmovies': 'yes',
    'contract': 'one_year',
    'paperlessbilling': 'yes',
    'paymentmethod': 'bank_transfer_(automatic)',
    'monthlycharges': 79.85,
    'totalcharges': 3320.75,
}

prediction = predict_single(customer, dv, model)
```
Finally, let’s display the results:
```python
print('prediction: %.3f' % prediction)

if prediction >= 0.5:
    print('verdict: Churn')
else:
    print('verdict: Not churn')
```
After saving the file, we can run this script with Python:
```
python churn_serving.py
```
We should immediately see the results:
```
prediction: 0.062
verdict: Not churn
```
This way, we can load the model and apply it to the customer we specified in the script.
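Hardcoding the customer in the script is only for illustration. As a small step beyond it, the customer could come from a JSON file instead; the file name and helper below are assumptions, not part of the book's code:

```python
import json

def load_customer(path):
    # read a single customer, stored as one JSON object, from a file
    with open(path) as f_in:
        return json.load(f_in)

# write a small sample file so this sketch is self-contained
with open('customer.json', 'w') as f_out:
    json.dump({'gender': 'female', 'tenure': 41, 'monthlycharges': 79.85}, f_out)

customer = load_customer('customer.json')  # a dict, ready for predict_single
```

In churn_serving.py, the path could come from sys.argv instead of being hardcoded, so the same script handles any customer.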
We aren’t going to manually put the information about customers in the script. In part 2, we’ll cover a more practical approach: putting the model in a web service.
That’s all for this article.
If you want to learn more about the book, check it out on our browser-based liveBook platform here.