From Machine Learning for Business by Doug Hudgeon and Richard Nichol

This article covers basic time-series forecasting: what it is and why it’s a tough problem.



Forecasting your company’s monthly power usage

Kiara works for a retail chain that has forty-eight locations around the country. She’s an engineer, and every month her boss asks her how much energy the company will consume in the next month. Kiara follows the procedure taught to her by the previous engineer in her role: she looks at how much energy the company consumed in the same month last year, weights it by the number of locations gained or lost since then, and provides that number to her boss. Her boss then sends this estimate to the facilities management teams to help plan their activities and to Finance to forecast expenditure. The problem is that Kiara’s estimates are always wrong, sometimes by a lot.
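Kiara’s current procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming that "weighting by the number of locations" means scaling last year’s figure in proportion to the site count. The function name and the figures are hypothetical and are not taken from Kiara’s data.

```python
def naive_monthly_forecast(usage_same_month_last_year, sites_last_year, sites_now):
    """Scale last year's usage for the same month by the change in site count.

    Assumption: usage grows or shrinks in proportion to the number of sites.
    """
    if sites_last_year <= 0:
        raise ValueError("sites_last_year must be positive")
    return usage_same_month_last_year * sites_now / sites_last_year


# Hypothetical figures, for illustration only.
forecast = naive_monthly_forecast(
    usage_same_month_last_year=460_000,
    sites_last_year=46,
    sites_now=48,
)
print(forecast)
```

A baseline like this ignores everything except the site count, which is one reason it can be wrong by a lot.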

As an engineer, she reckons there must be a better way to approach this problem. In this article, you’ll use SageMaker to help Kiara produce better estimates of her company’s upcoming power consumption. With more accurate estimates, the company can also work out whether it could lower its energy bills by comparing prices across business energy quotes.

What are you making decisions about?

We’re going to use a neural network to predict how much power Kiara’s company will use next month. Neural networks are much more difficult to understand intuitively than other machine learning algorithms. Rather than attempt to give you a deep theoretical understanding of neural networks, this article focuses on how to use one to forecast time-series events and how to explain the results of the forecast. You’ll learn the how of neural networks rather than the detailed why.

For example, figure 1 shows the predicted versus actual power consumption for one of Kiara’s sites for a six-week period from mid-October 2018 to the end of November 2018. The site follows a weekly pattern, with high usage on weekdays and a drop on Sundays.

The shaded blue area shows the range that Kiara predicted with 80% confidence, that is, the range in which actual usage is expected to fall about 80% of the time. When Kiara calculates the average error for her prediction, she discovers it is 5.7%, which means that, on average, actual usage differed from the predicted figure by 5.7%. Using SageMaker, you can do all of this without understanding in depth how the neural network functions. In our view, this is OK.
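To make these two numbers concrete, here is a minimal sketch of how an average percentage error and the coverage of a prediction range could be computed. It assumes "average error" means the mean absolute percentage error (MAPE); the article doesn’t name the metric. The function names and all readings below are made up for illustration and are not Kiara’s data.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def interval_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside the predicted range."""
    inside = [lo <= a <= hi for a, lo, hi in zip(actual, lower, upper)]
    return sum(inside) / len(inside)


# Made-up daily readings and predictions for one site.
actual = [100.0, 120.0, 80.0, 50.0, 200.0]
predicted = [110.0, 114.0, 84.0, 45.0, 190.0]
lower = [95.0, 100.0, 82.0, 40.0, 170.0]   # bottom of the shaded range
upper = [125.0, 128.0, 90.0, 55.0, 210.0]  # top of the shaded range

mape = mean_absolute_percentage_error(actual, predicted)
coverage = interval_coverage(actual, lower, upper)
print(f"MAPE: {mape:.1f}%  coverage: {coverage:.0%}")
```

If an 80% range is well calibrated, the coverage measured on enough data should come out close to 0.8.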


Figure 1. Chart showing predicted versus actual power consumption for November 2018 for a site


To understand how neural networks can be used for time-series forecasting, you need to understand why time-series forecasting is a thorny problem. Once you understand this, you’ll see what a neural network is and how a neural network can be applied to time-series forecasting. Then you’ll roll up your sleeves, fire up SageMaker, and see it in action on real data.

The power consumption data you’ll use in this article is provided by BidEnergy (www.bidenergy.com), a company that specializes in power-usage forecasting and in minimizing power expenditure. The algorithms used by BidEnergy are more sophisticated than the ones you’ll see in this article, but you’ll get a feel for how machine learning in general and neural networks in particular can be applied to forecasting problems.

Introduction to time series data

Time-series data consists of a number of observations taken at regular intervals. For example, if you created a time series of your weight, you could record your weight on the first of every month for a year. Your time series would have twelve observations with a numerical value for each observation. Table 1 shows what this might look like.

Table 1. Table showing a person’s weight over the past year

Date Weight
2018-01-01 75
2018-02-01 73
2018-03-01 72
2018-04-01 71
2018-05-01 72
2018-06-01 71
2018-07-01 70
2018-08-01 73
2018-09-01 70
2018-10-01 69
2018-11-01 72
2018-12-01 74
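In code, a time series like the one in table 1 is just an ordered list of (date, value) pairs. The sketch below uses only Python’s standard library and the values from table 1; the variable names are our own choice for illustration.

```python
from datetime import date

# Observations copied from table 1: one (date, weight) pair per month.
weights = [
    (date(2018, 1, 1), 75), (date(2018, 2, 1), 73), (date(2018, 3, 1), 72),
    (date(2018, 4, 1), 71), (date(2018, 5, 1), 72), (date(2018, 6, 1), 71),
    (date(2018, 7, 1), 70), (date(2018, 8, 1), 73), (date(2018, 9, 1), 70),
    (date(2018, 10, 1), 69), (date(2018, 11, 1), 72), (date(2018, 12, 1), 74),
]

# A time series is ordered by time, so every date must come after the previous one.
is_ordered = all(earlier[0] < later[0] for earlier, later in zip(weights, weights[1:]))

# Change from each observation to the next one.
changes = [later[1] - earlier[1] for earlier, later in zip(weights, weights[1:])]

print(len(weights), "observations, ordered:", is_ordered)
print("month-to-month changes:", changes)
```

Twelve observations give eleven month-to-month changes, which already hints at why raw numbers are harder to read than a chart.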

It’s pretty boring to look at a table of data. It’s hard to get a real understanding of the data when it’s presented in a table format. Line charts are the best way to view time series data.

Figure 2 shows the same data presented as a chart.


Figure 2. Chart showing time series data of my weight over the past year


In this time series, the date is in the left column of the table and your weight is in the right column. If you wanted to record a time series of body weight for your entire family, you’d add a column for each of your family members. In table 2 you can see your weight and the weight of each of your family members over the course of a year.

Table 2. Table showing the weight of members of a family over a year

Date        Me  Spouse  Child 1  Child 2
2018-01-01  75  52      38       67
2018-02-01  73  52      39       68
2018-03-01  72  53      40       65
2018-04-01  71  53      41       63
2018-05-01  72  54      42       64
2018-06-01  71  54      42       65
2018-07-01  70  55      42       65
2018-08-01  73  55      43       66
2018-09-01  70  56      44       65
2018-10-01  69  57      45       66
2018-11-01  72  57      46       66
2018-12-01  74  57      46       66
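A table with one column per family member can be read into one series per column. The sketch below is one possible way to do it with Python’s standard library, assuming the values from table 2 have been saved as comma-separated text; the variable names are hypothetical.

```python
import csv
import io
from datetime import date

# Table 2 as comma-separated text (values copied from the table).
TABLE_2 = """Date,Me,Spouse,Child 1,Child 2
2018-01-01,75,52,38,67
2018-02-01,73,52,39,68
2018-03-01,72,53,40,65
2018-04-01,71,53,41,63
2018-05-01,72,54,42,64
2018-06-01,71,54,42,65
2018-07-01,70,55,42,65
2018-08-01,73,55,43,66
2018-09-01,70,56,44,65
2018-10-01,69,57,45,66
2018-11-01,72,57,46,66
2018-12-01,74,57,46,66
"""

rows = list(csv.DictReader(io.StringIO(TABLE_2)))

# The shared time index: one date per row.
dates = [date.fromisoformat(row["Date"]) for row in rows]

# One time series per column, keyed by the column header.
series = {
    name: [int(row[name]) for row in rows]
    for name in rows[0]
    if name != "Date"
}

# Change over the year for each family member (last value minus first value).
change_over_year = {name: values[-1] - values[0] for name, values in series.items()}
print(change_over_year)
```

Each entry in `series` shares the same dates, which is what lets you draw the members as separate charts over a common time axis.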

And, once you have that, you can visualize the data as four separate charts, as shown in figure 3.


Figure 3. Chart showing time series data of the weight of members of a family over the past year


Kiara’s time series data: daily power consumption

Power consumption data is displayed in a similar manner. Kiara’s company has forty-eight different business sites (retail, industrial, and transport), and each site gets its own column. Each observation is a cell in that column. Table 3 shows a sample of the electricity data used in this article.

This data looks similar to the data in table 2 showing the weight of each family member each month. The difference is that instead of each column representing a family member, in Kiara’s data each column represents a site (office/warehouse location) for her company. And instead of each row representing a person’s weight on the first day of the month, each row of Kiara’s data shows how much power each site used in that period (a half-hour interval in the sample in table 3).

Table 3. Power usage data sample (half-hour intervals)

Time Site_1 Site_2 Site_3 Site_4 Site_5 Site_6
2017-11-01 00:00:00 13.30 13.3 11.68 13.02 0.0 102.9
2017-11-01 00:30:00 11.75 11.9 12.63 13.36 0.0 122.1
2017-11-01 01:00:00 12.58 11.4 11.86 13.04 0.0 110.3
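Half-hourly readings like those in table 3 can be rolled up into daily figures per site. The sketch below is a minimal illustration using Python’s standard library and the three sample rows from table 3. It assumes each reading is the amount of energy used during that half hour, so that summing readings gives a daily total; the article doesn’t state the unit. With only three rows, the result is a partial total for the day.

```python
from collections import defaultdict
from datetime import datetime

sites = ["Site_1", "Site_2", "Site_3", "Site_4", "Site_5", "Site_6"]

# The three sample rows from table 3: (timestamp, one reading per site).
rows = [
    ("2017-11-01 00:00:00", [13.30, 13.3, 11.68, 13.02, 0.0, 102.9]),
    ("2017-11-01 00:30:00", [11.75, 11.9, 12.63, 13.36, 0.0, 122.1]),
    ("2017-11-01 01:00:00", [12.58, 11.4, 11.86, 13.04, 0.0, 110.3]),
]

# daily[day][site] accumulates every reading that falls on that calendar day.
daily = defaultdict(lambda: dict.fromkeys(sites, 0.0))
for stamp, values in rows:
    day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
    for site, value in zip(sites, values):
        daily[day][site] += value

for day, totals in sorted(daily.items()):
    print(day, {site: round(total, 2) for site, total in totals.items()})
```

In a Jupyter notebook you would more likely load the full file into a data-analysis library and resample it there, but the idea is the same: group the readings by day, then add them up per site.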

Now that you see how time-series data can be represented and visualized, the next step is to see how to use a Jupyter notebook to visualize this data.

That, however, is where we will stop for this article. If you want to learn more about the book, check it out on liveBook here and see this slide deck.