From Data Science Bookcamp by Leonard Apeltsin

This article discusses how to model a traffic simulation and compute travel probability.

Take 40% off Data Science Bookcamp by entering fccapeltsin into the discount code box at checkout at

Our traffic simulation can be modeled mathematically, using matrices and vectors. We’ll break down this process into simple, manageable parts. Consider for instance, a car that is about to leave Town 0 for one of the neighboring towns. There are neighboring towns to choose from. Hence, the probability of traveling from Town 0 to any neighboring town is 1 / We’ll compute this probability below.

Listing 1. Computing the probability of travel to a neighboring town

 num_neighbors =
 prob_travel = 1 / num_neighbors
 print("The probability of traveling from Town 0 to one of its "
       f"{} neighboring towns is {prob_travel}")

The probability of traveling from Town 0 to one of its 5 neighboring towns is 0.2

If we’re in Town 0 and Town i is a neighboring town, then there’s a 20% chance of us traveling from Town 0 to Town i. Of course, if Town i is not a neighboring town, then the probability drops to zero. We can track the probabilities for every possible i using a vector v. The value of v[i] will equal 0.2 if i is in G[0], and 0 otherwise. Vector v is called a transition vector, since it tracks the probability of transitioning from Town 0 to other towns. There are multiple ways to compute the transition vector. We could:

  1. Run np.array([0.2 if i in G[0] else 0 for i in G.nodes])
  • Each ith element will equal 0.2 if i is in G[0], and 0 otherwise.
  1. Run np.array([1 if i in G[0] else 0 for i in G.nodes]) * 0.2.
  • Here, we are simply multiplying 0.2 by the binary vector that tracks the presence or absence of edges linking to G[0].
  1. Run M[:,0] * 0.2, where M is the adjacency matrix.
  • Each adjacency matrix column tracks the binary presence or absence of edges between nodes. Hence, column 0 of M will equal the array in the previous example.

The third computation is the simplest. Of course, 0.2 is equal to 1 / As we discussed in the beginning of this section, the degree can also be computed by summing across an adjacency matrix column. Thus, we can also compute the transitional vector by running M[:,0] / M[:,0].sum().

Let’s compute the transition vector using all the listed methodologies.

Currently, an adjacency matrix M is stored with an adjacency_matrix variable. However, that matrix does not take into account the deleted edge between Town 3 and Town 9. Hence, we’ll recompute the matrix by running adjacency_matrix = nx.to_numpy_array(G).

Listing 2. Computing a transition vector

 transition_vector = np.array([0.2 if i in G[0] else 0 for i in G.nodes])
 adjacency_matrix = nx.to_numpy_array(G) (1)
 v2 = np.array([1 if i in G[0] else 0 for i in G.nodes]) * 0.2
 v3 = adjacency_matrix[:,0] * 0.2
 v4 = adjacency_matrix[:,0] / adjacency_matrix[:,0].sum() (2)
 for v in [v2, v3, v4]:
     assert np.array_equal(transition_vector, v) (3)

We recompute the adjacency matrix to take into account our earlier edge deletion.

Here, we compute the transition vector directly from the adjacency matrix column.

Here, we compute the transition vector directly from the adjacency matrix column.

 [0.  0.  0.  0.2 0.2 0.  0.2 0.2 0.  0.  0.  0.  0.  0.2 0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. ]

We can compute the transition vector for any Town i by running M[:,i] / M[:,i].sum() where M is the adjacency matrix. Furthermore, we can compute these vectors all at once by running M / M.sum(axis=0). The operation divides each column of the adjacency matrix by the associated degree. The end-result is a matrix whose columns correspond to transition vectors. This matrix is referred to as a transition matrix. The matrix is also commonly called the Markov matrix, named after Andrey Markov, a Russian mathematician who studied random processes. We’ll now compute the transition matrix. Based on our expectation, our output’s column zero should equal Town 0’s transition_vector.

Listing 3. Computing a transition matrix

 transition_matrix = adjacency_matrix / adjacency_matrix.sum(axis=0)
 assert np.array_equal(transition_vector, transition_matrix[:,0])

Figure 1. If M is the adjacency matrix, then M / M.sum(axis=0) equals the transition matrix, even if the adjacencies are directed. This figure shows a directed graph. Edges are marked with the transition probabilities. These probabilities are also displayed within a matrix that’s equal to M / M.sum(axis=0). Each column in the matrix is a transition vector, whose probabilities sum to 1.0. According to matrix, the probability of travel from A to C is 1/2 while the probability of travel from C to A is 1/3.

Our transition matrix has a fascinating property. It allows us to compute the traveling probability to every town in just a few lines of code! If we want to know the probability of winding up in Town i after 10 stops, then we simply need to:

  1. Initialize a vector v where v equals np.ones(31) / 31.
  2. Update v to equal transition_matrix @ v over 10 iterations.
  3. Return v[i].

Later, we’ll derive this amazing property from scratch. For now, let’s prove our claims by computing the travel probabilities to Towns 12 and 3 using matrix multiplication. We expect these probabilities to equal 0.051 and 0.047, based on our previous observations.

Listing 4. Computing travel probabilities using the transition matrix

 v = np.ones(31) / 31
 for _ in range(10):
     v = transition_matrix @ v
 for i in [12, 3]:
     print(f"The probability of winding up in Town {i} is {v[i]:.3f}.")

Our expectations are confirmed. We can model traffic-flow using a series of matrix multiplications. These multiplications serve as the basis for PageRank centrality, which is the most profitable node-importance measure in history. PageRank centrality was invented by the founders of Google. They used it to rank web-pages by modeling a user’s online journey as a series of random clicks through the internet’s graph. These page-clicks are analogous to a car that drives through randomly chosen towns. More popular web-pages have a higher likelihood of visits. This insight allowed Google to uncover relevant websites in a purely automated manner. Google was thus able to outperform its competition, and become a trillion-dollar company. Sometimes, data science can pay-off nicely.

PageRank centrality is easy to compute, but not so easy to derive. Nonetheless, with basic probability theory, we can demonstrate why repeated transition_matrix multiplications directly yield the travel probabilities.

Computing PageRank Centrality Using NetworkX

A function to compute PageRank centrality is included in NetworkX. Calling nx.pagerank(G) will return a dictionary mapping between the node ids and their centrality values. Let’s print the PageRank centrality of Town 12. Will it equal 0.051?

Listing 5. Computing PageRank centrality using NetworkX

 centrality = nx.pagerank(G)[12]
 print(f"The PageRank centrality of Town 12 is {centrality:.3f}.")
 The PageRank centrality of Town 12 is 0.048.

The printed PageRank value is 0.048, which is slightly lower than expected. The difference is due to a slight tweak that ensures PageRank works on all possible networks. As a reminder, PageRank was initially intended to model random clicks through a web-link graph. A web-link graph has directed edges, which means that certain web-pages might not have any outbound links. Thus, an internet user might get stuck on a dead-end page if they rely on outbound links to traverse the web. To counter this, the PageRank designers assumed that a user would eventually get tired of clicking web-links. That user would then reboot their journey by going to a totally random web-page. In other words, they’d teleport to one of the len(G.nodes) nodes within the internet graph. The PageRank designers programmed teleportation to occur in 15% of transversal instances. Teleportation ensures that a user will never get stranded on a node with no outbound links.

Figure 2. A directed graph containing four nodes. We can freely travel between inter-connected nodes A, B, and C. However, node D has no outbound edges. Sooner or later, a random traversal will take us from C to D. We will then be stuck forever at node D. Teleportation prevents this from happening. In 15% of our traversals, we’ll teleport to a randomly chosen node. Even if we travel to node D, we can still teleport back to nodes A, B, and C.

In our road network example, teleportation is analogous to calling a helicopter service. Imagine that in 15% of our town visits, we get bored with the local area. We then call for a helicopter, which swoops in and takes us to a totally random town. Once in the air, our probability of flying to any town equals 1 / 31. After we land, we rent a car and continue our journey using the existing network of roads. Hence, 15% of the time, we fly from Town i to Town j with a probability of 1 / 31. In the remaining 85% of instances, we’ll drive from Town i to Town j with a probability of transition_matrix[j][i]. Consequently, the actual travel probability equals the weighted mean of transition_matrix[j][i] and 1 / 31. The respective weights are 0.85 and 0.15. We can compute the weighted average using the np.average function. We can also compute that mean directly by running 0.85 * transition_matrix[j][i] + 0.15 / 31.

Taking the weighted mean across all elements of the transition matrix will produce an entirely new transition matrix. Let’s input that new matrix into our compute_stop_likelihoods function. Afterwards, we’ll print Town 12’s new travel probability. We expect that probability to drop from 0.051 to 0.048.

Listing 6. Incorporating randomized teleportation into our model

 new_matrix = 0.85 * transition_matrix + 0.15 / 31 (1)
 stop_10_probabilities = compute_stop_likelihoods(new_matrix, 10)
 prob = stop_10_probabilities[12]
 print(f"The probability of winding up in Town 12 is {prob:.3f}.")

We multiply transition_matrix by 0.85 and subsequently add 0.15 / 31 to every element. See Section Thirteen for a more in-depth discussion of arithmetic operations on 2D NumPy arrays.

The probability of winding up in Town 12 is 0.048.

Our new output is consistent with the NetworkX result. Will that output remain consistent if we raise the number of stops from 10 to 1000? Let’s find out. We’ll input 1000 stops into compute_stop_likelihoods. Afterwards, we’ll check whether Town 12’s PageRank is still equal to 0.048.

Listing 7. Computing the probability after 1000 stops

 prob = compute_stop_likelihoods(new_matrix, 1000)[12]
 print(f"The probability of winding up in Town 12 is {prob:.3f}.")
 The probability of winding up in Town 12 is 0.048.

The centrality is still equal 0.048. Ten iterations were sufficient for convergence to a stable value. Why is this the case? Well, our PageRank computation is nothing more than the repeated multiplication of a matrix and a vector. The elements of the multiplied vector are all values between 0 and 1. Perhaps this sounds familiar. Our PageRank computation is nearly identical to the power iteration algorithm that we presented in Section Fourteen. Power iteration repeatedly takes the product of a matrix and a vector. Eventually, the product converges to an eigenvector of the matrix. As a reminder, the eigenvector v of a matrix M is a special vector where norm(v) == norm(M @ v). Usually, ten iterations are sufficient to achieve convergence. Hence, our PageRank values converge because we’re running power iteration! Furthermore, this proves that our centrality vector is an eigenvector of the transition matrix. Thus, PageRank centrality is inexplicably linked to the beautiful mathematics behind dimensional reduction.

Given any graph G, we compute its PageRank centralities using the following series of steps:

  1. First, we obtain the graph’s adjacency matrix M.
  2. Next, we convert the adjacency matrix into the transition matrix by running M = M / M.sum(axis=0).
  3. Subsequently, we update M to allow for random teleportation. This is done by taking the weighted mean of M and 1 / n, where n equals the number of nodes within the graph.
  • Usually, the weights are set to 0.85 and 0.15. Hence, the weighted mean equals 0.85 * M + 0.15 / n.
  1. Finally, we return the largest (and only) eigenvector of M.
  • We can compute the eigenvector by running v = M @ v across approximately 10 iterations. Initially, vector v is set to np.ones(n) / n.

Markov matrices tie graph theory together with probability theory and matrix theory. Also, they can be leveraged to cluster network data, using a procedure called Markov clustering.

FOR REFERENCE: Common NetworkX Centrality Computations:

G.in_degree(i): Returns the in-degree of node i in a directed graph. Returns the degree of node i in an undirected graph.

nx.pagerank(G): Returns dictionary mapping between node ids and their PageRank centralities.

That’s all for this article.

If you want to learn more about the book, you can check it out on our browser-based liveBook platform here.