# How to make PageRank faster (with lots of math and a hint of Python)

TL;DR: this post walks through a single-threaded implementation of Personalized PageRank that performs better on sparse graphs.

In the beginning there was PageRank: a brilliant algorithm that measures the importance of a webpage by assigning a numerical weight to it. Back in the day, to bring your site to the first page of Google Search, you could simply add links to your page on portals perceived as "important".

Nowadays PageRank is no longer the only metric used to rank web pages, but the algorithm has proved useful in every context that needs an effective ranking over graphs of various kinds, from Twitter's WTF (Who To Follow) system to applications in predictive models for archaeology.

With the rise of exploratory programming environments such as Jupyter notebooks, lightweight, single-threaded implementations of well-known algorithms have become important for sketching ideas and developing software via iterative approximations, even on resource-constrained devices.

In this post I'd like to talk about a different implementation of the Personalized PageRank algorithm, introduced by Gianna Maria Del Corso, Antonio Gulli and Francesco Romani in this paper.

Note: The following requires basic knowledge of linear algebra and of numerical methods for solving eigenvector problems and linear systems. There will be a fair amount of mathematical explanation, and a final Python implementation.

## The Personalized PageRank

Given a graph $G = (N, A)$ with $n$ nodes (vertices) and $m$ edges, we suppose that the probability of switching from node $i \in N$ to any of its neighbors $j \in \mathcal{O}_i$ is the quantity defined as follows:

$P_{ij} = \begin{cases} \frac{1}{\# \mathcal{O}_i} & \# \mathcal{O}_i > 0 \\ 0 & \# \mathcal{O}_i = 0 \end{cases}$

Note: In this implementation we removed self-loops from the adjacency list.

When $\# \mathcal{O}_i = 0$ , $i$ is called a dangling node, and we denote the position of such nodes in a vector $\mathbf{d} \in \{ 0, 1 \}^{n}$ where:

$d_i = \begin{cases} 1, & \# \mathcal{O}_i = 0 \\ 0, & \# \mathcal{O}_i > 0 \end{cases}$

We suppose we know a personalization vector $\mathbf{v} \in [0, 1]^{n}$ that, for each node $i \in N$, stores the probability that $i$ is chosen among the others. For a web search engine it could be computed with cutting-edge machine learning models that learn from your search history (sometimes really badly). In another scenario, e.g. archaeology, it could be computed with respect to the cost of starting an excavation at the location. In every other case, we suppose $v_i = \frac{1}{n}$ , and we always impose that $\sum_{i = 1}^{n} v_i = 1$ (in the implementation the values are normalized).
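To make the notation concrete, here is a minimal sketch that derives the out-degrees, the dangling indicator $\mathbf{d}$ and the normalized preference vector $\mathbf{v}$ from a graph stored as a Dict[int, List[int]] (the function name and signature are my choices for illustration, not from the paper):

```python
from typing import Dict, List, Optional

def graph_setup(graph: Dict[int, List[int]],
                pref: Optional[Dict[int, float]] = None):
    """Derive out-degrees, the dangling indicator d, and the normalized
    personalization vector v from an adjacency list (self-loops ignored).
    Every node must appear as a key of `graph`."""
    nodes = sorted(graph)
    n = len(nodes)
    # out-degrees with self-loops removed, as in the definition of P
    out_deg = {i: sum(1 for j in graph[i] if j != i) for i in nodes}
    # d_i = 1 iff node i is dangling (no outgoing edges)
    d = {i: 1 if out_deg[i] == 0 else 0 for i in nodes}
    if pref is None:
        v = {i: 1.0 / n for i in nodes}          # uniform fallback
    else:
        s = sum(pref.values())
        v = {i: pref.get(i, 0.0) / s for i in nodes}  # normalize to sum 1
    return out_deg, d, v
```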

When we have no dangling nodes and no loops in $G$ , the PageRank $z_i$ of $i \in N$ is the sum of the PageRanks of its in-pointing neighbors $j \in \mathcal{I}_i$ , weighted with the transition probability that a random surfer switches from $j$ to $i$ :

$z_i = \sum_{j \in \mathcal{I}_i} P_{ji} z_j$

that represents the left eigenvector $\mathbf{z} \in [0, 1]^{n}$ of the matrix $P = [p_{ij}] \in [0, 1]^{n \times n}$ :
$\mathbf{z}^{T} P = \mathbf{z}^{T}$

Since $\mathbf{z}$ is the steady-state distribution of the Markov chain modelled by $P$ , $\mathbf{z}$ is in fact a probability vector, so $\sum_{i = 1}^{n} z_i = 1$ .

We have to take into account two problems arising from this simple model:

• When the random surfer reaches a dangling node, the navigation restarts according to the preference vector.
• At each step, the surfer keeps navigating through outgoing edges with probability $\alpha$ , and restarts the navigation from a node chosen via the preference vector with probability $1 - \alpha$ . In this way we prevent the surfer from getting stuck in infinite loops.

Denoting $\mathbf{e} = [1, ..., 1]^{T} \in \{ 1 \}^{n}$ , the eigenvector to compute is now:

\begin{aligned} \mathbf{z}^{T} &= \mathbf{z}^{T} [ \alpha (P + \mathbf{d v}^{T}) + (1 - \alpha) \mathbf{e v}^{T} ] \\ &= \alpha \mathbf{z}^{T} (P + \mathbf{d v}^{T}) + (1 - \alpha) \mathbf{z}^{T} \mathbf{e v}^{T} \end{aligned}

We have $\mathbf{z}^{T} \mathbf{e} = \sum_{i = 1}^{n} z_i = 1$ so we can write:
\begin{aligned} \mathbf{z}^{T} &= \alpha \mathbf{z}^{T} P + \alpha \mathbf{z}^{T} \mathbf{d v}^{T} + (1 - \alpha) \mathbf{v}^{T} \\ &= \alpha \mathbf{z}^{T} P + (\alpha \mathbf{z}^{T} \mathbf{d} + 1 - \alpha) \mathbf{v}^{T} \end{aligned}

The canonical way of solving this problem is via the well-known power method:

In mathematics, power iteration is an eigenvalue algorithm: given a diagonalizable matrix $A$, the algorithm will produce a number $\lambda$, which is the greatest (in absolute value) eigenvalue of $A$, and a nonzero vector $v$, which is a corresponding eigenvector of $\lambda$, that is, $Av = \lambda v$. The algorithm is also known as the Von Mises iteration.

But now we'll do some algebraic manipulation to turn this into a sparse linear system.
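Before we do, the canonical power iteration for the fixed point above can be sketched like this (a plain-Python sketch under the same Dict[int, List[int]] graph representation; the function name and defaults are my choices):

```python
from typing import Dict, List

def pagerank_power(graph: Dict[int, List[int]], alpha: float = 0.85,
                   max_iter: int = 100, tol: float = 1e-9) -> Dict[int, float]:
    """Power iteration for z^T = alpha z^T P + (alpha z^T d + 1 - alpha) v^T,
    with a uniform preference vector v. Every node must appear as a key."""
    nodes = sorted(graph)
    n = len(nodes)
    v = {i: 1.0 / n for i in nodes}
    out = {i: [j for j in graph[i] if j != i] for i in nodes}  # drop self-loops
    z = dict(v)
    for _ in range(max_iter):
        # alpha * z^T d: rank mass currently sitting on dangling nodes
        dangling = sum(z[i] for i in nodes if not out[i])
        new = {i: (alpha * dangling + 1.0 - alpha) * v[i] for i in nodes}
        # alpha * z^T P: each node spreads its rank over its out-neighbors
        for i in nodes:
            for j in out[i]:
                new[j] += alpha * z[i] / len(out[i])
        delta = sum(abs(new[i] - z[i]) for i in nodes)
        z = new
        if delta < tol:
            break
    return z
```

Each sweep touches only the edges of the graph, but convergence still takes $O(\log \epsilon / \log \alpha)$ iterations; the linear-system route below gets to the same vector differently.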

## Linear system formulation

Let's take a step back from the previous manipulation and write:

\begin{aligned} & \mathbf{z}^{T} = \alpha \mathbf{z}^{T} (P + \mathbf{d v}^{T}) + (1 - \alpha) \mathbf{v}^{T} \\ \iff & \mathbf{z}^{T} - \alpha \mathbf{z}^{T} (P + \mathbf{d v}^{T}) = (1 - \alpha) \mathbf{v}^{T} \\ \iff & \mathbf{z}^{T} [I - \alpha (P + \mathbf{d v}^{T})] = (1 - \alpha) \mathbf{v}^{T} \\ \iff & [I - \alpha (P^{T} + \mathbf{v d}^{T})] \mathbf{z} = (1 - \alpha) \mathbf{v} \end{aligned}

This is a dense linear system: when the graph is really big, the coefficient matrix is full of nonzero values and computing a solution takes a lot of work. However, when the graph is sparse, $P^{T}$ is sparse too, so the density comes entirely from the $\mathbf{v d}^{T}$ component. What if we could avoid storing this matrix at all?

Let $R = I - \alpha P^{T}$ be the sparse coefficient matrix of this new linear system. We note that $[I - \alpha (P^{T} + \mathbf{v d}^{T})] = R - \alpha \mathbf{v d}^{T}$ , so we have:

\begin{aligned} & [I - \alpha (P^{T} + \mathbf{v d}^{T})] \mathbf{z} = (1 - \alpha) \mathbf{v} \\ \iff & (R - \alpha \mathbf{v d}^{T}) \mathbf{z} = (1 - \alpha) \mathbf{v} \\ \iff & \mathbf{z} = (1 - \alpha) (R - \alpha \mathbf{v d}^{T})^{-1} \mathbf{v} \end{aligned}

Let's go all in with linear algebra and bring the Sherman-Morrison formula into play:

$(R - \alpha \mathbf{v d}^{T})^{-1} = R^{-1} + \frac{R^{-1}\mathbf{v d}^{T} R^{-1}}{\frac{1}{\alpha} - \mathbf{d}^{T} R^{-1} \mathbf{v}}$
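For the skeptical, a quick numerical sanity check of this identity (a sketch with numpy on a small random, well-conditioned $R$; sizes, scales and the seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, n = 0.85, 5
# a well-conditioned R close to the identity, plus small arbitrary v and d
R = np.eye(n) + 0.1 * rng.random((n, n))
v = 0.3 * rng.random(n)
d = 0.3 * rng.random(n)

Rinv = np.linalg.inv(R)
# left-hand side: direct inverse of the rank-one update
lhs = np.linalg.inv(R - alpha * np.outer(v, d))
# right-hand side: Sherman-Morrison expression
rhs = Rinv + (Rinv @ np.outer(v, d) @ Rinv) / (1.0 / alpha - d @ Rinv @ v)
assert np.allclose(lhs, rhs)
```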

Don't worry, we've climbed the peak of mathematical abstrusity; from here on, the equations get easier:

\begin{aligned} \mathbf{z} &= (1 - \alpha) (R - \alpha \mathbf{v d}^{T})^{-1} \mathbf{v} \\ &= (1 - \alpha) \left ( R^{-1} + \frac{R^{-1}\mathbf{v d}^{T} R^{-1}}{\frac{1}{\alpha} - \mathbf{d}^{T} R^{-1} \mathbf{v}} \right ) \mathbf{v} \\ &= (1 - \alpha) \left ( 1 + \frac{\mathbf{d}^{T} R^{-1} \mathbf{v}}{\frac{1}{\alpha} - \mathbf{d}^{T} R^{-1} \mathbf{v}} \right ) R^{-1} \mathbf{v} \end{aligned}

Now let $\mathbf{y}$ be the solution of $R \mathbf{y} = \mathbf{v} \iff \mathbf{y} = R^{-1} \mathbf{v}$ , so we can write:

$\mathbf{z} = (1 - \alpha) \left ( 1 + \frac{\mathbf{d}^{T} \mathbf{y}}{\frac{1}{\alpha} - \mathbf{d}^{T} \mathbf{y}} \right ) \mathbf{y} = \frac{\mathbf{y}}{\left \lVert \mathbf{y} \right \rVert_1}$

The last equality holds because $\mathbf{z}$ is a probability vector and the factor multiplying $\mathbf{y}$ is a positive scalar: the only positive multiple of $\mathbf{y}$ whose entries sum to 1 is $\frac{\mathbf{y}}{\left \lVert \mathbf{y} \right \rVert_1}$ .

The computation is reduced to:

$\begin{cases} R \mathbf{y} = (I - \alpha P^{T}) \mathbf{y} = \mathbf{v} \\ \mathbf{z} = \frac{\mathbf{y}}{\left \lVert \mathbf{y} \right \rVert_1} \end{cases}$

We don't even need to compute the dangling nodes!

## The Gauss-Seidel method

Let's go a step further and see that we don't even need to store $P$ explicitly: we can use $G$ itself.

To solve $R \mathbf{y} = \mathbf{v}$ we can use the Gauss-Seidel stationary iterative method:

In numerical linear algebra, the Gauss–Seidel method, also known as the Liebmann method or the method of successive displacement, is an iterative method used to solve a system of linear equations. It is named after the German mathematicians Carl Friedrich Gauss and Philipp Ludwig von Seidel, and is similar to the Jacobi method. Though it can be applied to any matrix with non-zero elements on the diagonals, convergence is only guaranteed if the matrix is either strictly diagonally dominant, or symmetric and positive definite. It was only mentioned in a private letter from Gauss to his student Gerling in 1823. A publication was not delivered before 1874 by Seidel.

Starting from a vector $\mathbf{y}^{(0)}$ - in the implementation $\mathbf{y}^{(0)} = [\frac{1}{n}, ..., \frac{1}{n}]^{T}$ - at each iteration $k \ge 0$ we compute every value $y_i^{(k + 1)}$ as:

\begin{aligned} y_i^{(k + 1)} &= \frac{v_i - \sum_{j = 1}^{i - 1} R_{ij} y_j^{(k + 1)} - \sum_{j = i + 1}^{n} R_{ij} y_j^{(k)}}{R_{ii}} \\ &= \frac{v_i - \sum_{j = 1}^{i - 1} (-\alpha P_{ji}) y_j^{(k + 1)} - \sum_{j = i + 1}^{n} (-\alpha P_{ji}) y_j^{(k)}}{1 - \alpha P_{ii}} \\ &= \frac{v_i + \alpha \left [ \sum_{j = 1}^{i - 1} P_{ji} y_j^{(k + 1)} + \sum_{j = i + 1}^{n} P_{ji} y_j^{(k)} \right ]}{1 - \alpha P_{ii}} \\ \end{aligned}

Let's call $\mathcal{I}_i (a, b) = \{ j \in \mathcal{I}_i \mid a \le j \le b \}$ . Noting that nodes with no in-link to $i$ contribute nothing to the sums (their coefficients are all zero) and that the diagonal of the matrix is $R_{ii} = 1 - \alpha P_{ii} = 1$ (we removed self-loops), we can write:

\begin{aligned} y_i^{(k + 1)} &= \frac{v_i + \alpha \left [ \sum_{j = 1}^{i - 1} P_{ji} y_j^{(k + 1)} + \sum_{j = i + 1}^{n} P_{ji} y_j^{(k)} \right ]}{1 - \alpha P_{ii}} \\ &= v_i + \alpha \left [ \sum_{j \in \mathcal{I}_i (1, i - 1)} P_{ji} y_j^{(k + 1)} + \sum_{j \in \mathcal{I}_i (i + 1, n)} P_{ji} y_j^{(k)} \right ] \\ &= v_i + \alpha \left [ \sum_{j \in \mathcal{I}_i (1, i - 1)} \frac{y_j^{(k + 1)}}{\# \mathcal{O}_j} + \sum_{j \in \mathcal{I}_i (i + 1, n)} \frac{y_j^{(k)}}{\# \mathcal{O}_j} \right ] \end{aligned}

## The implementation (finally!)

The main advantage of the Gauss-Seidel method is its low memory footprint: you only need to store a single vector $\mathbf{y} \in \mathbb{R}^{n}$. In Python, supposing the graph G is stored as a Dict[int, List[int]], we can write the following, very simple implementation:
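A minimal sketch of such an implementation, following the derivation above (the function name, signature and the uniform starting vector $\mathbf{y}^{(0)}$ are illustrative choices, not a definitive implementation):

```python
from typing import Dict, List, Optional

def pagerank(graph: Dict[int, List[int]], alpha: float = 0.85,
             pref: Optional[Dict[int, float]] = None,
             max_iter: int = 100, tol: float = 1e-9) -> Dict[int, float]:
    """Personalized PageRank via Gauss-Seidel on (I - alpha P^T) y = v,
    then z = y / ||y||_1. Every node must appear as a key of `graph`."""
    nodes = sorted(graph)
    n = len(nodes)
    # out-degrees with self-loops removed, and in-neighbor lists
    out_deg = {i: sum(1 for j in graph[i] if j != i) for i in nodes}
    in_nbrs = {i: [] for i in nodes}
    for i in nodes:
        for j in graph[i]:
            if j != i:
                in_nbrs[j].append(i)
    # normalized preference vector (uniform if none given)
    if pref is None:
        v = {i: 1.0 / n for i in nodes}
    else:
        s = sum(pref.values())
        v = {i: pref.get(i, 0.0) / s for i in nodes}
    # Gauss-Seidel sweeps: updating y in place means components already
    # computed in the current sweep are used automatically, as in the formula
    y = {i: 1.0 / n for i in nodes}
    for _ in range(max_iter):
        delta = 0.0
        for i in nodes:
            new = v[i] + alpha * sum(y[j] / out_deg[j] for j in in_nbrs[i])
            delta += abs(new - y[i])
            y[i] = new
        if delta < tol:
            break
    # dangling nodes never appear in any in_nbrs list, so no special handling
    norm = sum(y.values())
    return {i: y[i] / norm for i in nodes}
```

Note that each sweep reads only the in-neighbor lists and out-degrees, i.e. the graph itself: neither $P$ nor $\mathbf{d}$ is ever materialized.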

This code is just a sketch and can be improved in various ways. Just let me know if this caught your interest.

Sorry for the long post, see you next time!


### Giacomo Mariani

👨‍💻 Big data for little humans