TODO A PageRank is an algorithm used to rank web pages. PageRank was designed and is used by Google. PageRank outputs the probability that a person randomly clicking on links in web pages will arrive at a particular page, and so the sum of PageRank of all web pages is equal to 1.
Sink nodes
If the imaginary person reaches a webpage which has no outgoing links (sink nodes), they randomly go to any web page (implemented by adding a directed edge from sink node to all other nodes, with equal probability).
Dampening factor
Additionally, PageRank takes into account that a person clicking links will eventually stop clicking. This probability that the person stops clicking, and then goes to another random web page is called the damping factor, represented as .
Formula
The formula for the PageRank value of a node is
Where
- is the total number of nodes in the graph
- is the set of nodes in the graph that link to
- is the number of outgoing links of
The PageRank values are then repeatedly calculated until the values converge, giving the final PageRank values.
For those that have done General Maths, PageRank can be thought of as a transition matrix, repeatedly multiplying a column vector where each value is , until the steady-state matrix is found.
Implementation
Python Implementation:
import networkx as nx
def PageRank(graph: nx.DiGraph, d: float = 0.85, tol: float = 1e-6, max_iter: int = 10000) -> dict:
prev_pagerank = {node: 0 for node in graph.nodes}
pagerank = {node: 1/len(graph.nodes) for node in graph.nodes}
for _ in range(max_iter):
prev_pagerank = pagerank
pagerank = {node: 0 for node in graph.nodes}
for i in graph.nodes:
for j in graph.predecessors(i):
pagerank[i] += prev_pagerank[j]/len([n for n in graph.neighbors(j)])
for j in graph.nodes:
if len([n for n in graph.neighbors(j)]) == 0:
pagerank[i] += prev_pagerank[j]/len(graph.nodes)
pagerank[i] = pagerank[i] * d + (1 - d) / len(graph.nodes)
error = sum([abs(prev_pagerank[i] - pagerank[i]) for i in pagerank.keys()])
if error < tol:
return pagerank
return pagerank
Alternatively:
import networkx as nx
import numpy as np
def PageRank(graph: nx.DiGraph, d: float = 0.85, tol: float = 1e-6, max_iter: int = 10000) -> dict:
transition_matrix = np.zeros((len(graph.nodes), len(graph.nodes)))
for i, node in enumerate(graph.nodes):
if len([i for i in graph.successors(node)]) == 0:
transition_matrix[:, i] = 1/len(graph.nodes)
continue
for j, neighbor in enumerate(graph.nodes):
if neighbor in graph.successors(node):
transition_matrix[j, i] = 1 / len([i for i in graph.successors(node)])
prev_pagerank = np.zeros((len(graph.nodes), len(graph.nodes)))
pagerank = np.matrix([[1 / len(graph.nodes)] for _ in graph.nodes])
for _ in range(max_iter):
prev_pagerank = pagerank
pagerank = transition_matrix @ pagerank * d + (1 - d) / len(graph.nodes)
error = np.linalg.norm(prev_pagerank - pagerank)
if error < tol:
return {node: float(pagerank[i, 0]) for i, node in enumerate(graph.nodes)}
return {node: float(pagerank[i, 0]) for i, node in enumerate(graph.nodes)}