PageRank is an algorithm used to rank web pages. PageRank outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. The underlying assumption is that more important websites are likely to receive more links from other websites.

Properties

  • sum of the PageRank value of all web pages is equal to 1

Sink nodes

If the imaginary person reaches a webpage which has no outgoing links (sink nodes), they randomly go to any web page (implemented by adding a directed edge from sink node to all other nodes, with equal probability).

Damping factor

PageRank takes into account that a person clicking links will eventually stop clicking. The probability, at any step, that the person will continue following links is the damping factor . The probability that they instead just to any other random page is . It is generally set around .1

Formula

The formula for the PageRank value of a node is

Where

  • is the total number of nodes in the graph
  • is the set of nodes in the graph that link to
  • is the number of outgoing links of
  • represents the damping factor.

The PageRank values are then repeatedly calculated until the values converge, giving the final PageRank values.

PageRank can also be described as a transition matrix, repeatedly multiplying a column vector where each value is , until the steady-state matrix is found.

Implementation

Footnotes

  1. https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm