Abstract
Recall that within transitive closure, a graph representation can be created by adding each directed path between every pair of nodes within the graph. A more direct way is to simply iterate the process of concatenating two edges (either already in the graph or inserted earlier) by iterating all the possible combinations of a start node, end node, and a middle node at which the two edges connect. In other words, we look for a combination of edges and and insert a new edge whenever we find such a pair. This is Warshall’s transitive closure algorithm.1
Pseudocode
Algorithm Warshall
input: graph G
output: transitive closure of G
FOR EACH middle in G.nodes
FOR EACH start in G.nodes
FOR EACH end in G.nodes
IF NOT edgeExists(G, start, end) THEN
IF edgeExists(G, start, mid) AND
edgeExists(G, mid, end) THEN
addEdge(G, start, end)
END IF
END IF
END FOR EACH
END FOR EACH
END FOR EACH
RETURN G
Edgy
Python
def transitive_closure(graph:nx.DiGraph):
reachable = {}
nodes = list(graph.nodes)
# initalise 2 key dict with unreachable 0
for node1 in nodes:
reachable[node1]={}
for node2 in nodes:
reachable[node1][node2] = 0
# set nodes to them selves as reachable
for node in nodes:
reachable[node][node]=1
# set reachable for every given edge
# diffrence is for digraph it only populates in direction of one
for edge in graph.edges.data():
node1, node2, pdict = edge
reachable[node1][node2]=1
# up to here reachable is an adjacency matric as a dualkey dictionary
# print(reachable)
# important for intermedate node loop to be on outside (?) might be absolutely nessicary for relaxation
for intermediaryNode in nodes:
for nodeI in nodes:
for nodeJ in nodes:
# if unreachable
if reachable[nodeI][nodeJ] != 1:
if reachable[nodeI][intermediaryNode] and reachable[intermediaryNode][nodeJ]:
# sees if using intermediaryNode between nodeI to nodeJ is reachable
reachable[nodeI][nodeJ] = 1
return reachable
Correctness
We use proof by induction to prove the algorithm’s correctness. Given the basis of the outer loop, we assume that we iterate through the nodes in some ordered arbitrary sequence from .
It is easy to see that the -th iteration of the outer loop establishes all paths that only uses nodes as intermediate nodes.
At the -st iteration all possible paths from to that only use as intermediate nodes have already been established earlier by induction. A new cycle-free path using can only arise if there is an edge from to and one from to that can be joined with being the intermediate. Each of these edges is either an original edge or arose from a path that only uses nodes . By inserting edges from to for all such cases the -th iteration establishes all possible paths that only uses nodes as intermediate nodes. Thus the algorithm correctly generates the transitive closure after iterations out of the outer loop.
Footnotes
-
Alexandria Repo, p134 ↩