Minimum Spanning Tree of an Undirected Graph

A graph often contains redundancy in that there can be multiple paths between two vertices. This redundancy may be desirable, for example to offer alternative routes in the case of breakdown or overloading of an edge (road, connection, phone line) in a network. However, we often require the cheapest sub-network that connects the vertices of a given graph. This must in fact be an unrooted tree, because there is only one path between any two vertices in a tree; if there is a cycle then at least one edge can be removed. The total cost or weight of a tree is the sum of the weights of the edges in the tree. We assume that the weight of every edge is greater than zero. Given a connected, undirected graph G=<V,E>, the minimum spanning tree problem is to find a tree T=<V,E'> such that E' subset_of E and the cost of T is minimal.

Note that a minimum spanning tree is not necessarily unique. Recall that a tree over |V| vertices contains |V|-1 edges. A tree can be represented by an array of this many edges.

Note on Trees

An unrooted tree is a connected, acyclic, undirected graph. Each leaf vertex has degree one, i.e. is connected to one other vertex. An unrooted tree can be made into a rooted tree: If the unrooted tree is "floppy" and it is "picked up" by a leaf to make a root, the new root has one child, every internal vertex has at least one child, and every (other) leaf has no children. If all internal vertices of the unrooted tree have degree three, then the corresponding rooted tree is a binary tree. If the unrooted tree is picked up by an internal vertex, the new root has three children, every other internal vertex has two children, and every leaf has no children.

Prim's Algorithm

Prim's algorithm for the minimum spanning tree problem follows the strategy of beginning with a small tree, i.e. <{v1},{ }>, and growing it until it includes all vertices in the given graph. Initially the tree contains just an arbitrary starting node v1. At each stage a vertex not yet in the tree but closest to (some vertex that is in) the tree is found. This closest vertex is added to the tree. Finding the vertex allows us to improve our knowledge of the distances of the remaining vertices to the tree. A set `done' contains the vertices in the tree.

--Graph G = <V, E>

done := {v1}   --initial Tree is <{v1},{}>

for vertex i in V-{v1}
   T[i] := E[1,i]     --direct edges (possibly "missing")
end for

loop |V|-1 times
--Inv: {T[v]|v in done} represents a min' spanning Tree
--     of the nodes in done and
--     {T[u]|u not in done} contains the shortest known
--     distances from the (sub-)spanning Tree to
--     such vertices u.

  find closest vertex to (sub-)spanning Tree in V - done

  done +:= {closest}
  add closest & edge T[closest] to (sub-)spanning Tree

  for vertex j in V - done
    T[j] := min(T[j],   --update knowledge on paths,
                E[closest,j])      --perhaps better?
  end for
end loop

--- Prim's Minimum Spanning Tree Algorithm ---

Complexity

Prim's algorithm is very similar to Dijkstra's single-source shortest path algorithm and indeed the given version of the former was created by simple edits to the latter. The principal difference is that the criteria for choosing a new vertex is proximity to the tree rather than to a source. Prim's algorithm also clearly takes O(|V|2) time.

Correctness

Prim's algorithm is correct because at each stage it has built a minimum spanning tree over those vertices in the set `done' which eventually contains all the vertices: (i) The condition is trivially true initially. (ii) Consider the addition of a "closest" vertex `v' by an edge `e' to the partial spanning tree T at some intermediate stage. Vertex v must be in the final minimum spanning tree T' of all G. Suppose v could be connected into T' by some other path, not requiring e, for a smaller total cost. Now add e to T'. This would create a cycle through part of T. The cycle could be broken by deleting an edge out of T into the rest of T' of higher weight than e, because e is chosen as the lowest cost edge out of T to a vertex not in done. Thus T' could not be a minimum spanning tree of G, i.e. a contradiction, so the supposition is false.

Kruskal's Algorithm

Kruskal's algorithm for the minimum spanning tree problem begins with many disjoint spanning trees, a spanning forest. It repeatedly joins two trees together until a spanning tree of the entire given graph remains.

--G = <V,E>

P := {{v1}, ..., {vn}} --partition V into singleton trees
E' := {}

loop |V|-1 times
--Inv: E' contains only edges of a min' span' tree for each S in P &
--     each S in P represents a subtree of a minimum spanning tree of G

   find shortest edge e joining different subsets S1 and S2 in P
   E' += {e}
   P := P - {S1,S2} + {S1 union S2}
end loop

--- Kruskal's Minimum Spanning Tree Algorithm, O(|E|*log(|E|)) time ---

Correctness

Kruskal's algorithm certainly leads to a spanning tree T but it is necessary to prove that T is minimal. The invariant shows that this is so: The invariant is certainly true on the initial iteration. In the body of the loop, two partial minimum spanning trees T1 and T2 are joined by an edge `e'. The vertices in T1 and T2 must be connected somehow in a final minimum spanning tree T'. Suppose they could be connected more cheaply in T' than in T. Add e to T'. This would create a cycle but it could be broken by removing an edge from T to T'-T. This would leave the vertices in T1 and T2 connected but at a lower cost than in T' because e is chosen as the cheapest satisfactory edge. Therefore T1, T2 and e form a minimum spanning subtree which must be part of a minimum spanning tree of G. The invariant really is invariant. The algorithm continues to join subtrees until a single minimum cost tree remains, spanning all vertices.

Complexity

The time complexity of Kruskal's algorithm hinges on finding the shortest edge to join two subtrees and on the joining itself. A priority queue can be used to organise the edges. A heap is a suitable implementation; see heap-sort but remember that we require fast access to the smallest edge. The priority queue takes O(|E|log|E|) time to create and O(log|E|) time to find a shortest edge while maintaining the priority queue. The later is done |V|-1 times. The joining of two subtrees can be done in O(|V|log|V|) total time over all the iterations as seen below. Thus Kruskal's algorithm takes O(|E|log|E|) time overall. This is better than Prim's algorithm for sparse graphs for which |E|<<|V|2.

The forest is represented by a partition of the vertices of the graph. Each partial tree spans a subset of the vertices. An array gives the size and first member of each subset. A second array gives the subset of each member and links the members of a subset together.

When two subsets are joined, the members of one subset can be transferred to the other. It is more efficient to transfer the members of the smaller subset to the larger as there are fewer of them. The smaller subset S is joined to a subset as least as large as itself and the resulting subset is at least twice as large as S. Thus each vertex is transferred at most log|V| times. This operation takes place |V|-1 times. The time taken, over all the join operations, is O(|V|log|V|), so this is not a dominant part of the time complexity of Kruskal's algorithm.

Demonstration

The HTML FORM below allows a random edge-weighted, undirected graph with |V| vertices to be created, manipulated and analysed. The value `pr' is the probability of there being an edge <i,j>; it controls the sparseness of the graph and on average there will be pr*|V|*(|V|-1) edges:

controls:

|V|=[  ], pr=[  ]
e=< ,  >,  wt=[  ]

output:

G=

Note that the edges, (vi,vj), in the tree produced by Prim's algorithm are sorted on vj, because the algorithm works by considering how best to incorporate each vj into the tree, gradually refining this knowledge (the tree starts as <{v1},{ }>). On the other hand, the edges in the tree produced by Kruskal's algorithm are sorted by their weight, because it considers short edges first.

Notes