




























































































1 Minimum Spanning Trees
In the minimum spanning tree problem, the input is an undirected connected graph G = (V, E) with n nodes and m edges, where the edges have weights w(e) ∈ R. The goal is to find a spanning tree of the graph with the minimum total edge-weight. (A spanning tree/forest is defined to be an acyclic subgraph T that is inclusion-wise maximal, i.e., adding any edge in G \ T would create a cycle.) If the graph G is disconnected, we get a spanning forest.

As a classic (and important) problem, it's been tackled many times. Here's a brief, not-quite-comprehensive history of its optimization, all without making any assumptions on the edge weights other than that they can be compared in constant time:
Yao's algorithm builds on Borůvka's algorithm (which he attributes to Sollin), and uses as a subroutine the linear-time algorithm for median-finding, which had only recently been invented in 1974. We will work through Yao's algorithm in HW #1.
This result was soon improved by Gabow, Galil, Spencer, and Tarjan (1986) to get an O(m log log* n) runtime; note the logarithm applied to the iterated logarithm.
In this chapter, we’ll go through the three classics (Jarnik/Prim’s, Kruskal’s, and Bor ˚uvka’s). Then we will discuss Fredman and Tar- jan’s algorithm, and finally present Karger, Klein, and Tarjan’s ran- domized algorithm. This will lead us to discuss another intriguing question: how do we verify whether a given tree is an MST? For the rest of this chapter, assume that the edge weights are dis- tinct. This does not change things in any essential way, but it ensures that the MST is unique (Exercise: prove this!), and hence simpli- fies some statements. Also assume the graph is simple, and hence m = O(n^2 ); you can delete all self-loops and remove all-but-the- lightest from any collection of parallel edges, all by preprocessing the graph in linear time.
Most of these algorithms rely on two rules: the cut rule (known in Tarjan's book (Tarjan 1983) as the blue rule) and the cycle rule (or the red rule). Recall that a cut in the graph is a partition of the vertices into two non-empty sets (S, S̄ = V \ S), and an edge crosses this cut if its two endpoints lie in different sets.
Theorem 1.1 (Cut Rule). For any cut of the graph, the minimum-weight edge that crosses the cut must be in the MST. This rule helps us determine what to add to our MST.

Theorem 1.2 (Cycle Rule). For any cycle in the graph, the maximum-weight edge on that cycle cannot be in the MST. This rule tells us what we can safely discard.
Kruskal's algorithm sorts the edges in increasing order of weight and considers them in this order, coloring an edge blue whenever it connects two vertices which are not currently in the same blue component. Figure 1.1 gives an example of how edges are added.

Figure 1.1: Dashed lines are not yet in the MST. Note that the edge of weight 5 will be analyzed next, but will not be added; the edge of weight 10 will be added. Colors designate connected components.
To keep track of which vertex is in which component, use a disjoint set union-find data structure. This data structure has three operations: makeset(x), which creates a new singleton set containing x; find(x), which returns the identifier of the set containing x; and union(x, y), which merges the sets containing x and y.
There is an implementation of this data structure which allows us to do m operations in O(m α(m)) amortized time, where α(·) is the inverse Ackermann function mentioned above. Note that the naïve implementation of Kruskal's algorithm spends O(m log m) = O(m log n) time to sort the edges, and then performs n makesets, m finds, and n − 1 unions; the total runtime is O(m log n + m α(m)), which is dominated by the O(m log n) term.
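Here is a minimal Python sketch of this implementation of Kruskal's algorithm. The union-find below uses path halving and union by rank, one of several variants achieving the O(m α(m)) bound, with makeset folded into the constructor.

```python
class UnionFind:
    """Disjoint-set data structure: path halving + union by rank gives
    O(m * alpha(m)) amortized time for any sequence of m operations."""
    def __init__(self, n):                  # makeset for vertices 0..n-1
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                    # already in the same blue component
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx                # attach lower-rank root below higher
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

def kruskal(n, edges):
    """edges: list of (weight, u, v). Sorting dominates: O(m log n) total."""
    uf, mst = UnionFind(n), []
    for w, u, v in sorted(edges):           # increasing weight order
        if uf.union(u, v):                  # endpoints in different components
            mst.append((u, v, w))           # color the edge blue
    return mst
```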
For the Jarnik/Prim algorithm, first take an arbitrary root vertex r to start our MST T. At each iteration, take the cheapest edge connecting our current tree T of blue edges to some vertex not yet in T, and color it blue, thereby adding this edge to T and increasing its size by one. Figure 1.2 below shows an example of how edges are added.

Figure 1.2: Dashed lines are not yet in the MST. We started at the red node, and the blue nodes are also part of T right now.
We’ll use a priority queue data structure which keeps track of the lightest edge connecting T to each vertex not yet in T. A priority queue data structure is equipped with (at least) three operations:
Note that by using the standard binary heap data structure we can get O(log n) worst-case time for each priority queue operation above. To implement the Jarnik/Prim algorithm, we initially insert each vertex in V \ {r} into the priority queue with key ∞, and the root r with key 0. The key of an node v denotes the weight of
the least-weight edge from a node in T to v; it is zero if v ∈ T, and ∞ if there are no edges yet from nodes in T to v. At each step, use extractmin to find the vertex u with the smallest key, and add u to the tree using this edge. Then for each neighbor v of u, do decreasekey(v, w({u, v})). Overall we do m decreasekeys, n inserts, and n extractmins, with the decreasekeys supplying the dominating O(m log n) term. (We can optimize slightly by inserting a vertex into the priority queue only when it has an edge to the current tree T. This does not seem particularly useful right now, but will be crucial in the Fredman-Tarjan proof.)
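A sketch in Python: the standard-library heapq has no decreasekey, so instead of decreasing keys this version pushes a fresh entry and discards stale ones on extraction ("lazy deletion"); the O(m log n) bound is unaffected. Distinct edge weights, as assumed in this chapter, also rule out heap ties.

```python
import heapq

def jarnik_prim(adj, r=0):
    """adj[u] = list of (weight, v) pairs; returns MST edges (parent, v, weight)."""
    in_tree = [False] * len(adj)
    mst = []
    heap = [(0, r, r)]                      # (key, vertex, tree endpoint); root parents itself
    while heap:
        w, u, p = heapq.heappop(heap)       # extractmin
        if in_tree[u]:
            continue                        # stale entry: u was already reached more cheaply
        in_tree[u] = True
        if u != p:
            mst.append((p, u, w))           # color the edge {p, u} blue
        for (w2, v) in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (w2, v, u))   # stands in for decreasekey(v, w2)
    return mst
```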
Unlike Kruskal’s and Jarnik/Prim’s algorithms, Bor ˚uvka’s algorithm adds many edges in parallel, and can be implemented without any non-trivial data structures. In a “round”, simply take the lightest edge out of each vertex and color it blue; these edges are guaranteed to form a forest if edge-weights are distinct. (Exercise: why?) Now contract the blue edges and recurse on the resulting graph. At the end, when the resulting graph is a single vertex, uncontract all the edges to get the MST. Each round can be implemented in O(m) work: we will work out the details of this in HW # 1. Moreover, we’re guaranteed to shrink away at least half of the nodes (as each node at least pairs up with one other node), and maybe many more if we are lucky. So we have at most ⌈log 2 n⌉ rounds of computation, leaving us with O(m log n) total work.
Figure 1.3: The red edges will be chosen and contracted in a single step, yielding the graph on the right, which we recurse on. Colors designate components.
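A sketch of the whole algorithm: rather than literally contracting, this version (an equivalent, easy-to-code variant, not the notes' implementation) reuses the UnionFind class from the Kruskal sketch above to track which vertices have been merged.

```python
def boruvka(n, edges):
    """edges: list of (weight, u, v) with distinct weights; returns MST edges.
    O(m) work per round, and at most ceil(log2 n) rounds."""
    uf, mst, components = UnionFind(n), [], n
    while components > 1:
        cheapest = {}                        # component root -> lightest outgoing edge
        for w, u, v in edges:
            ru, rv = uf.find(u), uf.find(v)
            if ru == rv:
                continue                     # both endpoints already merged
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        if not cheapest:
            break                            # disconnected graph: we have a spanning forest
        for w, u, v in cheapest.values():    # distinct weights => these edges form a forest
            if uf.union(u, v):
                mst.append((u, v, w))
                components -= 1
    return mst
```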
We can actually easily improve the performance of Jarnik/Prim's algorithm by using a more sophisticated data structure, namely by using Fibonacci heaps instead of binary heaps to implement the priority queue. Fibonacci heaps (invented by Fredman and Tarjan) implement the insert and decreasekey operations in constant amortized time, and extractmin in amortized O(log H) time, where H is the maximum number of elements in the heap during the execution. Since we do n extractmins, and O(m + n) of the other two operations, and the maximum size of the heap is at most n, this gives us a total cost of O(m + n log n). Note that this is linear time on graphs with m = Ω(n log n) edges; however, we'd like to get linear time on all graphs. So the remaining cases are the graphs with m = o(n log n) edges.
Fredman and Tarjan’s algorithm builds on Jarnik/Prim’s algorithm: the crucial observation uses the following crucial facts.
1. Pick any unmarked vertex and grow a blue tree T from it in the Jarnik/Prim style, keeping the current neighborhood N(T) in a Fibonacci heap.
2. If at any time |N(T)| ≥ K, or if T has just added an edge to some vertex that was previously marked, stop, mark all vertices in the current T, and go to step 1.
3. Terminate when each node belongs to some tree; then contract each blue tree and proceed to the next round.
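Here is a simplified sketch of one round (our illustration: a binary heap stands in for the Fibonacci heap, so this version costs O(m log K) rather than O(m + n log K), and the heap size stands in for |N(T)|):

```python
import heapq

def ft_round(adj, K):
    """One Fredman-Tarjan round. adj[u] = list of (weight, v) pairs with
    distinct weights. Returns the blue (MST) edges found this round; the
    full algorithm contracts them and re-runs with a much larger K."""
    marked = [False] * len(adj)
    blue = []
    for root in range(len(adj)):
        if marked[root]:
            continue
        in_tree = {root}                    # the current tree T
        marked[root] = True
        heap = [(w, root, v) for (w, v) in adj[root]]
        heapq.heapify(heap)
        while heap and len(heap) < K:       # heap size proxies the |N(T)| >= K test
            w, u, v = heapq.heappop(heap)
            if v in in_tree:
                continue                    # stale entry
            blue.append((u, v, w))          # lightest edge leaving T: cut rule
            if marked[v]:
                break                       # hooked into an earlier tree: stop growing
            marked[v] = True
            in_tree.add(v)
            for (w2, x) in adj[v]:
                if x not in in_tree:
                    heapq.heappush(heap, (w2, v, x))
    return blue
```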
Let’s first note that the runtime of one round of the algorithm is O(m + n log K). Each edge is considered at most twice, once from each endpoint, giving us the O(m) term. Each time we grow the current tree in step 1 , the number of connected components decreases by 1, so there are at most n such steps. Each step calls findmin on a heap of size at most K, which takes O(log K) times. Hence, at the end of this round, we’ve successfully identified a forest, each edge of which is part of the final MST, in O(m + n log K) time. Let dv be the degree of the vertex v in the graph we consider in this round. We claim that every marked vertex u belongs to a com- ponent C such that (^) ∑v∈C dv ≥ K. Indeed, if u became marked be- cause the neighborhood of its component had size at least K, then this is true. Otherwise, u became marked because it entered a com- ponent C of marked vertices. Since the vertices of C were marked, ∑v∈C dv ≥^ K^ before^ u^ joined, and this sum only increased when^ u (and other vertices) joined. Thus, if C 1 ,... , Cl are the components at the end of this routine, we have
$$2m = \sum_{v} d_v = \sum_{i=1}^{\ell} \sum_{v \in C_i} d_v \;\geq\; \sum_{i=1}^{\ell} K = K\ell.$$
Thus $\ell \leq \frac{2m}{K}$, i.e., this routine produces at most $\frac{2m}{K}$ trees. The choice of K will change over the course of the algorithm. How should we set the thresholds $K_i$? Say we start round i with $n_i$ nodes and $m_i \leq m$ edges. One clean way is to set
$$K_i := 2^{2m/n_i},$$
which ensures that
$$O(m_i + n_i \log K_i) = O\Big(m_i + n_i \cdot \frac{2m}{n_i}\Big) = O(m).$$
In turn, this means the number of trees, and hence the number of nodes $n_{i+1}$ in the next round, is at most $\frac{2m_i}{K_i} \leq \frac{2m}{K_i}$. The number of edges is $m_{i+1} \leq m_i \leq m$. Rewriting, this gives
$$K_i \leq \frac{2m}{n_{i+1}} = \lg K_{i+1} \implies K_{i+1} \geq 2^{K_i}.$$
Hence the threshold value exponentiates in each step; indeed, the threshold increases "tetrationally". So after log* n rounds, the value of K would be at least n, and we would just run Jarnik/Prim's algorithm to completion, ending with a single tree. This means we have at most log* n rounds, and a total of O(m log* n) work.

In retrospect, I don't know whether to consider the Fredman-Tarjan algorithm as being trivial (once we have Fibonacci heaps) or being devilishly clever. I think it is the latter (and that is the beauty of the best algorithms). Indeed, there's a lovely idea here: keep the neighborhoods small at the beginning when there's a lot of work to do, but allow them to grow quickly as the graph collapses. It is quite non-obvious at the start, and obvious in hindsight. And once you see it, you cannot un-see it!
Another algorithm that is extremely clever but almost obvious in hindsight is the Karger-Klein-Tarjan randomized MST algorithm (Karger, Klein, and Tarjan 1995), which runs in O(m + n) expected time. (A version of this algorithm was proposed by Karger in 1992, but he only obtained an O(m + n log n) runtime; the enhancement to linear time was given by Klein and Tarjan at the STOC 1994 conference, and the combined paper is cited above.) The new idea here is to compute a "rough approximation" to the MST, use that to throw away many edges using the cycle rule, and then recurse on the rest of the graph.
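In code, the shape of this strategy looks roughly as follows. This is a hedged skeleton, not the full algorithm (which interleaves Borůvka contraction steps and recurses more carefully); the helpers msf and drop_F_heavy are hypothetical black boxes standing for any minimum-spanning-forest routine and a routine that removes all F-heavy edges (the F-heavy/F-light terminology is defined next), the latter doable in linear time with an MST verifier.

```python
import random

def kkt_filter_step(n, edges, msf, drop_F_heavy):
    """One sampling-and-filtering step in the Karger-Klein-Tarjan style."""
    sample = [e for e in edges if random.random() < 0.5]  # keep each edge w.p. 1/2
    F = msf(n, sample)                  # a "rough approximation" to the MST
    survivors = drop_F_heavy(F, edges)  # only the F-light edges survive
    return survivors                    # these still contain MST(G); recurse on them
```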
The crucial definition is that of edges being heavy and light with respect to some forest F.
Definition 1.3. Let F be a forest that is a subgraph of G. An edge e ∈ E(G) is F-heavy if e creates a cycle when added to F, and moreover it is the heaviest edge in this cycle. Otherwise, we say edge e is F-light.
Figure 1.5: Every edge in F is F-light, as are the edges on the left, and also those going between the components. The edge on the right is F-heavy.
The next facts follow from the definition:
Fact 1.4. Edge e is F-light ⟺ e ∈ MST(F ∪ {e}).

Fact 1.5 (Completeness). If T is an MST of G, then edge e ∈ E(G) is T-light if and only if e ∈ T.

Fact 1.6 (Soundness). For any forest F, the F-light edges contain the MST of the underlying graph G. In other words, any F-heavy edge is also heavy with respect to the MST of the entire graph.
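For intuition, here is a naive check of Definition 1.3 (our own illustration; it costs O(n) per edge via a walk through the forest, whereas the KKT algorithm classifies all edges at once with a linear-time verifier):

```python
def is_F_light(F_adj, u, v, w_e):
    """F_adj: adjacency dict of the forest F, vertex -> list of (nbr, weight).
    The edge {u, v} of weight w_e is F-heavy iff F contains a u-v path and
    w_e is heavier than every edge on it; otherwise it is F-light."""
    stack = [(u, u, 0)]                     # (vertex, parent, max weight on path so far)
    while stack:
        x, parent, mx = stack.pop()
        if x == v:
            return w_e < mx                 # cycle exists: light iff e is not its heaviest edge
        for (y, w) in F_adj.get(x, ()):
            if y != parent:
                stack.append((y, x, max(mx, w)))
    return True                             # u, v in different trees: e closes no cycle
```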
This suggests a clear strategy: pick a forest F from the current edges, and discard all the F-heavy edges. Hopefully the number of edges remaining is small. By Fact 1.6 these edges contain the MST of G, so repeat the process on them. To make this idea work, we want a forest F with many F-heavy edges. The catch is that a forest only has many heavy edges if it has small weight, i.e., if there are many off-forest edges forming cycles on which they are the heaviest edges. Indeed, one