
ADVANCED ALGORITHMS

notes for cmu 15-850 (fall 2020)

lecturer: anupam gupta

Contents

Part I: Discrete Algorithms

1 Minimum Spanning Trees

1.1 Minimum Spanning Trees: History

In the minimum spanning tree problem, the input is an undirected connected graph G = (V, E) with n nodes and m edges, where the edges have weights w(e) ∈ R. The goal is to find a spanning tree of the graph with the minimum total edge-weight. (A spanning tree/forest is an acyclic subgraph T that is inclusion-wise maximal, i.e., adding any edge of G \ T would create a cycle.) If the graph G is disconnected, we get a spanning forest.

As a classic (and important) problem, it has been tackled many times. Here's a brief, not-quite-comprehensive history of its optimization, all without making any assumptions on the edge weights other than that they can be compared in constant time:

  • Otakar Borůvka gave the first known MST algorithm in 1926; it was independently discovered by Gustave Choquet, Georges Sollin, and others. Vojtěch Jarník gave his algorithm in 1930, and it was independently discovered by Robert Prim ('57) and Edsger Dijkstra ('59), among others. Joseph Kruskal gave his algorithm in '56 (Kruskal 1956); it was rediscovered by Loberman and Weinberger in '57 (Loberman and Weinberger 1957). All of these can easily be implemented in O(m log n) time; we will discuss them in this lecture. Both Prim and Kruskal refer to Borůvka's paper, but say it is "unnecessarily elaborate". However, while Borůvka's paper is written in a complicated fashion, his essential ideas are very clean.
  • In 1975, Andy Yao achieved a runtime of O(m log log n). His algorithm builds on Borůvka's algorithm (which he attributes to Sollin), and uses as a subroutine the linear-time algorithm for median-finding, which had only recently been invented in 1974. We will work through Yao's algorithm in HW #1.

  • In 1984, Michael Fredman and Bob Tarjan gave an O(m log∗ n)-time algorithm (Fredman and Tarjan 1987), based on their Fibonacci heaps data structure. Here log∗ is the iterated logarithm function, which denotes the number of times we must take logarithms before the argument becomes smaller than 1. The actual runtime is a bit more nuanced, which we will not bother with today.

    This result was soon improved by Gabow, Galil, Spencer, and Tarjan (1986) to get an O(m log log∗ n) runtime—note the logarithm applied to the iterated logarithm.

  • In 1995, David Karger, Phil Klein, and Bob Tarjan finally got the holy grail of O(m) time (Karger, Klein, and Tarjan 1995)! ... but it was a randomized algorithm, so the search for a deterministic linear-time algorithm continued.

  • In 1997, Bernard Chazelle gave an O(m α(n))-time deterministic algorithm (Chazelle 1997). Here α(n) is the inverse Ackermann function (defined in §1.6). This function grows extremely slowly, even slower than the iterated logarithm function. However, it still goes to infinity as n → ∞, so we still don't have a deterministic linear-time MST algorithm.

  • In 1998, Seth Pettie and Vijaya Ramachandran gave an optimal algorithm for computing minimum spanning trees (Pettie and Ramachandran 1998)—however, we don't know its runtime! More formally, they show that if there exists an algorithm which uses MST∗(m, n) comparisons to find MSTs on all graphs with m edges and n nodes, then the Pettie-Ramachandran algorithm runs in time O(MST∗(m, n)). (This was part of Seth's Ph.D. thesis, and Vijaya was his advisor.)

In this chapter, we'll go through the three classics (Jarnik/Prim's, Kruskal's, and Borůvka's). Then we will discuss Fredman and Tarjan's algorithm, and finally present Karger, Klein, and Tarjan's randomized algorithm. This will lead us to discuss another intriguing question: how do we verify whether a given tree is an MST?

For the rest of this chapter, assume that the edge weights are distinct. This does not change things in any essential way, but it ensures that the MST is unique (Exercise: prove this!), and hence simplifies some statements. Also assume the graph is simple, and hence m = O(n^2); you can delete all self-loops and remove all but the lightest edge from any collection of parallel edges, all by preprocessing the graph in linear time.
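To make the preprocessing step concrete, here is a minimal Python sketch; the (weight, u, v) edge-list format is our own choice, not anything fixed by the notes, and since it uses hashing the linear bound is in expectation.

```python
def simplify(edges):
    """Drop self-loops and keep only the lightest copy of each parallel
    edge. edges: list of (weight, u, v) triples for an undirected graph.
    Hashing on the endpoint pair gives expected linear time."""
    lightest = {}
    for w, u, v in edges:
        if u == v:
            continue                          # self-loops never help a spanning tree
        key = (u, v) if u < v else (v, u)     # undirected: normalize endpoint order
        if key not in lightest or w < lightest[key]:
            lightest[key] = w
    return [(w, u, v) for (u, v), w in lightest.items()]
```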

1.1.1 The Cut and Cycle Rules

Most of these algorithms rely on two rules: the cut rule (known in Tarjan's book as the blue rule) and the cycle rule (or the red rule) (Tarjan 1983). Recall that a cut in the graph is a partition of the vertices into two non-empty sets (S, S̄ = V \ S), and an edge crosses this cut if its two endpoints lie in different sets.

Theorem 1.1 (Cut Rule). For any cut of the graph, the minimum-weight edge that crosses the cut must be in the MST. This rule helps us determine what to add to our MST.

Theorem 1.2 (Cycle Rule). For any cycle in the graph, the maximum-weight edge on this cycle cannot be in the MST. This rule helps us determine what we can discard.

1.2 The Classical Algorithms

1.2.1 Kruskal's Algorithm

Kruskal's algorithm considers the edges in increasing order of weight, and colors an edge blue if it connects two vertices which are not currently in the same blue component. Figure 1.1 gives an example of how edges are added.

Figure 1.1: Dashed lines are not yet in the MST. The edge of weight 5 will be considered next but will not be added; the edge of weight 10 will be added. Colors designate connected components.

To keep track of which vertex is in which component, use a disjoint set union-find data structure. This data structure has three operations:

  • makeset(elem), which takes an element elem and creates a new singleton set for it,

  • find(elem), which finds the canonical representative for the set containing the element elem, and

  • union(elem1, elem2), which merges the two sets that elem1 and elem2 are in.

There is an implementation of this data structure which allows us to do m operations in O(m α(m)) amortized time, where α(·) is the inverse Ackermann function mentioned above. The naïve implementation of Kruskal's algorithm spends O(m log m) = O(m log n) time to sort the edges, and then performs n makesets, m finds, and n − 1 unions, so the total runtime is O(m log n + m α(m)), which is dominated by the O(m log n) term.
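To make this concrete, here is a minimal Python sketch of Kruskal's algorithm on top of a union-find structure with path halving and union by rank, one standard way to get the α(·)-type amortized bounds mentioned above; the (weight, u, v) edge representation is our own choice.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # one singleton set per element (makeset)
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False               # already in the same component
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx            # attach the shallower tree below the deeper
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

def kruskal(n, edges):
    """n vertices 0..n-1; edges is a list of (weight, u, v) triples.
    Returns the list of MST edges."""
    uf = UnionFind(n)                  # the n makesets
    mst = []
    for w, u, v in sorted(edges):      # the O(m log n) sort dominates
        if uf.union(u, v):             # endpoints in different blue components
            mst.append((w, u, v))
            if len(mst) == n - 1:      # early exit once the tree is complete
                break
    return mst
```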

1.2.2 The Jarnik/Prim Algorithm

For the Jarnik/Prim algorithm, first take an arbitrary root vertex r to start our MST T. At each iteration, take the cheapest edge connecting our current tree T of blue edges to some vertex not yet in T, and color it blue—thereby adding this edge to T and increasing its size by one. Figure 1.2 below shows an example of how edges are added.

Figure 1.2: Dashed lines are not yet in the MST. We started at the red node, and the blue nodes are also part of T right now.

We'll use a priority queue data structure which keeps track of the lightest edge connecting T to each vertex not yet in T. A priority queue is equipped with (at least) three operations:

  • insert(elem, key) inserts the given (element, key) pair into the queue,
  • decreasekey(elem, newkey) changes the key of the element elem from its current key to min(originalkey, newkey), and
  • extractmin() removes the element with the minimum key from the priority queue, and returns the (elem, key) pair.

Note that by using the standard binary heap data structure we can get O(log n) worst-case time for each priority queue operation above. To implement the Jarnik/Prim algorithm, we initially insert each vertex in V \ {r} into the priority queue with key ∞, and the root r with key 0. The key of a node v denotes the weight of the least-weight edge from a node in T to v; it is zero if v ∈ T, and ∞ if there are no edges yet from nodes in T to v. At each step, use extractmin to find the vertex u with the smallest key, and add u to the tree using this edge. Then for each neighbor v of u, call decreasekey(v, w({u, v})). Overall we do m decreasekey operations, n inserts, and n extractmins, with the decreasekeys supplying the dominating O(m log n) term. (We can optimize slightly by inserting a vertex into the priority queue only when it has an edge to the current tree T. This does not seem particularly useful right now, but will be crucial in the Fredman-Tarjan proof.)
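Here is a minimal Python sketch of the algorithm just described. Python's heapq module has no decreasekey, so the sketch emulates it lazily by pushing a fresh entry and skipping stale ones when popped; this keeps the same O(m log n) bound.

```python
import heapq

def prim(n, adj, r=0):
    """adj[u] = list of (weight, v) pairs (undirected, so each edge appears
    twice). Returns the total weight of the MST, grown from root r."""
    key = [float('inf')] * n           # lightest known edge from T to each vertex
    key[r] = 0
    in_tree = [False] * n
    heap = [(0, r)]
    total = 0
    while heap:
        k, u = heapq.heappop(heap)     # extractmin
        if in_tree[u] or k > key[u]:
            continue                   # stale entry left over from a "decreasekey"
        in_tree[u] = True
        total += k
        for w, v in adj[u]:
            if not in_tree[v] and w < key[v]:
                key[v] = w             # decreasekey, emulated by a fresh push
                heapq.heappush(heap, (w, v))
    return total
```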

1.2.3 Borůvka's Algorithm

Unlike Kruskal's and Jarnik/Prim's algorithms, Borůvka's algorithm adds many edges in parallel, and can be implemented without any non-trivial data structures. In a "round", simply take the lightest edge out of each vertex and color it blue; these edges are guaranteed to form a forest if edge-weights are distinct. (Exercise: why?) Now contract the blue edges and recurse on the resulting graph. At the end, when the resulting graph is a single vertex, uncontract all the edges to get the MST. Each round can be implemented in O(m) work: we will work out the details of this in HW #1. Moreover, we're guaranteed to shrink away at least half of the nodes (as each node at least pairs up with one other node), and maybe many more if we are lucky. So we have at most ⌈log₂ n⌉ rounds of computation, leaving us with O(m log n) total work.

Figure 1.3: The red edges will be chosen and contracted in a single step, yielding the graph on the right, which we recurse on. Colors designate components.
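Here is a minimal Python sketch of Borůvka's algorithm. Rather than literally contracting blue edges, it tracks components with the UnionFind class from the Kruskal sketch above (a common shortcut; the O(m)-per-round contraction-based version is what HW #1 works out). It assumes G is connected and the edge weights are distinct.

```python
def boruvka(n, edges):
    """n vertices 0..n-1; edges is a list of (weight, u, v) triples with
    distinct weights. Each round, every component picks its lightest
    outgoing edge; each round halves the number of components."""
    uf = UnionFind(n)                  # from the Kruskal sketch above
    mst, components = [], n
    while components > 1:
        cheapest = {}                  # component root -> lightest outgoing edge
        for w, u, v in edges:
            ru, rv = uf.find(u), uf.find(v)
            if ru == rv:
                continue               # internal edge, ignore
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        for w, u, v in cheapest.values():
            if uf.union(u, v):         # distinct weights => chosen edges form a forest
                mst.append((w, u, v))
                components -= 1
    return mst
```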

1.2.4 A Slight Improvement on Jarnik/Prim

We can easily improve the performance of Jarnik/Prim's algorithm by using a more sophisticated data structure, namely Fibonacci heaps instead of binary heaps to implement the priority queue. Fibonacci heaps (invented by Fredman and Tarjan) implement the insert and decreasekey operations in constant amortized time, and extractmin in amortized O(log H) time, where H is the maximum number of elements in the heap during the execution. Since we do n extractmins, and O(m + n) of the other two operations, and the maximum size of the heap is at most n, this gives us a total cost of O(m + n log n). Note that this is linear time on graphs with m = Ω(n log n) edges; however, we'd like to get linear time on all graphs. So the remaining cases are the graphs with m = o(n log n) edges.

1.3 Fredman and Tarjan's O(m log∗ n)-time Algorithm

Fredman and Tarjan's algorithm builds on Jarnik/Prim's algorithm, and relies on the following crucial facts.

In each round, all vertices start unmarked, and we repeatedly do the following, for a size threshold K:

  1. Pick an unmarked vertex, and grow a tree T from it using Jarnik/Prim steps, keeping the neighborhood N(T) of the current tree in a Fibonacci heap.

  2. If at any time |N(T)| ≥ K, or if T has just added an edge to some vertex that was previously marked, stop, mark all vertices in the current T, and go to step 1.

  3. Terminate when each node belongs to some tree.

Let's first note that the runtime of one round of the algorithm is O(m + n log K). Each edge is considered at most twice, once from each endpoint, giving us the O(m) term. Each time we grow the current tree in step 1, the number of connected components decreases by 1, so there are at most n such steps. Each step calls extractmin on a heap of size at most K, which takes O(log K) time. Hence, at the end of this round, we've successfully identified a forest, each edge of which is part of the final MST, in O(m + n log K) time.

Let d_v be the degree of the vertex v in the graph we consider in this round. We claim that every marked vertex u belongs to a component C such that ∑_{v∈C} d_v ≥ K. Indeed, if u became marked because the neighborhood of its component had size at least K, then this is true. Otherwise, u became marked because it entered a component C of marked vertices. Since the vertices of C were marked, ∑_{v∈C} d_v ≥ K before u joined, and this sum only increased when u (and other vertices) joined. Thus, if C_1, ..., C_l are the components at the end of this routine, we have

  2m = ∑_v d_v = ∑_{i=1}^{l} ∑_{v∈C_i} d_v ≥ ∑_{i=1}^{l} K = Kl.

Thus l ≤ 2m/K, i.e., this routine produces at most 2m/K trees. The choice of K will change over the course of the algorithm. How should we set the thresholds K_i? Say we start round i with n_i nodes and m_i ≤ m edges. One clean way is to set

  K_i := 2^{2m/n_i},

which ensures that

  O(m_i + n_i log K_i) = O(m_i + n_i · (2m/n_i)) = O(m).

In turn, this means the number of trees, and hence the number of nodes n_{i+1} in the next round, is at most 2m_i/K_i ≤ 2m/K_i. The number of edges is m_{i+1} ≤ m_i ≤ m. Rewriting, this gives

  K_i ≤ 2m/n_{i+1} = lg K_{i+1} =⇒ K_{i+1} ≥ 2^{K_i}.
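To see how fast these thresholds grow, here is a tiny Python sketch iterating the lower bound K_{i+1} = 2^{K_i}, with an illustrative starting threshold of 2 (the actual starting value depends on m and n); the round count it returns is essentially log∗ n.

```python
def rounds_until_single_tree(n, K=2):
    """Count rounds until the threshold reaches n, using the bound
    K_{i+1} >= 2^{K_i} from the analysis; the answer is roughly log* n."""
    rounds = 0
    while K < n:
        K = 2 ** K
        rounds += 1
    return rounds

# Thresholds starting from 2: 4, 16, 65536, 2**65536, ...
# so even for n = 10**80 this returns just 4.
print(rounds_until_single_tree(10**80))
```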

Hence the threshold value exponentiates in each step; the threshold increases "tetrationally". So after log∗ n rounds, the value of K would be at least n, and we would just run Jarnik/Prim's algorithm to completion, ending with a single tree. This means we have at most log∗ n rounds, and a total of O(m log∗ n) work.

In retrospect, I don't know whether to consider the Fredman-Tarjan algorithm as being trivial (once we have Fibonacci heaps) or devilishly clever. I think it is the latter (and that is the beauty of the best algorithms). Indeed, there's a lovely idea—of keeping the neighborhoods small at the beginning when there's a lot of work to do, but allowing them to grow quickly as the graph collapses. It is quite non-obvious at the start, and obvious in hindsight. And once you see it, you cannot un-see it!

1.4 A Linear-Time Randomized Algorithm

Another algorithm that is extremely clever, but almost obvious in hindsight, is the Karger-Klein-Tarjan randomized MST algorithm (Karger, Klein, and Tarjan 1995), which runs in O(m + n) expected time. The new idea here is to compute a "rough approximation" to the MST, use that to throw away many edges using the cycle rule, and then recurse on the rest of the graph. (A version of this algorithm was proposed by Karger in 1992, but he only obtained an O(m + n log n) runtime. The enhancement to linear time was given by Klein and Tarjan at the STOC 1994 conference; the combined paper is cited above.)

1.4.1 Heavy & Light Edges

The crucial definition is that of edges being heavy and light with respect to some forest F.

Definition 1.3. Let F be a forest that is a subgraph of G. An edge e ∈ E(G) is F-heavy if e creates a cycle when added to F, and moreover it is the heaviest edge in this cycle. Otherwise, we say edge e is F-light.

Figure 1.5: Every edge in F is F-light, as are the edges on the left and those going between the components; the edge on the right is F-heavy.

The next facts follow from the definition:

Fact 1.4. Edge e is F-light ⇐⇒ e ∈ MST(F ∪ {e}).

Fact 1.5 (Completeness). If T is an MST of G, then edge e ∈ E(G) is T-light if and only if e ∈ T.

Fact 1.6 (Soundness). For any forest F, the F-light edges contain the MST of the underlying graph G. In other words, any F-heavy edge is also heavy with respect to the MST of the entire graph.
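As a sanity check on these definitions, here is a naive Python sketch that classifies a single edge as F-heavy or F-light by computing the heaviest edge on the F-path between its endpoints. A real implementation would batch such queries with MST-verification machinery rather than pay O(n) per edge; the adjacency-map format is our own choice.

```python
from collections import deque

def is_F_heavy(F_adj, e):
    """F_adj: adjacency map of the forest F, F_adj[x] = list of (weight, y).
    e = (w, u, v) is an edge of G. With distinct weights, e is F-heavy iff
    u and v are already connected in F and every edge on the F-path
    between them is lighter than w."""
    w, u, v = e
    heaviest = {u: 0}                  # heaviest edge weight on the F-path from u
    queue = deque([u])
    while queue:                       # BFS over the forest component of u
        x = queue.popleft()
        for wx, y in F_adj.get(x, []):
            if y not in heaviest:
                heaviest[y] = max(heaviest[x], wx)
                queue.append(y)
    # if v is unreachable in F, adding e creates no cycle, so e is F-light
    return v in heaviest and heaviest[v] < w
```

Note that for an edge e ∈ F the path is the edge itself, so the check returns False, matching the observation that every edge of F is F-light.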

This suggests a clear strategy: pick a forest F from the current edges, and discard all the F-heavy edges. Hopefully the number of remaining edges is small. By Fact 1.6 these edges contain the MST of G, so we can repeat the process on them. To make this idea work, we want a forest F with many F-heavy edges. The catch is that a forest has many heavy edges only if it has small weight, i.e., if there are many off-forest edges forming cycles in which they are the heaviest edges. Indeed, one