



In this lecture we begin by discussing the difference between worst-case and average-case behavior, and introduce randomized (probabilistic) algorithms and the notion of worst-case expected time bounds. We make this concrete with a discussion of a randomized version of the Quicksort sorting algorithm, which we prove has worst-case expected running time O(n log n). In the process, we discuss basic probabilistic concepts such as events, random variables, and linearity of expectation.
The last lecture discussed the notions of O, Ω, and Θ bounds, and how to compute them using recurrences. We begin this lecture with a different issue: worst-case versus average-case bounds. Note that for comparison-based algorithms like Quicksort and Mergesort, we express running time in terms of the number of comparisons made.
Say I is some input and T(I) is the running time of our algorithm on input I. We can then define:
$$T_{\text{worst-case}}(n) = \max_{\text{inputs } I \text{ of size } n} T(I)$$
$$T_{\text{average-case}}(n) = \operatorname{avg}_{\text{inputs } I \text{ of size } n} T(I)$$
For instance, Mergesort has both worst-case and average-case time Θ(n log n). It doesn’t really depend on the input at all. On the other hand, for some algorithms, the running time depends critically on the input. One example is Quicksort.
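Quicksort: Given an array of some length n,
1. Pick an element p of the array as the pivot (or halt if the array has size 0 or 1).
2. Split the array into sub-arrays LESS, EQUAL, and GREATER by comparing each element to the pivot. (LESS has all elements less than p, EQUAL has all elements equal to p, and GREATER has all elements greater than p.)
3. Recursively sort LESS and GREATER.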
The Quicksort algorithm given above is not yet fully specified because we have not stated how we will pick the pivot element p. For the first version of the algorithm, let’s always choose the leftmost element.
Basic-Quicksort: Run the Quicksort algorithm as given above, always choosing the leftmost element in the array as the pivot.
What is the worst-case running time of Basic-Quicksort? We can see that if the array is already sorted, then in Step 2, all the elements (except p) will go into the GREATER bucket. Furthermore, since the GREATER array is in sorted order,^1 this process will continue recursively, resulting in time Ω(n^2). We can also see that the running time is O(n^2) on any array of n elements because Step 1 can be executed at most n times, and Step 2 takes at most n steps to perform. Thus, the worst-case running time is Θ(n^2).
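As a quick sanity check of the quadratic behavior on sorted input, here is a minimal Python sketch of Basic-Quicksort that counts comparisons. The name `basic_quicksort` and the convention of charging one comparison per non-pivot element in each partitioning step are assumptions made for illustration; the notes do not fix an implementation.

```python
def basic_quicksort(arr):
    """Sort arr using the leftmost element as the pivot.

    Returns (sorted_list, comparisons), charging one comparison for each
    non-pivot element examined during a partitioning step (an assumed
    accounting convention).
    """
    if len(arr) <= 1:
        return list(arr), 0
    pivot, rest = arr[0], arr[1:]
    less = [x for x in rest if x < pivot]
    equal = [x for x in rest if x == pivot]
    greater = [x for x in rest if x > pivot]
    sorted_less, c_less = basic_quicksort(less)
    sorted_greater, c_greater = basic_quicksort(greater)
    total = len(rest) + c_less + c_greater
    return sorted_less + [pivot] + equal + sorted_greater, total


if __name__ == "__main__":
    n = 200  # kept small so the recursion depth stays well under Python's limit
    _, comparisons = basic_quicksort(list(range(n)))
    # On an already-sorted array of distinct keys both values equal n(n-1)/2.
    print(comparisons, n * (n - 1) // 2)
```

On a sorted array every call peels off only the pivot, so the count satisfies C(n) = (n − 1) + C(n − 1), which sums to n(n − 1)/2.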
On the other hand, it turns out (and we will prove) that the average-case running time for Basic-Quicksort (averaging over all different initial orderings of the n elements in the array) is O(n log n). So, Basic-Quicksort has good average-case performance but not good worst-case performance.
The fact that this algorithm works well on most inputs may be small consolation if the inputs we are faced with are the bad ones (e.g., if our lists are nearly sorted already). One way we can try to get around this problem is to add randomization into the algorithm itself:
Randomized-Quicksort: Run the Quicksort algorithm as given above, each time picking a random element in the array as the pivot.
We will prove that for any given input array I of n elements, the expected time of this algorithm E[T(I)] is O(n log n). This is called a Worst-case Expected-Time bound. Notice that this is better than an average-case bound because we are no longer assuming any special properties of the input. E.g., it could be that in our desired application, the input arrays tend to be mostly sorted or in some special order, and this does not affect our bound because it is a worst-case bound with respect to the input. It is a little peculiar: making the algorithm probabilistic gives us more control over the running time.
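To make the worst-case expected-time idea concrete, the following Python sketch runs a randomized quicksort on an already-sorted array (the bad input for the leftmost-pivot version) and averages the comparison count over several runs. The function name `randomized_quicksort`, the `rng` parameter, and the one-comparison-per-non-pivot-element accounting are assumptions for illustration only.

```python
import math
import random


def randomized_quicksort(arr, rng=random):
    """Sort the list arr, choosing each pivot uniformly at random.

    Returns (sorted_list, comparisons), charging one comparison per
    non-pivot element in each partitioning step (assumed convention).
    """
    if len(arr) <= 1:
        return list(arr), 0
    k = rng.randrange(len(arr))          # uniformly random pivot position
    pivot = arr[k]
    rest = arr[:k] + arr[k + 1:]
    less = [x for x in rest if x < pivot]
    equal = [x for x in rest if x == pivot]
    greater = [x for x in rest if x > pivot]
    sorted_less, c_less = randomized_quicksort(less, rng)
    sorted_greater, c_greater = randomized_quicksort(greater, rng)
    total = len(rest) + c_less + c_greater
    return sorted_less + [pivot] + equal + sorted_greater, total


if __name__ == "__main__":
    rng = random.Random(0)               # fixed seed, only for repeatability
    n, trials = 200, 100
    already_sorted = list(range(n))
    average = sum(randomized_quicksort(already_sorted, rng)[1]
                  for _ in range(trials)) / trials
    print(f"average comparisons: {average:.1f}")
    print(f"2 n ln n:            {2 * n * math.log(n):.1f}")
```

The input is fixed and only the algorithm's coin flips vary between runs, which is exactly the quantity E[T(I)] that the bound speaks about.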
To prove these bounds, we first detour into the basics of probabilistic analysis.
Consider rolling two dice and observing the results. There are 36 possible outcomes: it could be that the first die comes up 1 and the second comes up 2, or that the first comes up 2 and the second comes up 1, and so on. Each of these outcomes has probability 1/36 (assuming these are fair dice). Suppose we care about some quantity such as “what is the probability the sum of the dice equals 7?” We can compute that by adding up the probabilities of all the outcomes satisfying this condition (there are six of them, for a total probability of 1/6).
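The dice calculation can be reproduced by brute-force enumeration of the sample space. This is a small illustrative sketch; the variable names are arbitrary.

```python
from fractions import Fraction
from itertools import product

# The sample space: all ordered pairs (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))
pr = Fraction(1, len(outcomes))          # each outcome is equally likely

# The event "the sum of the dice equals 7" is a subset of the sample space.
favorable = [o for o in outcomes if sum(o) == 7]
prob_sum_is_7 = pr * len(favorable)

print(len(outcomes), len(favorable), prob_sum_is_7)  # 36 6 1/6
```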
Footnote 1: Technically, this depends on how the partitioning step is implemented, but will be the case for any reasonable implementation.
Theorem 3.1 (Linearity of Expectation) For any two random variables X and Y, E[X + Y] = E[X] + E[Y].
Proof (for discrete RVs): This follows directly from the definition as given in (3.1).
$$E[X + Y] = \sum_{e \in S} \Pr(e)\,(X(e) + Y(e)) = \sum_{e \in S} \Pr(e)X(e) + \sum_{e \in S} \Pr(e)Y(e) = E[X] + E[Y].$$
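Linearity of expectation does not require X and Y to be independent. The sketch below checks this on the two-dice sample space, taking X to be the first die and Y to be the larger of the two dice, so that Y clearly depends on X. The choice of these two variables and all names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))     # sample space of two dice
Pr = {e: Fraction(1, len(S)) for e in S}     # uniform probabilities


def expectation(rv):
    """E[rv] = sum over outcomes e of Pr(e) * rv(e)."""
    return sum(Pr[e] * rv(e) for e in S)


def X(e):
    return e[0]          # value of the first die


def Y(e):
    return max(e)        # larger of the two dice; depends on X


lhs = expectation(lambda e: X(e) + Y(e))
rhs = expectation(X) + expectation(Y)
print(lhs, rhs, lhs == rhs)
```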
Suppose we unwrap a fresh deck of cards and shuffle it until the cards are completely random. How many cards do we expect to be in the same position as they were at the start? To solve this, let’s think formally about what we are asking. We are looking for the expected value of a random variable X denoting the number of cards that end in the same position as they started. We can write X as a sum of random variables X_i, one for each card, where X_i = 1 if the ith card ends in position i and X_i = 0 otherwise. These X_i are easy to analyze: Pr(X_i = 1) = 1/n where n is the number of cards. Since X_i takes only the values 0 and 1, Pr(X_i = 1) is also E[X_i]. Now we use linearity of expectation:
$$E[X] = E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n] = 1.$$
So, this is interesting: no matter how large a deck we are considering, the expected number of cards that end in the same position as they started is 1.
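For small decks the claim can be verified exactly by enumerating every possible ordering. The helper name `expected_fixed_points` below is hypothetical, and the approach (brute force over all n! orderings) is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations


def expected_fixed_points(n):
    """Exact expected number of cards left in their starting position,
    averaging over all n! equally likely orderings of an n-card deck."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, card in enumerate(p) if card == i)
                for p in perms)
    return Fraction(total, len(perms))


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, expected_fixed_points(n))   # the second column is always 1
```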
[hmm, let’s leave this for homework]
We now give two methods for analyzing randomized quicksort. The first is more intuitive but the details are messier. The second is a neat tricky way using the power of linearity of expectation: this will be a bit less intuitive but the details come out nicer.
For simplicity, let us assume no two elements in the array are equal — when we are done with the analysis, it will be easy to look back and see that allowing equal keys could only improve performance. We now prove the following theorem.
Theorem 3.2 The expected number of comparisons made by randomized quicksort on an array of size n is at most 2n ln n.
Proof: First of all, when we pick the pivot, we perform n − 1 comparisons (comparing all other elements to it) in order to split the array. Now, depending on the pivot, we might split the array into a LESS of size 0 and a GREATER of size n − 1, or into a LESS of size 1 and a GREATER of size n − 2, and so on, up to a LESS of size n − 1 and a GREATER of size 0. All of these are equally likely with probability 1/n each. Therefore, we can write a recurrence for the expected number of comparisons T(n) as follows:
$$T(n) = (n - 1) + \frac{1}{n} \sum_{i=0}^{n-1} \bigl(T(i) + T(n - i - 1)\bigr). \qquad (3.4)$$
Formally, we are using the expression for Expectation given in (3.3), where the n different possible splits are the events Ai.^3 We can rewrite equation (3.4) by regrouping and getting rid of T (0):
$$T(n) = (n - 1) + \frac{2}{n} \sum_{i=1}^{n-1} T(i). \qquad (3.5)$$
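The regrouping from (3.4) to (3.5) is short but worth writing out. It uses only the change of index j = n − i − 1 (the index j is introduced here for illustration) and the fact that T(0) = 0, since an empty array needs no comparisons:

```latex
\begin{align*}
\sum_{i=0}^{n-1} T(n-i-1) &= \sum_{j=0}^{n-1} T(j), \\
T(n) &= (n-1) + \frac{1}{n}\sum_{i=0}^{n-1}\bigl(T(i) + T(n-i-1)\bigr) \\
     &= (n-1) + \frac{2}{n}\sum_{i=0}^{n-1} T(i) \\
     &= (n-1) + \frac{2}{n}\sum_{i=1}^{n-1} T(i).
\end{align*}
```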
Now, we can solve this by the “guess and prove inductively” method. In order to do this, we first need a good guess. Intuitively, most pivots should split their array “roughly” in the middle, which suggests a guess of the form cn ln n for some constant c. Once we’ve made our guess, we will need to evaluate the resulting summation. One of the easiest ways of doing this is to upper-bound the sum by an integral. In particular if f(x) is an increasing function, then
$$\sum_{i=1}^{n-1} f(i) \le \int_{1}^{n} f(x)\,dx,$$
which we can see by drawing a graph of f and recalling that an integral represents the “area under the curve”. In our case, we will be using the fact that
$$\int (cx \ln x)\,dx = \frac{c}{2}x^2 \ln x - \frac{c}{4}x^2.$$
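This antiderivative can be checked by differentiating it, and evaluating it between 1 and n gives the definite integral used in the induction step. The following is a routine verification, not part of the original notes:

```latex
\begin{align*}
\frac{d}{dx}\Bigl[\tfrac{c}{2}x^2\ln x - \tfrac{c}{4}x^2\Bigr]
  &= c\,x\ln x + \tfrac{c}{2}x - \tfrac{c}{2}x = c\,x\ln x, \\
\int_{1}^{n} c\,x\ln x\,dx
  &= \Bigl[\tfrac{c}{2}x^2\ln x - \tfrac{c}{4}x^2\Bigr]_{1}^{n}
   = \tfrac{c}{2}n^2\ln n - \tfrac{c}{4}n^2 + \tfrac{c}{4}.
\end{align*}
```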
So, let’s now do the analysis. We are guessing that T(i) ≤ ci ln i for i ≤ n − 1. This guess works for the base case T(1) = 0 (if there is only one element, then there are no comparisons). Arguing by induction we have:
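$$\begin{aligned}
T(n) &\le (n - 1) + \frac{2}{n} \sum_{i=1}^{n-1} (ci \ln i) \\
     &\le (n - 1) + \frac{2}{n} \int_{1}^{n} (cx \ln x)\,dx \\
     &\le (n - 1) + \frac{2}{n} \left( \frac{c}{2} n^2 \ln n - \frac{c}{4} n^2 + \frac{c}{4} \right) \\
     &\le cn \ln n, \quad \text{for } c = 2.
\end{aligned}$$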
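The last inequality hides a little algebra. Expanding the bracket and then substituting c = 2 gives the following (a filled-in step, with the same symbols as above):

```latex
\begin{align*}
(n-1) + \frac{2}{n}\Bigl(\tfrac{c}{2}n^2\ln n - \tfrac{c}{4}n^2 + \tfrac{c}{4}\Bigr)
  &= c\,n\ln n + (n-1) - \tfrac{c}{2}n + \tfrac{c}{2n} \\
  &= 2n\ln n - 1 + \tfrac{1}{n} \qquad (c = 2) \\
  &\le 2n\ln n \qquad (n \ge 1).
\end{align*}
```

With c = 2 the linear terms n and −(c/2)n cancel exactly, which is why this value of c is the natural choice for the guess.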
In terms of the number of comparisons it makes, Randomized Quicksort is equivalent to randomly shuffling the input and then handing it off to Basic Quicksort. So, we have also proven that Basic Quicksort has O(n log n) average-case running time.
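For small n this equivalence can be checked exactly: averaging the comparison count of a leftmost-pivot quicksort over all n! orderings should reproduce the value given by the recurrence T(n) = (n − 1) + (2/n) Σ T(i). The sketch below does this with exact rational arithmetic. The function names and the one-comparison-per-non-pivot-element accounting are assumptions for illustration, and distinct keys are assumed, as in the analysis.

```python
from fractions import Fraction
from itertools import permutations


def comparisons_leftmost_pivot(arr):
    """Comparisons made by quicksort with the leftmost element as pivot
    (distinct keys assumed; one comparison per non-pivot element)."""
    if len(arr) <= 1:
        return 0
    pivot, rest = arr[0], arr[1:]
    less = [x for x in rest if x < pivot]
    greater = [x for x in rest if x > pivot]
    return (len(rest) + comparisons_leftmost_pivot(less)
            + comparisons_leftmost_pivot(greater))


def recurrence_T(n):
    """T(n) = (n-1) + (2/n) * sum_{i=1}^{n-1} T(i), with T(0) = T(1) = 0."""
    T = [Fraction(0), Fraction(0)]
    for m in range(2, n + 1):
        T.append((m - 1) + Fraction(2, m) * sum(T[1:m]))
    return T[n]


def average_over_orderings(n):
    """Exact average comparison count over all n! input orderings."""
    perms = list(permutations(range(n)))
    total = sum(comparisons_leftmost_pivot(p) for p in perms)
    return Fraction(total, len(perms))


if __name__ == "__main__":
    for n in range(1, 8):
        # The two columns agree for every n.
        print(n, average_over_orderings(n), recurrence_T(n))
```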
Here is a neat alternative way to analyze randomized quicksort that is very similar to how we analyzed the card-shuffling example.
Footnote 3: In addition, we are using Linearity of Expectation to say that the expected time given one of these events can be written as the sum of two expectations.
you can buy or sell as much as you want, until at the end of the year all your money is converted back into cash. What is the best strategy for maximizing your expected gain?
The answer is that no matter what strategy you choose, your expected gain by the end of the year is 0 (i.e., you expect to end with the same amount of money as you started). Let’s prove that this is the case.
Define random variable Xt to be the gain of our algorithm on day t. Let X be the overall gain at the end of the year. Then,
$$X = X_1 + \cdots + X_{365}.$$
Notice that the X_t’s can be highly dependent, based on our strategy. For instance, if our strategy is to pull all our money out of the stock market the moment that our wealth exceeds $m, then X_2 depends strongly on the outcome of X_1. Nonetheless, by linearity of expectation,
$$E[X] = E[X_1] + \cdots + E[X_{365}].$$
Finally, no matter how many shares s of stock we hold at time t, E[Xt|s] = 0. So, using (3.3), whatever probability distribution over s is induced by our strategy, E[Xt] = 0. Since this holds for every t, we have E[X] = 0.
This analysis can be generalized to the case of gambling in a “fair casino”. In a fair casino, there are a number of games with different kinds of payoffs, but each one has the property that your expected gain for playing it is zero. E.g., there might be a game where with probability 99/100 you lose but with probability 1/100 you win 99 times your bet. In that case, no matter what strategy you use for which game to play and how much to bet, the expected amount of money you will have at the end of the day is the same as the amount you had going in.
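The fair-casino claim can be checked exactly for a short session by enumerating every win/lose sequence. The sketch below uses the game from the text (lose the bet with probability 99/100, win 99 times the bet with probability 1/100). The strategy `stop_when_ahead`, the starting wealth of 100, and the number of rounds are all hypothetical choices for illustration.

```python
from fractions import Fraction

P_WIN = Fraction(1, 100)   # probability of winning a single play
PAYOFF = 99                # a win pays 99 times the bet; a loss costs the bet


def expected_final_wealth(wealth, rounds_left, strategy):
    """Exact expected wealth after rounds_left more plays.

    strategy(wealth, rounds_left) returns the amount to bet on this play;
    it may depend arbitrarily on the current wealth.
    """
    if rounds_left == 0:
        return wealth
    bet = strategy(wealth, rounds_left)
    win = expected_final_wealth(wealth + PAYOFF * bet, rounds_left - 1, strategy)
    lose = expected_final_wealth(wealth - bet, rounds_left - 1, strategy)
    return P_WIN * win + (1 - P_WIN) * lose


def stop_when_ahead(wealth, rounds_left, start=Fraction(100)):
    """Hypothetical strategy: bet half of current wealth until ahead, then stop."""
    return Fraction(0) if wealth > start else wealth / 2


if __name__ == "__main__":
    # The expectation equals the starting wealth, whatever the strategy.
    print(expected_final_wealth(Fraction(100), 10, stop_when_ahead))
```

The bets here depend heavily on earlier outcomes, yet the expectation is unchanged, which mirrors the point that linearity of expectation holds even for dependent X_t.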
Here’s another way to analyze quicksort: run the algorithm backwards. Actually, to do this analysis, it is better to think of a version of Quicksort that, instead of being recursive, at each step picks a random bucket in proportion to its size to work on next. The reason this version is nice is that if you imagine watching the pivots get chosen and where they would be on a sorted array, they are coming in completely at random. Looking at the algorithm run backwards, at a generic point in time, we have k pivots (producing k + 1 buckets) and we “undo” one of our pivot choices at random, merging the two adjoining buckets. [The tricky part here is showing that this is really a legitimate way of looking at Quicksort in reverse.] The cost for an undo operation is the sum of the sizes of the two buckets joined (since this was the number of comparisons needed to split them). Notice that for each undo operation, if you sum the costs over all of the k possible pivot choices, you count each bucket twice (or once if it is the leftmost or rightmost) and get a total of < 2n. Since we are picking one of these k possibilities at random, the expected cost is at most 2n/k. So, we get
$$\sum_{k=1}^{n} \frac{2n}{k} = 2nH_n.$$
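Here H_n = 1 + 1/2 + ... + 1/n is the nth harmonic number, which satisfies ln(n + 1) < H_n ≤ 1 + ln n, so 2nH_n is of the same order as the 2n ln n bound of Theorem 3.2. A small numerical sketch (the helper name `harmonic` is an assumption for illustration):

```python
import math
from fractions import Fraction


def harmonic(n):
    """Exact n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        two_n_hn = 2 * n * float(harmonic(n))
        lower = 2 * n * math.log(n)
        upper = 2 * n * (1 + math.log(n))
        # Columns: n, 2n ln n, 2n H_n, 2n (1 + ln n); the middle value lies between.
        print(n, round(lower, 1), round(two_n_hn, 1), round(upper, 1))
```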