















This tutorial covers the binomial probability distribution, which gives the probability of obtaining a certain number of successful events out of a fixed number of trials, each with a constant probability of success. It gives examples of binomial distributions for different values of N and P, and discusses the relationship between the binomial distribution and the normal distribution for large values of N. It also includes R code for calculating binomial probabilities and a set of problems with answers.
The Binomial Distribution
The Normal Approximation to the Binomial
The Binomial Hypothesis Test
Computing Binomial Probabilities in R
30 Problems
When you flip a coin there are only two possible outcomes: heads or tails. This is an example of a dichotomous event. Other examples are getting an answer right vs. wrong on a test, catching vs. missing a bus, or eating vs. not eating your vegetables. A roll of a die, on the other hand, is not a dichotomous event since there are six possible outcomes.
If you flip a coin repeatedly, say 10 times, and count up the number of heads, this number is drawn from what’s called a binomial distribution. Other examples are counting the number of correct answers on an exam, or counting the number of days that your ten-year-old eats his vegetables at dinner. Importantly, each event has to be independent, so that the outcome of one event does not depend on the outcomes of other events in the sequence.
We can define a binomial distribution with three parameters:
P is the probability of a ’successful’ event. That is the event type that you’re counting up - like ’heads’ or ’correct answers’ or ’did eat vegetables’. For a coin flip, P = 0.5. For guessing on a 4-option multiple choice test, P = 1/4 = .25. For my ten year old eating his vegetables, P = 0.05.
N is the number of repeated events.
k is the number of ’successful’ events out of N.
The probability of obtaining k successful events out of N, with probability P, is:
$$Pr(k) = \frac{N!}{k!(N-k)!} P^{k} (1-P)^{N-k}$$
where N! = N(N − 1)(N − 2)...(1), or N 'factorial'.
For example, if you flip a fair coin (P = 0.5) 5 times, the probability of getting 2 heads is:
$$Pr(k = 2) = \frac{5!}{2!(5-2)!} (0.5)^{2} (1-0.5)^{(5-2)} = 0.3125$$
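As a quick check, the formula can be typed directly into R and compared with R's built-in dbinom function. This is only a minimal sketch: binom_prob is a hypothetical helper name chosen here for illustration and is not part of the course's R script.

```r
# Binomial probability computed directly from the formula.
# binom_prob is a hypothetical helper name, not from the course script.
binom_prob <- function(k, N, P) {
  # choose(N, k) is N! / (k! (N - k)!)
  choose(N, k) * P^k * (1 - P)^(N - k)
}

p_formula <- binom_prob(2, 5, 0.5)  # 2 heads out of 5 fair coin flips
p_builtin <- dbinom(2, 5, 0.5)      # R's built-in binomial probability

print(p_formula)
print(p_builtin)
```

Both lines should print the same value as the worked example, 0.3125.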
Our textbook, the table handout, and our Excel spreadsheet give you this number, where the columns are for different values of P and the rows are different values of k:
n  k   0.05    0.1     0.15    0.2     0.25    0.3     0.35    0.4     0.45    0.5
5  0   0.7738  0.5905  0.4437  0.3277  0.2373  0.1681  0.116   0.0778  0.0503  0.0313
   1   0.2036  0.328   0.3915  0.4096  0.3955  0.3601  0.3124  0.2592  0.2059  0.1562
   2   0.0214  0.0729  0.1382  0.2048  0.2637  0.3087  0.3364  0.3456  0.3369  0.3125
   3   0.0011  0.0081  0.0244  0.0512  0.0879  0.1323  0.1811  0.2304  0.2757  0.3125
   4   0       0.0005  0.0022  0.0064  0.0146  0.0283  0.0488  0.0768  0.1128  0.1562
   5   0       0       0.0001  0.0003  0.001   0.0024  0.0053  0.0102  0.0185  0.0313
We can plot this binomial frequency distribution as a bar graph:
[Figure: bar graph of the binomial probability distribution for N = 5; x-axis: k (0 to 5), y-axis: probability]
The shape of the probability distribution for N=5 should look familiar. It looks normal! More on this later.
We just calculated the probability of getting exactly 2 heads out of 5 coin flips. What about the probability of getting 2 or more heads out of 5? It’s not hard to see that:
Pr(k >= 2) = Pr(k = 2) + Pr(k = 3) + Pr(k = 4) + Pr(k = 5)
Using the table we can see that
Pr(k >= 2) = 0.3125 + 0.3125 + 0.1562 + 0.0313 = 0.8125
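The same sum can be sketched in R in two ways: by adding up the individual probabilities with dbinom, or by asking pbinom for the upper tail. Note that pbinom with lower.tail = FALSE returns Pr(k > q), so q is set to 1 to get Pr(k >= 2).

```r
# Pr(k >= 2) for N = 5 coin flips with P = 0.5

# Add up Pr(k = 2), Pr(k = 3), Pr(k = 4) and Pr(k = 5)
p_sum <- sum(dbinom(2:5, 5, 0.5))

# Upper tail: Pr(k > 1) is the same thing as Pr(k >= 2)
p_tail <- pbinom(1, 5, 0.5, lower.tail = FALSE)

print(p_sum)
print(p_tail)
```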
So there’s about an 8 percent chance of getting 5 or more questions right if you’re guessing. Better not take that class.
Example for when P > 0.5: Counting wrong answers
Notice that the values of P in the table only go up to 0.5. What if P > 0.5? For example, on that physics exam, what is the probability of getting 7 or more wrong out of the 10 questions? Now the probability of a 'successful' event is 1 − 0.25 = 0.75.
The trick to problems with P>0.5 is to turn the problem around so that a ’successful’ event has probability of 1-P. For our example, the probability of getting k=7 or more wrong is the same as the probability of getting N-k = 10-7 = 3 or fewer right.
So we can rephrase the problem: What is Pr(k <= 3) with N = 10 and P = 0.25?
We can find this using the binomial table spreadsheet:
n   k   0.05    0.1     0.15    0.2     0.25    0.3     0.35    0.4     0.45    0.5
10  0   0.5987  0.3487  0.1969  0.1074  0.0563  0.0282  0.0135  0.006   0.0025  0.001
    1   0.3151  0.3874  0.3474  0.2684  0.1877  0.1211  0.0725  0.0403  0.0207  0.0098
    2   0.0746  0.1937  0.2759  0.302   0.2816  0.2335  0.1757  0.1209  0.0763  0.0439
    3   0.0105  0.0574  0.1298  0.2013  0.2503  0.2668  0.2522  0.215   0.1665  0.1172
    4   0.001   0.0112  0.0401  0.0881  0.146   0.2001  0.2377  0.2508  0.2384  0.2051
    5   0.0001  0.0015  0.0085  0.0264  0.0584  0.1029  0.1536  0.2007  0.234   0.2461
    6   0       0.0001  0.0012  0.0055  0.0162  0.0368  0.0689  0.1115  0.1596  0.2051
    7   0       0       0.0001  0.0008  0.0031  0.009   0.0212  0.0425  0.0746  0.1172
    8   0       0       0       0.0001  0.0004  0.0014  0.0043  0.0106  0.0229  0.0439
    9   0       0       0       0       0       0.0001  0.0005  0.0016  0.0042  0.0098
    10  0       0       0       0       0       0       0       0.0001  0.0003  0.001
So the probability of getting 7 or more wrong out of 10 is the same as the probability of getting 3 or fewer right, which is 0.0563 + 0.1877 + 0.2816 + 0.2503 = 0.7759.
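The 'turn the problem around' trick can be checked in R, which has no restriction to P <= 0.5. In this small sketch, the probability of 3 or fewer right (P = 0.25) is compared with the probability of 7 or more wrong (P = 0.75); the variable names are chosen here for illustration.

```r
# Pr(3 or fewer right) with N = 10 and P(right) = 0.25
p_right <- pbinom(3, 10, 0.25)

# Pr(7 or more wrong) with N = 10 and P(wrong) = 0.75.
# lower.tail = FALSE gives Pr(k > 6), which is Pr(k >= 7).
p_wrong <- pbinom(6, 10, 0.75, lower.tail = FALSE)

print(p_right)
print(p_wrong)
```

The two numbers should agree with each other and, after rounding, with the sum taken from the table.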
The table in the book goes up to N=15, and the Excel spreadsheet goes up to N=20. But what about higher values of N?
Here are some examples of binomial probability distributions for different values of N and P:
[Figure: four bar graphs of binomial probability distributions (x-axis: k, y-axis: probability) for P = 0.25, N = 10; P = 0.5, N = 20; P = 0.75, N = 40; and P = 0.9, N = 80]
There’s that familiar bell curve! It turns out that the discrete binomial probability distribution can be approximated by the continuous normal distribution with a known mean and standard deviation. The binomial distribution becomes more 'normal' with larger values of N and values of P closer to 0.5. A good rule is that the binomial distribution is very close to normal for N >= 20.
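One rough way to see this numerically is to compare each binomial probability with the height of a normal curve at the same value of k. The sketch below assumes the normal curve has mean NP and standard deviation sqrt(NP(1 − P)), the standard values for a binomial count; max_gap is a hypothetical helper name, not something from the course script.

```r
# Largest gap between the binomial probabilities and the normal density
# evaluated at the same integer values of k.
# max_gap is a hypothetical helper used only for this illustration.
max_gap <- function(N, P) {
  k     <- 0:N
  mu    <- N * P                    # mean of the binomial count
  sigma <- sqrt(N * P * (1 - P))    # standard deviation of the binomial count
  max(abs(dbinom(k, N, P) - dnorm(k, mean = mu, sd = sigma)))
}

gaps <- c(
  "P=0.25, N=10" = max_gap(10, 0.25),
  "P=0.5, N=20"  = max_gap(20, 0.50),
  "P=0.75, N=40" = max_gap(40, 0.75),
  "P=0.9, N=80"  = max_gap(80, 0.90)
)

print(round(gaps, 4))
```

Smaller gaps mean the bars sit closer to the bell curve. This is just one of several possible ways to measure 'closeness'.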
Let’s look more closely at the probability distribution for P = 0.5 and N = 20:
[Figure: bar graph of the binomial probability distribution for P = 0.5, N = 20; x-axis: k (0 to 20), y-axis: probability]
The mean of the normal distribution is intuitive. If you have 20 coin flips, each with probability 0.5, then the average number of heads should be (20)(0.5) = 10. So:
μ = N P = 10
The standard deviation is:
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(20)(0.5)(1-0.5)} = 2.24$$
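[Figure: close-up of the distribution for P = 0.5, N = 20 over k = 12 to 20, with the bars for k >= 13 in green and the normal approximation drawn as a red curve; x-axis: k, y-axis: probability]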
Using the binomial table, the actual probability of obtaining 13 or more heads is:
Pr(k=13) + Pr(k=14) + ... + Pr(k=20) =
0.0739 + 0.037 + 0.0148 + 0.0046 + 0.0011 + 0.0002 + 0 + 0 = 0.1316
Looking at the figure above, notice that the width of each bar is 1 unit. The area of each bar (height times width) is therefore equal to the probability of that event. That means that Pr(k>=13) is equal to the sum of the areas of the green bars.
So, to approximate the area of the green bars with the normal distribution (the red curve), we need to find the area under the red curve that covers the same range as the green bars.
Look closely: the green bar at k = 13 covers the range from 12.5 to 13.5. It follows that to approximate the area of the green bars, we need to find the area under the normal distribution above k = 12.5.
Since we know the mean and standard deviation of this normal distribution, we can find the z-score:
$$z = \frac{x - \mu}{\sigma} = \frac{12.5 - 10}{2.24} = 1.12$$
Using the z-table:
Pr(z > 1.12) = 0.1314
This is pretty close to the actual answer of 0.1316.
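The comparison can be sketched in R, using pnorm in place of the z-table. The variable names below are chosen for this illustration. Because pnorm works with the unrounded z-score, its answer may differ from the z-table answer in the last digit or two.

```r
N <- 20
P <- 0.5
mu    <- N * P                   # mean of the approximating normal
sigma <- sqrt(N * P * (1 - P))   # standard deviation of the approximating normal

# Exact binomial probability of 13 or more heads
p_exact <- sum(dbinom(13:20, N, P))

# Normal approximation: area under the curve above 12.5
p_approx <- pnorm(12.5, mean = mu, sd = sigma, lower.tail = FALSE)

print(p_exact)
print(p_approx)
```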
Left hander example
Of the 152 students in our class that took the survey, 7 reported themselves as left-handed. If 10% of the population is left-handed, what is the probability that 7 or fewer people will be left-handed in a random sample of 152 people?
The normal distribution that best approximates the distribution of left-handed people in a sample size of 152 will have a mean of:
μ = NP = (152)(0.1) = 15.2
and a standard deviation of:
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(152)(0.1)(0.9)} = 3.6986$$
To convert our value of 7 left-handers to a z-score, we need to include the bar that ranges from 6.5 to 7.5. So this time we need to add 0.5 to 7 and calculate Pr(x <= 7.5):
$$z = \frac{x - \mu}{\sigma} = \frac{7.5 - 15.2}{3.6986} = -2.08$$
Pr(z < -2.08) = 0.0188.
So there is about a 2 percent chance that in any given year we’d have 7 or fewer left-handers out of 152 students if the overall population is 10 percent left-handed.
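Here is a minimal sketch of the left-hander calculation in R, next to the exact binomial answer from pbinom. The variable names are illustrative. With P as far from 0.5 as 0.1 the binomial distribution is somewhat skewed, so the exact and approximate answers need not match closely.

```r
N <- 152
P <- 0.1
mu    <- N * P                   # 15.2
sigma <- sqrt(N * P * (1 - P))   # about 3.6986

# Continuity correction: include the whole bar for k = 7 (6.5 to 7.5)
z <- (7.5 - mu) / sigma

p_approx <- pnorm(z)           # normal approximation to Pr(k <= 7)
p_exact  <- pbinom(7, N, P)    # exact binomial Pr(k <= 7)

print(z)
print(p_approx)
print(p_exact)
```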
Seahawks example
We can use our knowledge of the binomial distribution to make statistical inferences. For example, in 2019 the Seattle Seahawks football team won 11 games out of 16. Is this a better team than ’average’? Use an α value of 0.05. In other words, is the probability of winning 11 or more games out of 16 less than 0.05 under the null hypothesis that there is a 50/50 chance of winning each game?
To test this, we calculate Pr(k >= 11) for N = 16 and P = 0.5. Since N < 20, we’ll use the Cumulative Binomial table:
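As a cross-check on the table lookup, the same tail probability can be sketched in R, both by summing dbinom and with binom.test. The names p_value, alpha and out are illustrative choices.

```r
# Pr(k >= 11) for N = 16 games with P = 0.5 under the null hypothesis
p_value <- sum(dbinom(11:16, 16, 0.5))

# The same one-tailed test using binom.test
out <- binom.test(11, 16, 0.5, alternative = "greater")

alpha <- 0.05
print(p_value)
print(out$p.value)
print(p_value < alpha)   # TRUE would mean rejecting the null hypothesis
```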
of 68 wins and 94 losses. What is the probability of losing 94 or more games?
Since there were N= 162 games we’ll have to use the normal approximation to the binomial distribution. With P = 0.5, the number of games that an average team loses (or wins) will be distributed approximately normally with a mean of:
μ = N P = (162)(0.5) = 81
and a standard deviation of:
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(162)(0.5)(0.5)} = 6.364$$
Since we’re finding the probability of 94 or more losses, we have to subtract 0.5 from 94 when calculating the z-score. (Note that if we wanted to find the probability of 94 or fewer losses, we’d add 0.5 to 94 instead.) So the z-score for losing 94 or more games is:
$$z = \frac{93.5 - 81}{6.364} = 1.96$$
The probability of losing 94 or more games is therefore:
Pr(k >= 94) = Pr(z > 1.96) = 0.025
Since 0.025 < 0.05, we conclude that yes, the 2019 Mariners were a worse than average team.
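The Mariners calculation can be sketched in R as well, with the exact test from binom.test shown for comparison. The variable names are illustrative.

```r
N <- 162
P <- 0.5
mu    <- N * P                   # 81
sigma <- sqrt(N * P * (1 - P))   # about 6.364

# Continuity correction for '94 or more': start the area at 93.5
z <- (93.5 - mu) / sigma

p_approx <- pnorm(z, lower.tail = FALSE)                          # normal approximation
p_exact  <- binom.test(94, N, P, alternative = "greater")$p.value # exact test

print(z)
print(p_approx)
print(p_exact)
```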
Computing binomial probabilities is really easy in R, much easier than using the table. The function 'binom.test' does everything for you. Note that the test always finds the 'exact' answer, rather than the normal approximation, even when the number of trials is large.
The R commands shown below can be found here: BinomialDistribution.R
dbinom(6,10,.5)
[1] 0.

out <- binom.test(6,10,.5, alternative = "greater")
out <- binom.test(5,10,.5, alternative = "less")
print(out$p.value)
[1] 0.

out <- binom.test(4,20,1/5, alternative = "less")
print(out$p.value)
[1] 0.

out <- binom.test(7,10,2/3, alternative = "greater")
print(out$p.value)
[1] 0.

out <- binom.test(10-7,10,1-2/3,alternative = "less")
print(out$p.value)
[1] 0.

N <- 20
k <- seq(0,N)   # 'sequence' of numbers counting from 0 to N
P <-.
p <- dbinom(k,N,P)

barplot(p)

barplot(height = p,                  # probabilities
        names = k,                   # x axis values
        col = 'blue',                # color
        xlab = 'k',                  # x label
        ylab = 'Pr(k)',              # y label
        main = sprintf('N = %d',N),  # title
        space = 0)                   # spacing between bars
For P = 0.25 and N = 40, find Pr(k <= 13)
For P = 0.45 and N = 36, find Pr(k <= 14)
For P = 0.35 and N = 16, find Pr(k <= 5)
For P = 0.85 and N = 24, find Pr(k >= 15)
For P = 0.85 and N = 7, find Pr(k >= 3)
For P = 0.3 and N = 26, find Pr(k >= 12)
For P = 0.25 and N = 25, find Pr(k <= 6)
For P = 0.1 and N = 43, find Pr(k >= 1)
For P = 0.25 and N = 20, find Pr(k >= 2)
For P = 0.45 and N = 26, find Pr(k >= 16)
For P = 0.7 and N = 7, find Pr(k >= 4)
For P = 0.15 and N = 38, find Pr(k <= 7)
For P = 0.8 and N = 33, find Pr(k >= 28)
For P = 0.7 and N = 34, find Pr(k >= 27)
For P = 0.25 and N = 41, find Pr(k <= 12)
For P = 0.5 and N = 19, find Pr(k >= 3)
30 Answers
Using R:
out <- binom.test(2,14,0.05,alternative = "greater")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (32)(0.85) = 27.2
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(32)(0.85)(0.15)} = 2.0199$$
$$z = \frac{29.5 - 27.2}{2.0199} = 1.14$$
Pr(k <= 29.5) = Pr(z <= 1.14) = 0.8729
Using R we’ll run the 'exact' test:
out <- binom.test(29,32,0.85,alternative = "less")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (26)(0.75) = 19.5
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(26)(0.75)(0.25)} = 2.2079$$
$$z = \frac{19.5 - 19.5}{2.2079} = 0$$
Pr(k <= 19.5) = Pr(z <= 0) = 0.5
Using R we’ll run the 'exact' test:
out <- binom.test(19,26,0.75,alternative = "less")
print(out$p.value)
[1] 0.
Using R we’ll run the 'exact' test:
out <- binom.test(33,42,0.8,alternative = "greater")
print(out$p.value)
[1] 0.
Using R:
out <- binom.test(8,9,0.6,alternative = "less")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (33)(0.85) = 28.05
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(33)(0.85)(0.15)} = 2.0512$$
$$z = \frac{28.5 - 28.05}{2.0512} = 0.22$$
Pr(k >= 28.5) = Pr(z >= 0.22) = 0.4129
Using R we’ll run the 'exact' test:
out <- binom.test(29,33,0.85,alternative = "greater")
print(out$p.value)
[1] 0.
Using R:
out <- binom.test(2,15,0.05,alternative = "greater")
print(out$p.value)
[1] 0.
Using R:
out <- binom.test(4,18,0.6,alternative = "greater")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (45)(0.35) = 15.75
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(45)(0.35)(0.65)} = 3.1996$$
$$z = \frac{11.5 - 15.75}{3.1996} = -1.33$$
Pr(k <= 11.5) = Pr(z <= −1.33) = 0.0918
Using R we’ll run the 'exact' test:
out <- binom.test(11,45,0.35,alternative = "less")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (44)(0.7) = 30.8
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(44)(0.7)(0.3)} = 3.0397$$
$$z = \frac{37.5 - 30.8}{3.0397} = 2.2$$
Pr(k >= 37.5) = Pr(z >= 2.2) = 0.0139
Using R we’ll run the 'exact' test:
out <- binom.test(38,44,0.7,alternative = "greater")
print(out$p.value)
[1] 0.
Using R we’ll run the 'exact' test:
out <- binom.test(14,36,0.45,alternative = "less")
print(out$p.value)
[1] 0.
Using R:
out <- binom.test(5,16,0.35,alternative = "less")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (24)(0.85) = 20.4
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(24)(0.85)(0.15)} = 1.7493$$
$$z = \frac{14.5 - 20.4}{1.7493} = -3.37$$
Pr(k >= 14.5) = Pr(z >= −3.37) = 0.9996
Using R we’ll run the 'exact' test:
out <- binom.test(15,24,0.85,alternative = "greater")
print(out$p.value)
[1] 0.
Using R:
out <- binom.test(3,7,0.85,alternative = "greater")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (26)(0.3) = 7.8
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(26)(0.3)(0.7)} = 2.3367$$
$$z = \frac{11.5 - 7.8}{2.3367} = 1.58$$
Pr(k >= 11.5) = Pr(z >= 1.58) = 0.0571
Using R we’ll run the 'exact' test:
out <- binom.test(12,26,0.3,alternative = "greater")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (25)(0.25) = 6.25
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(25)(0.25)(0.75)} = 2.1651$$
$$z = \frac{6.5 - 6.25}{2.1651} = 0.12$$
Pr(k <= 6.5) = Pr(z <= 0.12) = 0.5478
Using R we’ll run the 'exact' test:
out <- binom.test(6,25,0.25,alternative = "less")
print(out$p.value)
[1] 0.
Since N > 20, use the normal approximation and the z-table. k will be approximately normally distributed with:
μ = NP = (43)(0.1) = 4.3
$$\sigma = \sqrt{NP(1-P)} = \sqrt{(43)(0.1)(0.9)} = 1.9672$$
$$z = \frac{0.5 - 4.3}{1.9672} = -1.93$$
Pr(k >= 0.5) = Pr(z >= −1.93) = 0.9732
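The normal-approximation answers above all follow the same recipe, which can be sketched as one small R function. normal_approx is a hypothetical helper written for this illustration (it is not part of the course script), and because pnorm uses the unrounded z-score its results can differ slightly from the z-table answers.

```r
# Normal approximation to a binomial tail probability, with the
# 0.5 continuity correction. normal_approx is a hypothetical helper.
#   tail = "less"    approximates Pr(k or fewer)
#   tail = "greater" approximates Pr(k or more)
normal_approx <- function(k, N, P, tail = c("less", "greater")) {
  tail  <- match.arg(tail)
  mu    <- N * P
  sigma <- sqrt(N * P * (1 - P))
  if (tail == "less") {
    pnorm(k + 0.5, mean = mu, sd = sigma)                      # area below k + 0.5
  } else {
    pnorm(k - 0.5, mean = mu, sd = sigma, lower.tail = FALSE)  # area above k - 0.5
  }
}

# P = 0.1, N = 43, Pr(k >= 1): approximation vs. exact binomial
p_approx <- normal_approx(1, 43, 0.1, tail = "greater")
p_exact  <- pbinom(0, 43, 0.1, lower.tail = FALSE)

print(p_approx)
print(p_exact)
```

Comparing the two printed values gives a sense of how well the approximation holds up when P is far from 0.5.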