Understanding Binomial Probability: Varying Success Probabilities and Trials, Exams of Probability and Statistics

This document covers the binomial probability distribution, which gives the probability of obtaining a certain number of successful events out of a fixed number of trials, each with a constant probability of success. It shows examples of binomial distributions for different values of N and P, and discusses the relationship between the binomial distribution and the normal distribution for large values of N. It also includes R code for calculating binomial probabilities and provides a table of problems and answers.

Typology: Exams

2021/2022

Uploaded on 09/12/2022

dreamingofyou
The Binomial Distribution

January 27, 2021

Contents

- The Binomial Distribution
- The Normal Approximation to the Binomial
- The Binomial Hypothesis Test
- Computing Binomial Probabilities in R
- 30 Problems

The Binomial Distribution

When you flip a coin there are only two possible outcomes - heads or tails. This is an example of a dichotomous event. Other examples are getting an answer right vs. wrong on a test, catching vs. missing a bus, or eating vs. not eating your vegetables. A roll of a die, on the other hand, is not a dichotomous event since there are six possible outcomes.

If you flip a coin repeatedly, say 10 times, and count up the number of heads, this number is drawn from what’s called a binomial distribution. Other examples are counting the number of correct answers on an exam, or counting the number of days that your ten year old eats his vegetables at dinner. Importantly, each event has to be independent, so that the outcome of one event does not depend on the outcomes of other events in the sequence.

We can define a binomial distribution with three parameters:

P is the probability of a ’successful’ event. That is the event type that you’re counting up - like ’heads’ or ’correct answers’ or ’did eat vegetables’. For a coin flip, P = 0.5. For guessing on a 4-option multiple choice test, P = 1/4 = .25. For my ten year old eating his vegetables, P = 0.05.

N is the number of repeated events.

k is the number of ’successful’ events out of N.

The probability of obtaining k successful events out of N, with probability P, is:

$$\Pr(k) = \frac{N!}{k!(N-k)!} P^k (1-P)^{N-k}$$

where N! = N(N − 1)(N − 2)..., or N ’factorial’.

For example, if you flip a fair coin (P = 0.5) 5 times, the probability of getting 2 heads is:

$$\Pr(k = 2) = \frac{5!}{2!(5-2)!} (0.5)^2 (1-0.5)^{(5-2)} = (10)(0.5^2)(0.5)^3 = 0.3125$$
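The worked example can be checked in R. This is a minimal sketch, not part of the original script: the helper name binom_prob is made up for illustration, and it simply spells out the formula with R's factorial function before comparing against the built-in dbinom.

```r
# Hypothetical helper (illustration only): the binomial formula written out term by term
binom_prob <- function(k, N, P) {
  n_ways <- factorial(N) / (factorial(k) * factorial(N - k))  # N! / (k!(N-k)!)
  n_ways * P^k * (1 - P)^(N - k)
}

by_formula <- binom_prob(2, 5, 0.5)  # 2 heads out of 5 flips of a fair coin
by_dbinom  <- dbinom(2, 5, 0.5)      # the same probability from R's built-in function

print(by_formula)  # 0.3125
print(by_dbinom)   # 0.3125
```

Writing the formula out by hand is only useful for seeing where the number comes from; for real work dbinom does the same job.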

Our textbook, the table handout, and our Excel spreadsheet give you this number, where the columns are for different values of P and the rows are different values of k:

n  k   0.05    0.1     0.15    0.2     0.25    0.3     0.35    0.4     0.45    0.5
5  0   0.7738  0.5905  0.4437  0.3277  0.2373  0.1681  0.116   0.0778  0.0503  0.0313
   1   0.2036  0.328   0.3915  0.4096  0.3955  0.3601  0.3124  0.2592  0.2059  0.1562
   2   0.0214  0.0729  0.1382  0.2048  0.2637  0.3087  0.3364  0.3456  0.3369  0.3125
   3   0.0011  0.0081  0.0244  0.0512  0.0879  0.1323  0.1811  0.2304  0.2757  0.3125
   4   0       0.0005  0.0022  0.0064  0.0146  0.0283  0.0488  0.0768  0.1128  0.1562
   5   0       0       0.0001  0.0003  0.001   0.0024  0.0053  0.0102  0.0185  0.0313
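If you don't have the table handy, a table like this one can be rebuilt in R. The sketch below assumes the same layout (rows for k, columns for P from 0.05 to 0.5); the variable names are chosen here for illustration.

```r
N <- 5
k <- 0:N                 # possible numbers of successful events
P <- (1:10) / 20         # 0.05, 0.10, ..., 0.50

# outer() evaluates dbinom for every combination of k (rows) and P (columns)
tab <- outer(k, P, function(k, P) dbinom(k, N, P))
dimnames(tab) <- list(k = k, P = P)

print(round(tab, 4))
```

Each column is a complete probability distribution, so every column sums to 1.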

We can plot this binomial frequency distribution as a bar graph:

[Figure: bar graph of the binomial distribution for N = 5; x-axis: k (0 to 5), y-axis: probability]

The shape of the probability distribution for N=5 should look familiar. It looks normal! More on this later.

We just calculated the probability of getting exactly 2 heads out of 5 coin flips. What about the probability of getting 2 or more heads out of 5? It’s not hard to see that:

Pr(k >= 2) = Pr(k = 2) + Pr(k = 3) + Pr(k = 4) + Pr(k = 5)

Using the table we can see that

Pr(k >= 2) = 0.3125 + 0.3125 + 0.1562 + 0.0313 = 0.8125
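The same sum can be done in R without the table. This is a small sketch using dbinom for the individual terms and pbinom (the cumulative probability Pr(k <= x)) for the complement; the variable names are illustrative.

```r
# Pr(k >= 2) for N = 5, P = 0.5, adding up Pr(k = 2) through Pr(k = 5)
by_sum <- sum(dbinom(2:5, 5, 0.5))

# Equivalently, 1 minus the probability of getting 0 or 1 heads
by_complement <- 1 - pbinom(1, 5, 0.5)

print(by_sum)         # 0.8125
print(by_complement)  # 0.8125
```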

So there’s about an 8 percent chance of getting 5 or more questions right if you’re guessing. Better not take that class.

Example for when P > 0.5: Counting wrong answers

Notice that the values of P in the table only go up to 0.5. What if P > 0.5? For example, on that physics exam, what is the probability of getting 7 or more wrong out of the 10 questions? Now the probability of a ’successful’ event is 1 − 0.25 = 0.75.

The trick to problems with P>0.5 is to turn the problem around so that a ’successful’ event has probability of 1-P. For our example, the probability of getting k=7 or more wrong is the same as the probability of getting N-k = 10-7 = 3 or fewer right.

So we can rephrase the problem: What is Pr(k <= 3) with N = 10 and P = 0.25?

We can find this using the binomial table spreadsheet:

n   k   0.05    0.1     0.15    0.2     0.25    0.3     0.35    0.4     0.45    0.5
10  0   0.5987  0.3487  0.1969  0.1074  0.0563  0.0282  0.0135  0.006   0.0025  0.001
    1   0.3151  0.3874  0.3474  0.2684  0.1877  0.1211  0.0725  0.0403  0.0207  0.0098
    2   0.0746  0.1937  0.2759  0.302   0.2816  0.2335  0.1757  0.1209  0.0763  0.0439
    3   0.0105  0.0574  0.1298  0.2013  0.2503  0.2668  0.2522  0.215   0.1665  0.1172
    4   0.001   0.0112  0.0401  0.0881  0.146   0.2001  0.2377  0.2508  0.2384  0.2051
    5   0.0001  0.0015  0.0085  0.0264  0.0584  0.1029  0.1536  0.2007  0.234   0.2461
    6   0       0.0001  0.0012  0.0055  0.0162  0.0368  0.0689  0.1115  0.1596  0.2051
    7   0       0       0.0001  0.0008  0.0031  0.009   0.0212  0.0425  0.0746  0.1172
    8   0       0       0       0.0001  0.0004  0.0014  0.0043  0.0106  0.0229  0.0439
    9   0       0       0       0       0       0.0001  0.0005  0.0016  0.0042  0.0098
    10  0       0       0       0       0       0       0       0.0001  0.0003  0.001

So the probability of getting 7 or more wrong out of 10 is the same as the probability of getting 3 or fewer right, which is 0.0563 + 0.1877 + 0.2816 + 0.2503 = 0.7759
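R has no trouble with P > 0.5, so it can be used to confirm that turning the problem around gives the same answer. A minimal sketch with illustrative variable names:

```r
N <- 10

# 7 or more wrong, where a wrong answer has probability 0.75
wrong_7_or_more <- sum(dbinom(7:N, N, 0.75))

# 3 or fewer right, where a right answer has probability 0.25
right_3_or_fewer <- sum(dbinom(0:3, N, 0.25))

print(wrong_7_or_more)
print(right_3_or_fewer)
```

Both lines print the same value, which rounds to the 0.7759 found from the table.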

The Normal Approximation to the Binomial

The table in the book goes up to N=15, and the Excel spreadsheet goes up to N=20. But what about higher values of N?

Here are some examples of binomial probability distributions for different values of N and P:

[Figure: four bar graphs of binomial probability distributions, one panel each for P = 0.25, N = 10; P = 0.5, N = 20; P = 0.75, N = 40; and P = 0.9, N = 80. x-axis: k; y-axis: probability.]

There’s that familiar bell curve! It turns out that the discrete binomial probability distribution can be approximated by the continuous normal distribution with a known mean and standard deviation. The binomial distribution becomes more ’normal’ with larger values of N and values of P closer to 0.5. A good rule is that the binomial distribution is very close to normal for N>=20.

Let’s look more closely at the probability distribution for P = 0.5 and N = 20:

[Figure: bar graph of the binomial distribution for P = 0.5, N = 20. x-axis: k (0 to 20); y-axis: probability.]

The mean of the normal distribution is intuitive. If you have 20 coin flips, each with probability 0.5, then the average number of heads should be (20)(0.5) = 10. So:

μ = N P = 10

The standard deviation is:

$$\sigma = \sqrt{(N)(P)(1-P)} = \sqrt{(20)(0.5)(1-0.5)} = 2.24$$
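As a quick check, the mean and standard deviation can also be computed directly from the binomial probabilities and compared with the two formulas. This is only a sketch; the variable names are made up for illustration.

```r
N <- 20
P <- 0.5
k <- 0:N
p <- dbinom(k, N, P)   # probability of each possible number of heads

# Mean and standard deviation computed from the distribution itself
mu_from_dist    <- sum(k * p)
sigma_from_dist <- sqrt(sum((k - mu_from_dist)^2 * p))

# The same quantities from the formulas
mu_formula    <- N * P
sigma_formula <- sqrt(N * P * (1 - P))

print(c(mu_from_dist, mu_formula))
print(c(sigma_from_dist, sigma_formula))
```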

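[Figure: close-up of the binomial distribution for P = 0.5, N = 20 from k = 12 to k = 20, with the bars for k >= 13 shown in green and the approximating normal distribution drawn as a red curve. x-axis: k; y-axis: probability.]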

Using the binomial table, the actual probability of obtaining 13 or more heads is:

Pr(k = 13) + Pr(k = 14) + ... + Pr(k = 20) =

0.0739 + 0.037 + 0.0148 + 0.0046 + 0.0011 + 0.0002 + 0 + 0 = 0.1316

Looking at the figure above, notice that the width of each bar is 1 unit. The area of each bar (height times width) is therefore equal to the probability of that event. That means that Pr(k>=13) is equal to the sum of the areas of the green bars.

So, to approximate the area of the green bars with the normal distribution (the red curve), we need to find the area under the red curve that covers the same range as the green bars.

Look closely: the green bar at k = 13 covers the range from 12.5 to 13.5. It follows that to approximate the area of the green bars, we need to find the area under the normal distribution above k = 12.5.

Since we know the mean and standard deviation of this normal distribution, we can find the z-score:

$$z = \frac{x - \mu}{\sigma} = \frac{12.5 - 10}{2.24} = 1.12$$

Using the z-table:

Pr(z > 1.12) = 0.1314

This is pretty close to the actual answer of 0.1316.
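Here is a sketch of the same comparison in R, using pnorm for the area under the normal curve in place of the z-table. The variable names are illustrative, and because R does not round z to two decimals the approximation differs very slightly from the z-table value.

```r
N <- 20
P <- 0.5
mu    <- N * P
sigma <- sqrt(N * P * (1 - P))

# Exact answer: add up the binomial probabilities for k = 13, ..., 20
exact_prob <- sum(dbinom(13:N, N, P))

# Normal approximation: area under the normal curve above 12.5
z <- (12.5 - mu) / sigma
approx_prob <- pnorm(z, lower.tail = FALSE)

print(exact_prob)
print(approx_prob)
```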

Left hander example

Of the 152 students in our class that took the survey, 7 reported themselves as left-handed. If 10% of the population is left-handed, what is the probability that 7 or fewer people will be left-handed in a random sample of 152 people?

The normal distribution that best approximates the distribution of left-handed people in a sample size of 152 will have a mean of:

μ = NP = (152)(0.1) = 15.2

and a standard deviation of:

$$\sigma = \sqrt{(N)(P)(1-P)} = \sqrt{(152)(0.1)(1-0.1)} = 3.6986$$

To convert our value of 7 left-handers to a z-score, we need to include the bar that ranges from 6.5 to 7.5. So this time we need to add 0.5 to 7 and calculate Pr(x <= 7.5).

$$z = \frac{x - \mu}{\sigma} = \frac{7.5 - 15.2}{3.6986} = -2.08$$

Pr(z < -2.08) = 0.0188.

So there is about a 2 percent chance that in any given year we’d have 7 or fewer left-handers out of 152 students if the overall population is 10 percent left-handed.
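The left-hander calculation can be sketched in R as well. The names below are illustrative; pnorm stands in for the z-table, and pbinom gives the exact binomial probability Pr(k <= 7) for comparison.

```r
N <- 152
P <- 0.1
mu    <- N * P
sigma <- sqrt(N * P * (1 - P))

# Normal approximation: include the whole bar for k = 7 by going up to 7.5
z <- (7.5 - mu) / sigma
approx_prob <- pnorm(z)

# Exact binomial probability of 7 or fewer left-handers
exact_prob <- pbinom(7, N, P)

print(z)
print(approx_prob)
print(exact_prob)
```

With P this far from 0.5 the approximation and the exact answer do not agree as closely as in the coin flip example, which is a good reason to use the exact calculation when R is available.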

The Binomial Hypothesis Test

Seahawks example

We can use our knowledge of the binomial distribution to make statistical inferences. For example, in 2019 the Seattle Seahawks football team won 11 games out of 16. Is this a better team than ’average’? Use an α value of 0.05. In other words, is the probability of winning 11 or more games out of 16 less than 0.05 under the null hypothesis that there is a 50/50 chance of winning each game?

To test this, we calculate Pr(k >= 11) for N = 16 and P = 0.5. Since N < 20, we’ll use the Cumulative Binomial table:

of 68 wins and 94 losses. What is the probability of losing 94 or more games?

Since there were N = 162 games we’ll have to use the normal approximation to the binomial distribution. With P = 0.5, the number of games that an average team loses (or wins) will be distributed approximately normally with a mean of:

μ = NP = (162)(0.5) = 81

and a standard deviation of:

$$\sigma = \sqrt{(N)(P)(1-P)} = \sqrt{(162)(0.5)(1-0.5)} = 6.364$$

Since we’re finding the probability of 94 or more losses, we have to subtract 0.5 from 94 when calculating the z-score. (Note that if we wanted to find the probability of 94 or fewer losses we’d add 0.5 to 94 when calculating the z-score.) So the z-score for losing 94 or more games is:

$$z = \frac{93.5 - 81}{6.364} = 1.96$$

The probability of losing 94 or more games is therefore:

Pr(k >= 94) = Pr(z > 1.96) = 0.025

Since 0.025 < 0.05, we conclude that yes, the 2019 Mariners were a worse than average team.
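The test for 94 or more losses can be sketched in R as follows. The variable names are illustrative; the first part repeats the normal approximation, and the second runs the exact test with binom.test.

```r
N      <- 162    # games played
P      <- 0.5    # null hypothesis: 50/50 chance of losing each game
losses <- 94
alpha  <- 0.05

mu    <- N * P
sigma <- sqrt(N * P * (1 - P))

# Normal approximation: subtract 0.5 because we want 94 or MORE losses
z <- (losses - 0.5 - mu) / sigma
p_approx <- pnorm(z, lower.tail = FALSE)

# Exact binomial test of 94 or more losses
p_exact <- binom.test(losses, N, P, alternative = "greater")$p.value

print(z)
print(p_approx)
print(p_exact)
print(p_approx < alpha)  # TRUE means we reject the null hypothesis
```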

Computing Binomial Probabilities in R

Computing binomial probabilities is really easy in R - much easier than using the table. The function ’binom.test’ does everything for you. Note, the test always finds the ’exact’ answer, rather than the normal approximation, even if the number of trials is greater than

The R commands shown below can be found here: BinomialDistribution.R

# BinomialDistribution.R
#
# Calculating binomial probabilities is easy in R. The function 'binom.test' takes in
# three variables: (1) K, the number of successful outcomes, (2) N, the total number of trials,
# and (3) P, the probability of a successful outcome on any given trial.
#
# A fourth argument can be 'alternative = "less"', or 'alternative = "greater"', depending on
# whether you want Pr(x<=k) or Pr(x>=k)
#
# If you just want a binomial probability, that is, the probability of getting
# exactly k successful events out of N tries, you can use the function 'dbinom'
# where the three values go in the same order as 'binom.test'.

# Example: Given 10 flips of a fair 50/50 coin, what is the probability of
# getting exactly 6 heads?
dbinom(6,10,.5)
# [1] 0.2050781

# Example: Given 10 flips of a fair 50/50 coin, what is the probability of obtaining 6 or more heads?
out <- binom.test(6,10,.5, alternative = "greater")
# Pr(5 or fewer heads), the complement of the probability above:
out_less <- binom.test(5,10,.5, alternative = "less")

# The result can be found in the field 'p.value':
print(out$p.value)
# [1] 0.3769531

# Example: If you guess on a 20 question multiple choice test where each question has 5 possible
# answers, what is the probability of getting 4 or less correct?
out <- binom.test(4,20,1/5, alternative = "less")
print(out$p.value)
# [1] 0.6296483

# Example: If a basketball player has a 2 out 3 chance of making a free throw on any given try,
# and all tries are independent, what is the probability of making 7 or more out of 10?
out <- binom.test(7,10,2/3, alternative = "greater")
print(out$p.value)
# [1] 0.5592643

# This is the same as:
out <- binom.test(10-7,10,1-2/3,alternative = "less")
print(out$p.value)
# [1] 0.5592643

# What do binomial distributions look like?
N <- 20
k <- seq(0,N)   # 'sequence' of numbers counting from 0 to N
P <- .5         # probability of a successful event (assumed value)
p <- dbinom(k,N,P)

barplot(p)

# To make the plot prettier, and to add labels:
barplot(height = p,                   # probabilities
        names.arg = k,                # x axis values
        col = 'blue',                 # color
        xlab = 'k',                   # x label
        ylab = 'Pr(k)',               # y label
        main = sprintf('N = %d',N),   # title
        space = 0)                    # spacing between bars

  1. For P = 0.25 and N = 40, find Pr(k <= 13)

  2. For P = 0.45 and N = 36, find Pr(k <= 14)

  3. For P = 0.35 and N = 16, find Pr(k <= 5)

  4. For P = 0.85 and N = 24, find Pr(k >= 15)

  5. For P = 0.85 and N = 7, find Pr(k >= 3)

  6. For P = 0.3 and N = 26, find Pr(k >= 12)

  7. For P = 0.25 and N = 25, find Pr(k <= 6)

  8. For P = 0.1 and N = 43, find Pr(k >= 1)

  9. For P = 0.25 and N = 20, find Pr(k >= 2)

  10. For P = 0.45 and N = 26, find Pr(k >= 16)

  11. For P = 0.7 and N = 7, find Pr(k >= 4)

  12. For P = 0.15 and N = 38, find Pr(k <= 7)

  13. For P = 0.8 and N = 33, find Pr(k >= 28)

  14. For P = 0.7 and N = 34, find Pr(k >= 27)

  15. For P = 0.25 and N = 41, find Pr(k <= 12)

  16. For P = 0.5 and N = 19, find Pr(k >= 3)

30 Answers

  1. For P = 0.05 and N = 14, find Pr(k >= 2)

Since N <= 20 use the binomial table.

Pr(k >= 2) = 0.1229 + 0.0259 + ... + 0 = 0.

Using R:
out <- binom.test(2,14,0.05,alternative = "greater")
print(out$p.value)
[1] 0.

  1. For P = 0.85 and N = 32, find Pr(k <= 29)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (32)(0.85) = 27.2

$$\sigma = \sqrt{(32)(0.85)(1-0.85)} = 2.0199$$

$$z = \frac{29.5 - 27.2}{2.0199} = 1.14$$

Pr(k <= 29.5) = Pr(z <= 1.14) = 0.8729

Using R we’ll run the ’exact’ test:
out <- binom.test(29,32,0.85,alternative = "less")
print(out$p.value)
[1] 0.

  1. For P = 0.75 and N = 26, find Pr(k <= 19)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (26)(0.75) = 19.5

$$\sigma = \sqrt{(26)(0.75)(1-0.75)} = 2.2079$$

$$z = \frac{19.5 - 19.5}{2.2079} = 0$$

Pr(k <= 19.5) = Pr(z <= 0) = 0.5

Using R we’ll run the ’exact’ test:
out <- binom.test(19,26,0.75,alternative = "less")
print(out$p.value)
[1] 0.

  1. For P = 0.5 and N = 7, find Pr(k >= 4)

Using R we’ll run the ’exact’ test:
out <- binom.test(33,42,0.8,alternative = "greater")
print(out$p.value)
[1] 0.

  1. For P = 0.6 and N = 9, find Pr(k <= 8)

Since N <= 20 use the binomial table. With P > 0.5 we need to switch the problem to P = 1 − 0.6 = 0.4, N = 9, Pr(k >= 9 − 8) = Pr(k >= 1)

Pr(k >= 1) = 0.0605 + 0.1612 + ... + 0.0003 = 0.

Using R:
out <- binom.test(8,9,0.6,alternative = "less")
print(out$p.value)
[1] 0.

  1. For P = 0.85 and N = 33, find Pr(k >= 29)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (33)(0.85) = 28.05

$$\sigma = \sqrt{(33)(0.85)(1-0.85)} = 2.0512$$

$$z = \frac{28.5 - 28.05}{2.0512} = 0.22$$

Pr(k >= 28.5) = Pr(z >= 0.22) = 0.4129

Using R we’ll run the ’exact’ test:
out <- binom.test(29,33,0.85,alternative = "greater")
print(out$p.value)
[1] 0.

  1. For P = 0.05 and N = 15, find Pr(k >= 2)

Since N <= 20 use the binomial table.

Pr(k >= 2) = 0.1348 + 0.0307 + ... + 0 = 0.

Using R:
out <- binom.test(2,15,0.05,alternative = "greater")
print(out$p.value)
[1] 0.

  1. For P = 0.6 and N = 18, find Pr(k >= 4)

Since N <= 20 use the binomial table. With P > 0.5 we need to switch the problem to P = 1 − 0.6 = 0.4, N = 18, Pr(k <= 18 − 4) = Pr(k <= 14)

Pr(k <= 14) = 0.0001 + 0.0012 + ... + 0.0011 = 0.

Using R:
out <- binom.test(4,18,0.6,alternative = "greater")
print(out$p.value)
[1] 0.

  1. For P = 0.35 and N = 45, find Pr(k <= 11)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (45)(0.35) = 15.75

$$\sigma = \sqrt{(45)(0.35)(1-0.35)} = 3.1996$$

$$z = \frac{11.5 - 15.75}{3.1996} = -1.33$$

Pr(k <= 11.5) = Pr(z <= −1.33) = 0.0918

Using R we’ll run the ’exact’ test:
out <- binom.test(11,45,0.35,alternative = "less")
print(out$p.value)
[1] 0.

  1. For P = 0.7 and N = 44, find Pr(k >= 38)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (44)(0.7) = 30.8

$$\sigma = \sqrt{(44)(0.7)(1-0.7)} = 3.0397$$

$$z = \frac{37.5 - 30.8}{3.0397} = 2.2$$

Pr(k >= 37.5) = Pr(z >= 2.2) = 0.0139

Using R we’ll run the ’exact’ test:
out <- binom.test(38,44,0.7,alternative = "greater")
print(out$p.value)
[1] 0.

Using R we’ll run the ’exact’ test:
out <- binom.test(14,36,0.45,alternative = "less")
print(out$p.value)
[1] 0.

  1. For P = 0.35 and N = 16, find Pr(k <= 5)

Since N <= 20 use the binomial table.

Pr(k <= 5) = 0.001 + 0.0087 + 0.0353 + 0.0888 + 0.1553 + 0.2008 = 0.4899

Using R:
out <- binom.test(5,16,0.35,alternative = "less")
print(out$p.value)
[1] 0.

  1. For P = 0.85 and N = 24, find Pr(k >= 15)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (24)(0.85) = 20.4

$$\sigma = \sqrt{(24)(0.85)(1-0.85)} = 1.7493$$

$$z = \frac{14.5 - 20.4}{1.7493} = -3.37$$

Pr(k >= 14.5) = Pr(z >= −3.37) = 0.9996

Using R we’ll run the ’exact’ test:
out <- binom.test(15,24,0.85,alternative = "greater")
print(out$p.value)
[1] 0.

  1. For P = 0.85 and N = 7, find Pr(k >= 3)

Since N <= 20 use the binomial table. With P > 0.5 we need to switch the problem to P = 1 − 0.85 = 0.15, N = 7, Pr(k <= 7 − 3) = Pr(k <= 4)

Pr(k <= 4) = 0.3206 + 0.396 + 0.2097 + 0.0617 + 0.0109 = 0.9989

Using R:
out <- binom.test(3,7,0.85,alternative = "greater")
print(out$p.value)
[1] 0.

  1. For P = 0.3 and N = 26, find Pr(k >= 12)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (26)(0.3) = 7.8

$$\sigma = \sqrt{(26)(0.3)(1-0.3)} = 2.3367$$

$$z = \frac{11.5 - 7.8}{2.3367} = 1.58$$

Pr(k >= 11.5) = Pr(z >= 1.58) = 0.0571

Using R we’ll run the ’exact’ test:
out <- binom.test(12,26,0.3,alternative = "greater")
print(out$p.value)
[1] 0.

  1. For P = 0.25 and N = 25, find Pr(k <= 6)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (25)(0.25) = 6.25

$$\sigma = \sqrt{(25)(0.25)(1-0.25)} = 2.1651$$

$$z = \frac{6.5 - 6.25}{2.1651} = 0.12$$

Pr(k <= 6.5) = Pr(z <= 0.12) = 0.5478

Using R we’ll run the ’exact’ test:
out <- binom.test(6,25,0.25,alternative = "less")
print(out$p.value)
[1] 0.

  1. For P = 0.1 and N = 43, find Pr(k >= 1)

Since N > 20 use the normal approximation and the z-table. k will be distributed normally with:

μ = NP = (43)(0.1) = 4.3

$$\sigma = \sqrt{(43)(0.1)(1-0.1)} = 1.9672$$

$$z = \frac{0.5 - 4.3}{1.9672} = -1.93$$

Pr(k >= 0.5) = Pr(z >= −1.93) = 0.9732
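The normal-approximation answers above all follow the same steps: find the mean and standard deviation, shift k by 0.5 in the right direction, convert to a z-score, and look up the area. The sketch below wraps those steps in a function. The name normal_approx and its arguments are made up for illustration and are not part of BinomialDistribution.R; pnorm replaces the z-table, so results can differ slightly from answers that round z to two decimals.

```r
# Hypothetical helper: normal approximation to Pr(X <= k) or Pr(X >= k)
normal_approx <- function(k, N, P, direction = c("less", "greater")) {
  direction <- match.arg(direction)
  mu    <- N * P
  sigma <- sqrt(N * P * (1 - P))
  if (direction == "less") {
    pnorm((k + 0.5 - mu) / sigma)                       # k or fewer: add 0.5
  } else {
    pnorm((k - 0.5 - mu) / sigma, lower.tail = FALSE)   # k or more: subtract 0.5
  }
}

# Two of the problems above, with the exact binomial values for comparison
approx_le <- normal_approx(29, 32, 0.85, "less")
exact_le  <- pbinom(29, 32, 0.85)

approx_ge <- normal_approx(38, 44, 0.7, "greater")
exact_ge  <- pbinom(37, 44, 0.7, lower.tail = FALSE)   # Pr(X >= 38) = Pr(X > 37)

print(c(approx_le, exact_le))
print(c(approx_ge, exact_ge))
```

Printing the approximate and exact values side by side shows how much the approximation gives up when P is far from 0.5.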