10 Hypothesis Testing with Two Independent Samples, Slides of Biostatistics


Typology: Slides

2021/2022

Uploaded on 09/27/2022

sharina
MATH1015 Biostatistics Week 10
SydU MATH1015 (2013) First semester

10 Hypothesis Testing with Two Independent Samples

Previously we have studied:

  • the one-sample t-test for the population mean μ, using the information provided by a single sample;
  • the one-sample z-test for the population proportion p, based on one sample;
  • the matched pairs t-test, based on two observations on each subject (or on identical subjects), which reduces to the one-sample t-test for differenced data.

This week we consider an extension of the above work and study methods to compare two population means and two population proportions, both based on two independent samples from two populations.

10.1 Two-sample t-test comparing two population means (P.145-149)

10.1.1 Introduction

In every area of human activity, new procedures are invented and existing techniques are revised. Advances occur whenever a new technique is proved to be better than the old. Hence we need to test whether a new method is better than the old one, this time using an experimental design other than the matched pairs design.

This section develops a popular statistical test that compares the means of two independent populations.

A motivational example: Suppose that there are two types of food available for milking cows. A farmer wishes to test which type of food helps cows to produce a higher milk yield.

An experiment: The farmer can select two independent groups of cows that produce similar milk yields. One group is given food A and the other group is given food B. After one week, the farmer calculates the mean and standard deviation of the milk yields for each group and then uses his knowledge to decide which type of food gives the better yield.

Further examples:

1. Compare the average age at first marriage of females in two ethnic groups.

2. Compare the average efficiency of two brands of fertilisers.

3. Compare the average marks of statistics students at USYD and UNSW.

Note: This type of design makes the comparison using two independent samples rather than matched pairs. It is necessary in situations where matched pairs from similar or the same subjects are difficult to form, for example in the comparison of two ethnic groups, where human characteristics are difficult to match.

A statistical test, the two-sample t-test, for such comparisons can be developed under the following assumptions:

$H_1: \mu_1 < \mu_2$ or $H_1: \mu_1 - \mu_2 < 0$ (one-sided);
$H_1: \mu_1 \neq \mu_2$ or $H_1: \mu_1 - \mu_2 \neq 0$ (two-sided).

Note: As we have two sample variances $s_1^2$ and $s_2^2$, we need to combine them into a single variance in order to develop a t-test. This can be done by combining or pooling $s_1^2$ and $s_2^2$ as given below:

10.1.4 Combined or Pooled Variance

It can be shown that the best combination of $s_1^2$ and $s_2^2$ to produce the common variance is given by

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

Remarks:

  • This combined variance $s_p^2$ is called the pooled variance.
  • The pooled variance is simply a weighted average of the two individual sample variances, weighted by their degrees of freedom (df).

10.1.5 The Test Statistic

It can be proved that under the null hypothesis $H_0: \mu_1 - \mu_2 = 0$, the test statistic

$$t = \frac{\bar X_1 - \bar X_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$

is t-distributed with $n_1 + n_2 - 2$ degrees of freedom.

Note: The corresponding df for this two-sample problem is 2 less than the total sample size $n_1 + n_2$. Compare this with the one-sample t-test, where

$$t = \frac{\bar X - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$$

is t-distributed with $n - 1$ degrees of freedom.

Example: Two independent samples have been taken from two independent normal populations. The observations are:

Sample 1: 8, 5, 7, 6, 9, 7
Sample 2: 2, 6, 4, 7, 6

Find an estimate of the combined variance (or pooled variance).

Solution:

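Sample 1: $n_1 = 6$, $\bar x_1 = 7$, $s_1^2 = 2$.
Sample 2: $n_2 = 5$, $\bar x_2 = 5$, $s_2^2 = 4$.

Therefore, the combined or pooled variance (estimate) is:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{5(2) + 4(4)}{9} = \frac{26}{9} \approx 2.89.$$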

Example (cont): State the distribution of

$$t = \frac{\bar X_1 - \bar X_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

under $H_0$.

Solution: Since df = 6 + 5 − 2 = 9,

$$t = \frac{\bar X_1 - \bar X_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_9.$$

Example: Using the sample information, calculate the value of the test statistic.
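The calculation asked for here can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `pooled_variance` and `two_sample_t` are illustrative choices, not part of the course material, and the data are Sample 1 and Sample 2 from the example above.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance(sample1, sample2):
    """Weighted average of the two sample variances, weighted by their df."""
    n1, n2 = len(sample1), len(sample2)
    return ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)


def two_sample_t(sample1, sample2):
    """Return the pooled two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    sp = sqrt(pooled_variance(sample1, sample2))
    t = (mean(sample1) - mean(sample2)) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Observations from the example above.
sample1 = [8, 5, 7, 6, 9, 7]
sample2 = [2, 6, 4, 7, 6]

sp2 = pooled_variance(sample1, sample2)
t_stat, df = two_sample_t(sample1, sample2)
print(f"pooled variance = {sp2:.3f}, t = {t_stat:.3f}, df = {df}")
```

With these data the pooled variance is 26/9 and the statistic comes out at roughly 1.94 on 9 degrees of freedom.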

Example: A feeding test is conducted on a herd of 25 dairy cows to compare two diets, A and B. A sample of 13 cows randomly selected from the herd is fed diet A and the remaining cows are fed diet B. From observations made over a three-week period, the average daily milk production (in L) is recorded for each cow:

Milk yield (in L)
Diet A ($x_1$): 44 44 56 46 47 38 58 53 49 35 46 30 41
Diet B ($x_2$): 35 47 55 29 40 39 32 41 42 57 51 39

Assume these two samples come from independent normally distributed populations with equal variances $\sigma^2$.

(i) Find the mean and the sd for each sample.

(ii) Find an estimate of the ‘pooled variance’ $s_p^2$, which estimates the common variance $\sigma^2$.

(iii) Perform the two-sample t-test to investigate the evidence of a difference in true mean milk yields for the two diets.

Solution:

(i) $\bar x_1 = 45.15$, $s_1 = 7.998$, $n_1 = 13$ for A;
    $\bar x_2 = 42.25$, $s_2 = 8.740$, $n_2 = 12$ for B.

(ii) The ‘pooled’ sample variance is

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{12(7.998)^2 + 11(8.740)^2}{23} \approx 69.91.$$

(iii) The two-sample t-test:

  1. Hypotheses: As we want to test whether there is a difference in milk yields, we have a two-sided alternative:

     $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$.

  2. Test statistic: Under $H_0$,

     $$t_0 = \frac{\bar x_1 - \bar x_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{45.15 - 42.25}{s_p\sqrt{\frac{1}{13} + \frac{1}{12}}} \approx 0.867,$$

     where $s_p = \sqrt{s_p^2}$ is the pooled standard deviation from (ii).

  3. P-value: $2P(T_{23} \ge 0.867) > 0.05$.
  4. Conclusion: Since the P-value is > 0.05, the data are consistent with $H_0$. There is no significant difference between the two diets.

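[Figure: Two-sided t-test. Density of $t_{23}$ showing the observed values ±0.867 with tail area 0.197, and the critical values ±2.069 bounding the rejection region (RR) at α = 0.05.]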

10.1.6 Confidence interval (CI) for μ 1 − μ 2

The $(1 - \alpha)100\%$ CI for $\mu_1 - \mu_2$ is

$$\bar x_1 - \bar x_2 \pm t_{n_1 + n_2 - 2,\,\alpha/2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
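As an illustration of this interval, the sketch below applies it to the milk yield data from the diet example. The slides do not work this interval out, so the result is an added illustration rather than a quoted answer. The critical value 2.069 for 23 degrees of freedom is the one shown in the two-sided test figure, and the helper name `pooled_t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Milk yields (in L) from the diet example.
diet_a = [44, 44, 56, 46, 47, 38, 58, 53, 49, 35, 46, 30, 41]
diet_b = [35, 47, 55, 29, 40, 39, 32, 41, 42, 57, 51, 39]

# Upper 2.5% point of the t distribution with 23 df, as read from the slides.
T_CRIT = 2.069


def pooled_t_interval(x, y, t_crit):
    """Return (t statistic, (lower, upper)) for mu_1 - mu_2 using the pooled variance."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)  # standard error of the difference in means
    diff = mean(x) - mean(y)
    return diff / se, (diff - t_crit * se, diff + t_crit * se)


t0, (lower, upper) = pooled_t_interval(diet_a, diet_b, T_CRIT)
print(f"t0 = {t0:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval computed this way runs from about −4.0 to 9.8 L. It contains 0, which is consistent with the conclusion of the two-sided test.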

10.2 Two-sample z-test for comparing two population proportions (P.157-161)

In some life science problems we need to test whether two population proportions for a particular attribute are equal.

Motivating example: Suppose that a federal member of parliament wishes to test whether two suburbs in his electorate have the same unemployment rate.

To test this, the member can take two independent samples (one from each suburb) and calculate the proportion unemployed in each. However, these two sample proportions alone cannot show whether the difference between them is large enough to support his claim. Therefore, we need to develop a proper statistical test.

10.2.1 Assumption

  1. Two independent samples
  2. Both sample sizes are large: both n 1 ≥ 30 and n 2 ≥ 30.

10.2.2 Hypotheses

Null hypothesis of interest: As we would like to compare the two proportions $p_1$ and $p_2$ of the two populations,

$H_0: p_1 = p_2$ or equivalently $H_0: p_1 - p_2 = 0$.

Alternative hypothesis: Depending on the specific problem, it can be:

$H_1: p_1 > p_2$ or equivalently $H_1: p_1 - p_2 > 0$ (one-sided),
$H_1: p_1 < p_2$ or equivalently $H_1: p_1 - p_2 < 0$ (one-sided),
$H_1: p_1 \neq p_2$ or equivalently $H_1: p_1 - p_2 \neq 0$ (two-sided).

To develop a suitable test statistic, we need a single estimate for the proportion based on two independent samples under H 0 of equal proportions. This combined or pooled estimate is obtained using the formula given below:

10.2.3 Combined or pooled proportion

Suppose that $x_1$ and $x_2$ are the numbers of “successes” in each independent sample, and $n_1$ and $n_2$ their respective sample sizes. Under the null hypothesis that the two population proportions are equal, we estimate this common proportion using:

$$\hat p = \frac{x_1 + x_2}{n_1 + n_2}.$$

Note: It is clear that

$$\hat p = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2},$$

where $\hat p_1 = x_1/n_1$ and $\hat p_2 = x_2/n_2$ are the estimates of the two proportions based on the two independent samples.

Remark: $\hat p$ is just a weighted average of the two sample proportions $\hat p_1$ and $\hat p_2$, weighted by their sample sizes.

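10.2.4 The Test Statistic

The formula for the test statistic is

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1),$$

since under the null hypothesis the variance of the difference is estimated by

$$\mathrm{Var}(\hat p_1 - \hat p_2) = \mathrm{Var}(\hat p_1) + \mathrm{Var}(\hat p_2) = \frac{\hat p(1 - \hat p)}{n_1} + \frac{\hat p(1 - \hat p)}{n_2} = \hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right).$$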

Preliminary calculations: Sample proportions are:

$$\hat p_1 = \frac{X_1}{n_1} = 0.767, \qquad \hat p_2 = \frac{X_2}{n_2}.$$

The combined or pooled proportion is:

$$\hat p = \frac{x_1 + x_2}{n_1 + n_2}.$$

Hence the test statistic, with $n_1 = 30$ and $n_2 = 72$, is:

$$z_0 = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{30} + \frac{1}{72}\right)}} = -0.28.$$

  3. P-value: $P(Z < -0.28) = 1 - P(Z < 0.28) = 1 - 0.6103 = 0.3897$.
  4. Conclusion: Since the P-value is > 0.05, the data are consistent with $H_0$. That is, there is insufficient evidence that Instructor A is inferior to Instructor B in terms of their student success rate.
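The P-value step relies on the symmetry of the standard normal distribution, $P(Z < -z) = 1 - P(Z < z)$. As a quick check, the lower-tail probability can be evaluated both ways with `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

z0 = -0.28  # observed test statistic from the example
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Lower-tail P-value computed directly and via the symmetry identity.
p_lower = standard_normal.cdf(z0)
p_via_symmetry = 1 - standard_normal.cdf(-z0)

print(round(p_lower, 4), round(p_via_symmetry, 4))
```

Both routes give the same tail probability, which agrees with the table value 0.3897 used above.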

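[Figure: One-sided Z-test. N(0, 1) density showing the observed value −0.28 with its lower-tail P-value, and the critical value −1.645 bounding the rejection region (RR) at α = 0.05.]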

Example 2: On October 23, 2009, an outbreak of mumps was reported in Borough Park, Brooklyn. Fifty-seven children were diagnosed with this childhood disease. Surprisingly, 43 of the children had the recommended two doses of MMR vaccine which is supposed to protect against the disease. In the past, from a sample of 100 children with mumps in New York State, 83% of them had the recommended two doses of the vaccine. Test the hypothesis that the MMR vaccination rate for the two groups is different at α = 0.05.

Solution: Let

  • $p_1$ = proportion of vaccinated children with mumps in Boro Park
  • $p_2$ = proportion of vaccinated children with mumps in NYS

  1. Hypotheses: As we want to test whether the rates are different, we have a two-sided alternative:

     $H_0: p_1 = p_2$ vs. $H_1: p_1 \neq p_2$.

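  2. Test statistic: Preliminary calculations:

     $$\hat p_1 = \frac{X_1}{n_1} = \frac{43}{57} = 0.75, \qquad \hat p_2 = \frac{X_2}{n_2} = \frac{83}{100} = 0.83,$$

     $$\hat p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{43 + 83}{57 + 100} = \frac{126}{157} \approx 0.803.$$

     Hence the test statistic is:

     $$z_0 = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.75 - 0.83}{\sqrt{0.803(1 - 0.803)\left(\frac{1}{57} + \frac{1}{100}\right)}} = -1.21.$$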
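The calculation in Example 2 can be reproduced with a short sketch using only the Python standard library. Two points are assumptions of this illustration rather than statements from the slides. First, the reported value −1.21 appears to come from rounding $\hat p_1$ to 0.75 before subtracting; with the unrounded 43/57 the statistic is closer to −1.14. Second, the two-sided P-value is not visible in this part of the slides, so the value computed here is an added illustration.

```python
from math import sqrt
from statistics import NormalDist

# Example 2 counts: 43 of 57 children (Boro Park), 83 of 100 children (NYS).
x1, n1 = 43, 57
x2, n2 = 83, 100

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z_exact = (p1_hat - p2_hat) / se              # unrounded sample proportions
z_rounded = (round(p1_hat, 2) - p2_hat) / se  # p1_hat rounded to 0.75, as in the slides

# Two-sided P-value based on the rounded statistic.
p_value = 2 * NormalDist().cdf(-abs(z_rounded))

print(f"z (unrounded) = {z_exact:.2f}, z (rounded p1) = {z_rounded:.2f}, P-value = {p_value:.3f}")
```

Either version of the statistic lies well inside (−1.96, 1.96), so the rounding does not change the decision at α = 0.05.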

Solution: The 95% CI for $p_1 - p_2$ is:

$$\hat p_1 - \hat p_2 \pm z_{1-\alpha/2}\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$

Since the 95% CI contains 0, there is no significant difference between the two success rates. This result agrees with the result from hypothesis testing.

Exercise: Find a 95% CI for p 1 − p 2 for example 2.

Answer: (-0.2051, 0.0539)
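The stated answer can be checked with a minimal sketch that uses the pooled standard error from the test statistic and the unrounded sample proportions. The function name `pooled_proportion_ci` and its `level` parameter are hypothetical, introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def pooled_proportion_ci(x1, n1, x2, n2, level=0.95):
    """CI for p1 - p2 using the pooled standard error, as in the formula above."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Example 2: 43 of 57 children in Boro Park, 83 of 100 children in NYS.
lower, upper = pooled_proportion_ci(43, 57, 83, 100)
print(f"({lower:.4f}, {upper:.4f})")
```

The printed interval matches the answer given for the exercise to four decimal places.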