Experimental Design and ANALYSIS of VARIANCE

ANALYSIS of VARIANCE tests equivalence of means by comparing sample variances. It assumes normal and independent populations with equal variance.

H_0: \mu_1 = \mu_2 = \mu_3 = \cdots
H_a: Some mean is different.

We could test the means pair-wise to see if some pair of means shows a difference, but multiplying tests on the same data set multiplies the alpha risk (risk of type I error). If you do three tests at the .05 level [as you would to compare three means], there is up to a 14% chance (not just a 5% chance; the exact value depends on the actual relationships) of getting a significant result just by chance [that is, even if there is no difference at all in the populations]. With six comparisons (as you would need to compare four means in pairs) the risk is up to 26.5%.
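(Those percentages come from treating the repeated tests as independent; the arithmetic, shown here for reference, is
1 − (1 − .05)^3 = 1 − (.95)^3 ≈ .143, i.e. about 14%, and
1 − (1 − .05)^6 = 1 − (.95)^6 ≈ .265, i.e. about 26.5%.)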

Example for class discussion: The following table represents the number of miles per gallon for a tankful of five different brands of gasoline achieved by four different test drivers. The problem is to determine if the brands are different regarding their yields in miles per gallon.

Mileage (miles per gallon) for cars driven with five brands of gasoline

                     BRAND A   BRAND B   BRAND C   BRAND D   BRAND E
                        27        29        30        24        25
                        26        28        32        24        21
                        21        27        27        23        22
                        22        24        27        21        20
  sample mean x̄_j      24        27        29        23        22
  sample s.d. s_j      2.94      2.16      2.45      1.41      2.16

Grand Mean, denoted \bar{\bar{x}} = 25 (mean of all 20 observations). Sample size n = 20.

H_0: Mean mileage for A = mean mileage for B = ... = mean mileage for E
H_a: There is some difference among brands in the (long-term) mileage.

The type of gasoline is called a factor (the different brands are the treatments). As in this example, the factor is frequently (but not always) a qualitative variable. The values of the factor determine the columns of the table.

The yield, in miles per gallon, is the response variable. The response variable is always a quantitative variable. The numbers in the table are values of the response variable.

Let k = # of columns (# of treatments) and n_j = # of rows (# of replications) for the j-th treatment (j = 1, 2, . . . , k). Then:

Treatment (factor) sum of squares = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2 = SSTR   [between-factor variation in x]

Error sum of squares = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 = SSE   [within-factor variation in x]

Alternatively: SSE = \sum_{j=1}^{k} (n_j - 1) s_j^2

Total sum of squares (SST) = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = SSTR + SSE

MSTR = SSTR/(k − 1), mean square (of deviations) due to difference of treatments [“mean square for treatments”, also called “mean square for factors”]

MSE = SSE/(n − k), mean square (of deviations) due to sampling error [“mean square of (remaining) error”]

F = MSTR/MSE

Numerator degrees of freedom = k − 1 and denominator degrees of freedom = n − k

Note that factor degrees of freedom + error degrees of freedom = (k − 1) + (n − k) = n − 1 = total degrees of freedom.

ANALYSIS OF VARIANCE TABLE

DUE TO     D.F.      SS      MS      F
FACTOR     k − 1     SSTR    MSTR    MSTR/MSE
ERROR      n − k     SSE     MSE
TOTAL      n − 1     SST
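To make the table concrete, here is a short Python sketch (my own illustration, not part of the original notes; plain Python, no libraries, and the variable names are mine) that applies the formulas above to the gasoline data:

# Gasoline example: one list of mileages per brand (the columns of the table)
data = {
    "A": [27, 26, 21, 22],
    "B": [29, 28, 27, 24],
    "C": [30, 32, 27, 27],
    "D": [24, 24, 23, 21],
    "E": [25, 21, 22, 20],
}

k = len(data)                                              # number of treatments (5)
n = sum(len(v) for v in data.values())                     # total sample size (20)
grand_mean = sum(x for v in data.values() for x in v) / n  # 25.0

# SSTR = sum over j of n_j * (xbar_j - grand mean)^2
sstr = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in data.values())
# SSE = sum over j of sum over i of (x_ij - xbar_j)^2
sse = sum(sum((x - sum(v) / len(v)) ** 2 for x in v) for v in data.values())

mstr = sstr / (k - 1)    # mean square for treatments
mse = sse / (n - k)      # mean square for error
f_stat = mstr / mse      # F statistic

print(sstr, sse, mstr, mse, f_stat)   # 136, 78, 34, 5.2, about 6.54

These values fill the FACTOR and ERROR rows of the table for the class example, with 4 and 15 degrees of freedom respectively.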

MINITAB: Select STAT > ANOVA > ONE WAY UNSTACKED if the data are in separate columns [one column for each treatment]; select those columns for the Responses panel. Or (less likely for textbooks, more likely for data encountered in practice) select STAT > ANOVA > ONE WAY if all values of the response are in one column [the “Response” column] and the factor (for each value) is identified by a key code in a second column [the “factor” column].

We reject the null, and conclude that we have evidence of a difference among the (population) means, if our sample F is larger than the critical F for the desired alpha with the correct numerator and denominator degrees of freedom.
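If Minitab is not at hand, the same test can be checked in Python with SciPy (an alternative I am adding for illustration, not the notes' own method): f_oneway reproduces the F statistic and stats.f.ppf gives the critical value.

from scipy import stats

# Same gasoline data, one list per brand
a = [27, 26, 21, 22]
b = [29, 28, 27, 24]
c = [30, 32, 27, 27]
d = [24, 24, 23, 21]
e = [25, 21, 22, 20]

f_stat, p_value = stats.f_oneway(a, b, c, d, e)   # F comes out around 6.54

# Decision rule from the notes: reject H0 if the sample F exceeds the critical F
f_crit = stats.f.ppf(0.95, dfn=4, dfd=15)         # about 3.06 for alpha = .05
print(f_stat, p_value, f_stat > f_crit)           # F > critical F, so reject H0

Since F ≈ 6.54 exceeds the critical value of about 3.06, we reject H_0 for the gasoline data and conclude that the brands differ in mean mileage.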

After the F–test
In most analysis of variance problems it is not enough to simply know that the treatments do not all have the same mean. We also want to know which treatment is best; depending on the situation, best can mean having the highest mean or the lowest mean. Thus we must consider post-analysis-of-variance procedures, that is, what to do when H_0 is rejected. There is still the problem of “adding alpha” if we try to perform many comparison tests.

1. Estimating individual means [confidence interval for μ_j, based on all the data]:

Recall s_{\bar{x}} = s/\sqrt{n}; here the standard error of \bar{x}_j is estimated by \sqrt{MSE/n_j}, so the (1 − α) confidence interval for μ_j is

\bar{x}_j \pm t_{\alpha/2} \sqrt{MSE/n_j}

with d.f. = n − k [MSE and degrees of freedom come from the whole sample, not just one factor].
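As a quick illustration with the class data (my own numbers, using MSE = 5.2 and n − k = 15 from the gasoline example; SciPy supplies the t quantile, and the results are approximate), a 95% confidence interval for the Brand C mean:

from scipy import stats

mse, df_error = 5.2, 15          # MSE and error d.f. from the gasoline ANOVA above
xbar_c, n_c = 29, 4              # Brand C sample mean and its sample size
t_crit = stats.t.ppf(0.975, df_error)             # t_{.025} with 15 d.f., about 2.13
half_width = t_crit * (mse / n_c) ** 0.5          # about 2.43
print(xbar_c - half_width, xbar_c + half_width)   # roughly (26.6, 31.4)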

2. Estimating the difference between two means (confidence interval for μ_i − μ_j):

The standard error for comparing two means, if we assume that both variables have the same standard deviation [which we do, if we use ANOVA], is \sqrt{MSE(1/n_i + 1/n_j)}, so the (1 − α) confidence interval for the difference of the two means (μ_i − μ_j) is

(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/2} \sqrt{MSE(1/n_i + 1/n_j)}

and we reject H_0: μ_i = μ_j in favor of H_a: μ_i ≠ μ_j (“not equal” alternative) if zero is not in this interval.
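For example, with the gasoline data (my arithmetic, using MSE = 5.2 and 15 d.f. from above), a 95% confidence interval for μ_c − μ_e is approximately
(29 − 22) ± t_{.025} √(5.2(1/4 + 1/4)) = 7 ± (2.13)(1.61) ≈ 7 ± 3.4,
that is, roughly (3.6, 10.4). Since zero is not in this interval, we would reject H_0: μ_c = μ_e at the .05 level.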

3. In most applications we are interested in testing H_a: μ_i > μ_j at the α significance level. Thus we reject H_0 in favor of H_a if

(\bar{x}_i - \bar{x}_j) / s_{\bar{x}_i - \bar{x}_j} > t_{\alpha}.

This last inequality is equivalent (with some algebraic manipulation) to the condition

\bar{x}_i - \bar{x}_j > t_{\alpha} \sqrt{MSE(1/n_i + 1/n_j)}.

The value on the right is known as the Least Significant Difference (LSD) between the (sample) means. This allows us to compare any pair of means, if the ANOVA says there is a significant difference, and conclude that μ_i > μ_j if \bar{x}_i − \bar{x}_j > LSD. If n_i = n_j = r this becomes:

Reject H_0 [conclude μ_i > μ_j] if the ANOVA shows a significant difference among the means and \bar{x}_i - \bar{x}_j > t_{\alpha} \sqrt{2\,MSE/r}.
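Continuing the gasoline example (my arithmetic): with α = .05 one-sided and 15 d.f., t_α ≈ 1.75, so LSD ≈ 1.75 √(2(5.2)/4) ≈ 2.8. Then x̄_c − x̄_e = 7 > 2.8, so (the ANOVA having been significant) we would conclude μ_c > μ_e; but x̄_c − x̄_b = 2 < 2.8, so we could not conclude that Brand C beats Brand B.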

4. The Bonferroni/True Significant Difference.

In comparing k means simultaneously [all with the same sample size r] at the significance level α, we let α′ = 1 − (1 − α)^{1/k} (so that, for independent comparisons, the overall risk is α), and the True Significant Difference (TSD) is given by

t_{\alpha'} \sqrt{2\,MSE/r}.

NOTE: α′ may be approximated by α/k, where k is the number of means being compared to a control. Thus when comparing four means at the .05 level we use α′ = .05/4 = .0125.
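As an illustration only (my own numbers, using the α/k approximation and SciPy for the t quantile), comparing Brand C against each of the other four brands in the class example:

from scipy import stats

mse, r = 5.2, 4                 # MSE and common sample size from the gasoline example
alpha, k = 0.05, 4              # four comparisons against Brand C
alpha_prime = alpha / k         # Bonferroni approximation: 0.0125
t_crit = stats.t.ppf(1 - alpha_prime, 15)   # one-sided t with 15 d.f., about 2.5
tsd = t_crit * (2 * mse / r) ** 0.5         # about 4.0
print(tsd)
# xbar_C - xbar_E = 7 exceeds the TSD, but xbar_C - xbar_B = 2 does not:
# under this stricter criterion C still beats E, but we cannot claim C beats B.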

Two-way analysis of variance allows removal of the variance due to another factor, called blocks, or testing the significance of two different treatments simultaneously. MINITAB: Select STAT > ANOVA > TWO-WAY.
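The notes only point to Minitab here, but purely as a sketch, the same gasoline data could be run as a two-way (randomized block) analysis in Python with pandas and statsmodels, treating the four drivers (the rows of the table) as blocks; the long-format layout and column names below are my own:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Gasoline data in long form: one row per observation
# (table rows = drivers 1-4, table columns = brands A-E)
rows = [
    (1, "A", 27), (1, "B", 29), (1, "C", 30), (1, "D", 24), (1, "E", 25),
    (2, "A", 26), (2, "B", 28), (2, "C", 32), (2, "D", 24), (2, "E", 21),
    (3, "A", 21), (3, "B", 27), (3, "C", 27), (3, "D", 23), (3, "E", 22),
    (4, "A", 22), (4, "B", 24), (4, "C", 27), (4, "D", 21), (4, "E", 20),
]
df = pd.DataFrame(rows, columns=["driver", "brand", "mpg"])

# Two-way ANOVA without interaction: brand is the treatment, driver is the block
model = ols("mpg ~ C(brand) + C(driver)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))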

(a) Should the Career Resource Center tell students that average starting salaries are the same for graduates of all the Colleges?

(b) Find the 90% confidence interval for the mean starting salary of an Education graduate.

(c) Can we conclude that average starting salary is better for a Liberal Arts graduate than for an Education graduate? (alpha = .05)

(d) If the goal is to obtain the highest starting salary, can you tell the students what College they should attend? (alpha=0.05)