


Experimental Design and ANALYSIS of VARIANCE
ANALYSIS of VARIANCE tests equivalence of means by comparing sample variances. It assumes normal and independent populations with equal variance.
H0: μ1 = μ2 = μ3 = ···    Ha: some mean is different.
We could test the means pair-wise to see if some pair of means shows a difference, but multiplying tests on the same data set multiplies the alpha risk (the risk of a type I error). If you do three tests at the .05 level [as you would to compare three means], there is up to a 14% chance (not just a 5% chance; the exact value depends on the actual relationships) of getting a significant result just by chance [that is, even if there is no difference at all in the populations]. With six comparisons (as you would need to compare four means in pairs) the risk is up to 26.5%.

Example for class discussion: The following table gives the number of miles per gallon on a tankful for five different brands of gasoline, each achieved by four different test drivers. The problem is to determine whether the brands differ in their yields in miles per gallon.
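As a quick check of those figures, here is a minimal sketch (Python; the function name is ours, not from the notes) of the overall alpha risk when m independent tests are each run at level alpha:

```python
# Overall (familywise) risk of at least one type I error when m independent
# tests are each run at significance level alpha: 1 - (1 - alpha)^m.
def familywise_alpha(alpha, m):
    return 1 - (1 - alpha) ** m

print(familywise_alpha(0.05, 3))  # ≈ 0.1426, the "up to 14%" for three pairwise tests
print(familywise_alpha(0.05, 6))  # ≈ 0.2649, the "up to 26.5%" for six pairwise tests
```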
Mileage (miles per gallon) for cars driven with five brands of gasoline

            BRAND A     BRAND B     BRAND C     BRAND D     BRAND E
              27          29          30          24          25
              26          28          32          24          21
              21          27          27          23          22
              22          24          27          21          20
   mean     x̄a = 24     x̄b = 27     x̄c = 29     x̄d = 23     x̄e = 22
   st. dev. sa = 2.94    sb = 2.16   sc = 2.45   sd = 1.41   se = 2.16
Grand mean, denoted x̿ = 25 (the mean of all 20 observations); total sample size n = 20.
H0: mean mileage for A = mean mileage for B = ... = mean mileage for E
Ha: there is some difference among brands in the (long-term) mean mileage.
The type of gasoline is called a factor (the different brands are the treatments). As in this example, the factor is frequently (but not always) a qualitative variable. The values of the factor determine the columns of the table.
The yield, in miles per gallon, is the response variable. The response variable is always a quantitative variable. The numbers in the table are values of the response variable.
Let k = # of columns (# of treatments) and nj = # of rows (# of replications) for the j-th treatment (j = 1, 2, ..., k). Then:
Treatment (factor) sum of squares:  SSTR = Σ_{j=1}^{k} nj (x̄j − x̿)²   [between-factor variation in x]

Error sum of squares:  SSE = Σ_{j=1}^{k} Σ_{i=1}^{nj} (xij − x̄j)²   [within-factor variation in x]

Alternatively:  SSE = Σ_{j=1}^{k} (nj − 1) sj²

Total sum of squares:  SST = Σ_{j=1}^{k} Σ_{i=1}^{nj} (xij − x̿)² = SSTR + SSE
MSTR = SSTR / (k − 1), the mean square (of deviations) due to the difference of treatments ["mean square for treatments", also called "mean square for factors"].

MSE = SSE / (n − k), the mean square (of deviations) due to sampling error ["mean square of (remaining) error"].

The test statistic is F = MSTR / MSE.
Numerator degrees of freedom = k − 1 and denominator degrees of freedom = n − k
Note that factor degrees of freedom + error degrees of freedom = (k − 1) + (n − k) = n − 1 = total degrees of freedom.
ANALYSIS OF VARIANCE TABLE

DUE TO     D.F.     SS      MS                    F
FACTOR     k − 1    SSTR    MSTR = SSTR/(k − 1)   MSTR/MSE
ERROR      n − k    SSE     MSE  = SSE/(n − k)
TOTAL      n − 1    SST
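As a worked check, filling in this table for the gasoline example above (all values computed from the mileage data; k = 5, n = 20):

SSTR = 4[(24 − 25)² + (27 − 25)² + (29 − 25)² + (23 − 25)² + (22 − 25)²] = 4(34) = 136
SSE  = 3(sa² + sb² + sc² + sd² + se²) = 3(26) = 78
SST  = 136 + 78 = 214

DUE TO     D.F.    SS      MS      F
FACTOR      4      136     34      6.54
ERROR      15       78     5.2
TOTAL      19      214

Since the sample F ≈ 6.54 exceeds the critical value F.05(4, 15) ≈ 3.06, we reject H0 at the .05 level and conclude that the brands differ in mean mileage.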
MINITAB: Select STAT > ANOVA > ONE-WAY (UNSTACKED) if the data are in separate columns [one column for each treatment]; select those columns for the Responses panel. Or (less likely for textbooks, more likely for data encountered in practice) select STAT > ANOVA > ONE-WAY if all values of the response are in one column [the "Response" column] and the treatment for each value is identified by a key code in a second column [the "factor" column]. We reject the null hypothesis, and conclude that we have evidence of a difference among the (population) means, if our sample F is larger than the critical F for the desired alpha with the correct numerator and denominator degrees of freedom.
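For readers working outside MINITAB, here is a minimal Python sketch of the same one-way ANOVA on the gasoline data (assumes SciPy is installed; the variable names are ours):

```python
from scipy import stats

# Mileage data from the table above, one list per brand (the "unstacked" layout).
brand_a = [27, 26, 21, 22]
brand_b = [29, 28, 27, 24]
brand_c = [30, 32, 27, 27]
brand_d = [24, 24, 23, 21]
brand_e = [25, 21, 22, 20]

# One-way ANOVA: returns the sample F statistic and its p-value.
f_stat, p_value = stats.f_oneway(brand_a, brand_b, brand_c, brand_d, brand_e)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")   # F ≈ 6.54; reject H0 when p < alpha
```

Rejecting H0 whenever the reported p-value is below the chosen alpha is equivalent to comparing the sample F with the critical F.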
After the F-test

In most analysis of variance problems it is not enough simply to know that the treatments do not all have the same mean. We also want to know which treatment is best; depending on the situation, "best" can mean having the highest mean or the lowest mean. Thus we must consider post-analysis-of-variance procedures, that is, what to do when H0 is rejected. There is still the problem of "adding alpha" if we try to perform many comparison tests.
Recall s_x̄ = s/√n, so the (1 − α) confidence interval for μj is x̄j ± t_{α/2} √(MSE/nj), with d.f. = n − k [the MSE and the degrees of freedom come from the whole sample, not just one treatment].
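For instance, a sketch of this interval for brand C in the gasoline example (assumes SciPy; MSE = 5.2 and d.f. = 15 are taken from the worked table above):

```python
from math import sqrt
from scipy import stats

mse, df_error = 5.2, 15        # MSE and d.f. = n - k from the worked ANOVA table
xbar_c, n_c = 29, 4            # brand C sample mean and number of observations
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df_error)   # two-tailed t value, t_{alpha/2}
margin = t_crit * sqrt(mse / n_c)
print(f"95% CI for mu_C: ({xbar_c - margin:.2f}, {xbar_c + margin:.2f})")
# Roughly (26.6, 31.4): 29 ± 2.13 * sqrt(5.2/4)
```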
The standard error of the difference of two sample means, if we assume equal population variances [which we do, if we use ANOVA], is s_{x̄i−x̄j} = √(MSE (1/ni + 1/nj)), so the (1 − α) confidence interval for the difference of two means (μi − μj) is (x̄i − x̄j) ± t_{α/2} √(MSE (1/ni + 1/nj)), and we reject H0: μi = μj in favor of Ha: μi ≠ μj (the "not equal" alternative) if zero is not in this interval.
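The same computation for a difference of means, here brands C and B (a sketch using the MSE and d.f. from the worked table above):

```python
from math import sqrt
from scipy import stats

mse, df_error = 5.2, 15
xbar_c, n_c = 29, 4            # brand C
xbar_b, n_b = 27, 4            # brand B
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df_error)
margin = t_crit * sqrt(mse * (1 / n_c + 1 / n_b))
diff = xbar_c - xbar_b
print(f"95% CI for mu_C - mu_B: ({diff - margin:.2f}, {diff + margin:.2f})")
# Roughly (-1.4, 5.4): zero is inside, so C and B are not significantly
# different at alpha = .05.
```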
For the one-sided alternative Ha: μi > μj we reject H0: μi = μj if (x̄i − x̄j)/s_{x̄i−x̄j} > tα. This last inequality is equivalent (with some algebraic manipulation) to the condition x̄i − x̄j > tα √(MSE (1/ni + 1/nj)). The value on the right is known as the Least Significant Difference (LSD) between the (sample) means. This allows us to compare any pair of means if the ANOVA says there is a significant difference, and conclude that μi > μj if x̄i − x̄j > LSD. If ni = nj = r this becomes:
Reject H0 [conclude μi > μj] if the ANOVA shows a significant difference among the means and x̄i − x̄j > tα √(2·MSE/r).
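A sketch of the LSD for the gasoline data (equal sample sizes r = 4; t is one-tailed here, matching the one-sided comparison above):

```python
from math import sqrt
from scipy import stats

mse, df_error, r = 5.2, 15, 4
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha, df_error)      # one-tailed t value, t_alpha
lsd = t_crit * sqrt(2 * mse / r)
print(f"LSD = {lsd:.2f}")                      # about 2.83

# Example: x̄_C − x̄_E = 29 − 22 = 7 > LSD, so (given the significant overall F)
# we may conclude mu_C > mu_E.
```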
In comparing k means simultaneously [all with the same sample size r] at the overall significance level α, we let α′ = 1 − (1 − α)^(1/k) (the per-comparison level whose combined risk over k comparisons is α), and the True Significant Difference (TSD) is given by TSD = t_{α′} √(2·MSE/r).

NOTE: α′ may be approximated by α/k, where k is the number of means (or factors) being compared. Thus when comparing four means at the .05 level we use α′ ≈ .05/4 = .0125.
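A sketch of the adjusted level and the resulting TSD for this four-mean example (same MSE, d.f., and r as in the LSD sketch above):

```python
from math import sqrt
from scipy import stats

alpha, k = 0.05, 4
alpha_exact = 1 - (1 - alpha) ** (1 / k)   # ≈ 0.0127, the exact per-comparison level
alpha_approx = alpha / k                   # 0.0125, the approximation used in the notes

mse, df_error, r = 5.2, 15, 4
tsd = stats.t.ppf(1 - alpha_approx, df_error) * sqrt(2 * mse / r)
print(f"alpha' ≈ {alpha_exact:.4f} (approx. {alpha_approx}), TSD = {tsd:.2f}")
# The TSD is larger than the LSD, so simultaneous comparisons require a
# bigger gap between sample means before we declare a difference.
```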
Two-way analysis of variance allows removal of the variance due to another factor, called blocks, or testing the significance of two different factors simultaneously. MINITAB: Select STAT > ANOVA > TWO-WAY.
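Because the mileage table also records which driver produced each tankful, the drivers can serve as blocks in a two-way analysis of the same data. A minimal sketch (assumes pandas and statsmodels are available; the column names are ours):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Stacked layout: one row per observation, with the brand (treatment) and
# driver (block) for each mileage value identified by key codes.
data = pd.DataFrame({
    "mpg":    [27, 29, 30, 24, 25,   26, 28, 32, 24, 21,
               21, 27, 27, 23, 22,   22, 24, 27, 21, 20],
    "brand":  ["A", "B", "C", "D", "E"] * 4,
    "driver": [d for d in ["1", "2", "3", "4"] for _ in range(5)],
})

# Two-way ANOVA: brand is the treatment factor, driver is the blocking factor.
model = ols("mpg ~ C(brand) + C(driver)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))   # sums of squares, F, and p for each factor
```

Removing the driver-to-driver variation from the error term makes the test for brand differences more sensitive than the one-way analysis above.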
Example for class discussion (starting salaries by College):
(a) Should the Career Resource Center tell students that average starting salaries are the same for graduates of all the Colleges?
(b) Find the 90% confidence interval for the mean starting salary of an Education graduate.
(c) Can we conclude that the average starting salary is better for a Liberal Arts graduate than for an Education graduate? (alpha = .05)
(d) If the goal is to obtain the highest starting salary, can you tell the students what College they should attend? (alpha = .05)