Psy 521/621 Univariate Quantitative Methods, Fall 2020
Notation and Computation of One-Way ANOVA

Below, I present the definitional formulas for ANOVA. Many textbooks present the computational formulas, which are simpler to use for larger problems. Definitional formulas, however, have a very clear tie to the concepts behind the analysis. Nowhere is that clearer than in the ANOVA formulas, which quantify between-group and within-group variation. To simplify matters, I use an equal-n version of the formulas, but ANOVA can also be used with unequal group sizes.
Notation

It is easy to get lost or bogged down in the notation used in the ANOVA formulas. There are lots of subscripts, which can be confusing. The notation used here is a classic notation, but it is difficult for many students to penetrate. With ANOVA, we now have to keep track of multiple groups, so a subscript, $j$, is used to denote a specific group. A single score is now represented by $Y_{ij}$, indicating the score is for individual $i$ in group $j$. The mean of the full sample is now referred to as $\bar{Y}_{..}$, because it is calculated across all individuals and all groups. The "." refers to "computing across" that element, either individuals or groups. Then, $\bar{Y}_{.j}$ represents the mean of a particular group (e.g., $\bar{Y}_{.2}$ would be the mean of the second group). The "." is used in place of the $i$ because the mean is calculated using all the $i$'s for a particular group.
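As a brief illustration with made-up numbers (not from the notes): suppose $a = 2$ groups with $n = 3$ scores each, where group 1 contains {4, 6, 8} and group 2 contains {5, 7, 9}. Then $Y_{21} = 6$ (the second score in group 1), $\bar{Y}_{.1} = 6$ and $\bar{Y}_{.2} = 7$ are the group means, and $\bar{Y}_{..} = (4 + 6 + 8 + 5 + 7 + 9)/6 = 6.5$ is the grand mean.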
Sum of Squares Components

There are three possible sums of squares: between-group sum of squares ($SS_A$)¹, within-group or error sum of squares ($SS_{s/A}$), and total sum of squares ($SS_T$). Total sum of squares can be partitioned into between sum of squares and within sum of squares, representing the variation due to treatment (or the independent variable) and variation due to individual differences in the scores, respectively:

$SS_T = SS_A + SS_{s/A}$
Sum of squares between-groups examines the differences among the group means by calculating the variation of each mean ($\bar{Y}_{.j}$) around the grand mean ($\bar{Y}_{..}$):

$SS_A = n \sum_j \left( \bar{Y}_{.j} - \bar{Y}_{..} \right)^2$

where $n$ is the number of observations in each group (i.e., each cell or level of factor $A$).
Sum of squares within-groups examines error variation, or variation of individual scores around each group mean. This is variation in the scores that is not due to the treatment (or independent variable):

$SS_{s/A} = \sum_j \sum_i \left( Y_{ij} - \bar{Y}_{.j} \right)^2$
The total sum of squares can be computed by adding $SS_A$ and $SS_{s/A}$, but it can also be computed the same way we would compute the numerator of the sample variance formula: subtract the grand mean from each score, square the deviations, and then sum across all cases:

$SS_T = \sum_j \sum_i \left( Y_{ij} - \bar{Y}_{..} \right)^2$
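To make the definitional formulas concrete, here is a minimal NumPy sketch using made-up scores for three equal-sized groups (the data and variable names are illustrative only, not from the notes):

    import numpy as np

    # Hypothetical data: a = 3 groups, n = 4 scores per group (one row per group)
    groups = np.array([[4.0, 6.0, 8.0, 6.0],
                       [5.0, 7.0, 9.0, 7.0],
                       [8.0, 10.0, 12.0, 10.0]])

    a, n = groups.shape
    grand_mean = groups.mean()          # Y-bar_.. , mean of all a*n scores
    group_means = groups.mean(axis=1)   # Y-bar_.j , one mean per group

    # Definitional sums of squares
    ss_a = n * np.sum((group_means - grand_mean) ** 2)    # between groups
    ss_sa = np.sum((groups - group_means[:, None]) ** 2)  # within groups (error)
    ss_t = np.sum((groups - grand_mean) ** 2)             # total

    print(ss_a, ss_sa, ss_t)  # ss_a + ss_sa reproduces ss_t

With these made-up numbers, $SS_A \approx 34.67$, $SS_{s/A} = 24$, and $SS_T \approx 58.67$, so the partition $SS_T = SS_A + SS_{s/A}$ checks out.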
Degrees of Freedom

Each SS has a different degrees of freedom associated with it: $df_A = a - 1$, $df_{s/A} = a(n - 1) = N - a$, and $df_T = an - 1 = N - 1$. Here, $a$ equals the number of groups (or "levels" of the independent variable), $n$ is the number of observations in each group (assuming they are equal), and $N$ is the total number of observations in the study (which equals $a$ multiplied by $n$ for equal group sizes).
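For instance, with the hypothetical $a = 3$ groups of $n = 4$ scores from the sketch above: $df_A = 3 - 1 = 2$, $df_{s/A} = 3(4 - 1) = 9$, and $df_T = (3)(4) - 1 = 11$. Note that $df_A + df_{s/A} = df_T$, mirroring the partition of the sums of squares.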
Mean Squares and F

The mean squares are computed by dividing each SS by its df. This is akin to the computation of the sample variance, which divides the sum of squares by degrees of freedom. In fact, $MS_T = s^2$. The F ratio is then computed by creating a ratio of the between-groups variance to the within-groups variance:

$F = \frac{MS_A}{MS_{s/A}}$

¹ $SS_A$ is referred to as the method sum of squares in the example from Myers, Well, & Lorch (2010), because different levels of the memorization method were compared.
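Continuing the sketch above (reusing ss_a, ss_sa, a, and n from that snippet), the mean squares and F ratio follow directly; the p-value comes from scipy's F distribution:

    from scipy import stats

    ms_a = ss_a / (a - 1)          # MS_A: between-groups variance estimate
    ms_sa = ss_sa / (a * (n - 1))  # MS_s/A: within-groups (error) variance estimate

    f_ratio = ms_a / ms_sa                             # F = MS_A / MS_s/A
    p_value = stats.f.sf(f_ratio, a - 1, a * (n - 1))  # upper-tail F probability
    print(f_ratio, p_value)

For the made-up data this gives $F = 6.5$ on $(2, 9)$ degrees of freedom.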
Magnitude of Effect

Significant differences among the groups indicate that it is unlikely that the differences among the means are due to random sampling chance, but that leaves unanswered the question of how large the effect is. One measure, eta squared ($\eta^2$), can be defined as the proportion of variance in the dependent variable accounted for by the independent variable. It is simply the proportion of between-group variation, as measured by the sum of squares between groups ($SS_A$), relative to the total variation ($SS_T$):

$\eta^2 = \frac{SS_A}{SS_T} = \frac{(a-1)F}{(a-1)F + a(n-1)}$
$f$ is Cohen's (1988) effect size measure and is equal to:

$\hat{f} = \sqrt{\frac{(a-1)\left(MS_A - MS_{s/A}\right)}{an \, MS_{s/A}}}$

$\hat{f}$ (the caret symbol above signifies a sample estimate) is used in power analysis and can be interpreted in terms of Cohen's suggested standards for that measure: small (.1), medium (.25), and large (.4).
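Both effect size measures follow from quantities already computed in the sketches above (again reusing np, ss_a, ss_t, ms_a, ms_sa, a, and n):

    eta_sq = ss_a / ss_t  # proportion of total variation that is between groups
    # Cohen's f estimate; this assumes MS_A > MS_s/A (otherwise the quantity
    # under the square root is negative and f-hat is truncated to 0)
    f_hat = np.sqrt(max((a - 1) * (ms_a - ms_sa) / (a * n * ms_sa), 0.0))
    print(eta_sq, f_hat)

For the hypothetical data this gives $\hat{\eta}^2 \approx .59$ and $\hat{f} \approx .96$, a large effect by Cohen's standards.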
Assumptions

The assumptions for ANOVA follow those we discussed for the t-test: normal distribution of the dependent variable in the population, independence of observations, and equal variance (homogeneity) among groups in the population. As was seen with the t-test, the normality and equal variance assumptions are of greater concern with smaller sample sizes in ANOVA. And as we know from the central limit theorem and what we saw earlier with the simulation demonstration in class, the population distribution can be very nonnormal and the sampling distribution for the mean is still quite normal. The primary exception is when the population distribution is extremely skewed or kurtotic and the sample size is quite small.

Heterogeneity of variances across groups is perhaps of more concern, but as with the t-test, the worst performance is when the group sample sizes are small, unequal in size, and the variances are very unequal (e.g., a 4:1 ratio; Myers, Well, & Lorch, 2010, p. 137). In these circumstances, researchers should explore one of the corrective tests, of which the James and Welch tests appear to perform fairly well (Algina, Oshima, & Lin, 1994; Lix, Keselman, & Keselman, 1996). Remember that tests of unequal variances may fail to find significance in these more critical circumstances and may identify relatively trivial violations as significant in large samples, so they may be of limited utility, at least in deciding conclusively that there is a problem with heterogeneity.
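As an illustration of one such corrective test, below is a minimal sketch of the standard Welch one-way F statistic (the function name and example data are my own, not from the notes; some statistics packages offer ready-made versions):

    import numpy as np
    from scipy import stats

    def welch_anova(samples):
        """Welch's one-way F test, robust to unequal group variances.
        samples: a list of 1-D arrays, one per group."""
        k = len(samples)
        n = np.array([len(s) for s in samples], dtype=float)
        means = np.array([np.mean(s) for s in samples])
        variances = np.array([np.var(s, ddof=1) for s in samples])

        w = n / variances                      # precision weights
        grand = np.sum(w * means) / np.sum(w)  # variance-weighted grand mean

        numerator = np.sum(w * (means - grand) ** 2) / (k - 1)
        hetero = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
        denominator = 1 + 2 * (k - 2) * hetero / (k ** 2 - 1)

        f_welch = numerator / denominator
        df1, df2 = k - 1, (k ** 2 - 1) / (3 * hetero)
        return f_welch, df1, df2, stats.f.sf(f_welch, df1, df2)

    # Hypothetical groups with unequal sizes and unequal variances
    g1 = np.array([4.0, 6.0, 8.0, 6.0, 5.0])
    g2 = np.array([5.0, 7.0, 9.0, 7.0])
    g3 = np.array([8.0, 14.0, 2.0, 10.0, 12.0, 8.0])
    print(welch_anova([g1, g2, g3]))

Unlike the ordinary F, the error degrees of freedom here are estimated from the data, so df2 is usually fractional.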
References

Algina, J., Oshima, T. C., & Lin, W. Y. (1994). Type I error rates for Welch's test and James's second-order test under nonnormality and inequality of variance when there are two groups. Journal of Educational Statistics, 19, 275-291.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Routledge.

Lix, L. M., Keselman, J. C., & Keselman, H. J. (1996). Consequences of assumption violations revisited: A quantitative review of alternatives to the one-way analysis of variance F test. Review of Educational Research, 66, 579-619.

Myers, J. L., Well, A. D., & Lorch, R. F., Jr. (2010). Research design and statistical analysis (3rd ed.). Mahwah, NJ: Erlbaum.