
Data Analysis: Factorial Design and Two-Way ANOVA, Thesis of Social Statistics and Data Analysis

An explanation of factorial design, a common design for n-way analysis of variance (ANOVA), and a discussion of how to perform a two-way ANOVA analysis using R. The document covers the effects of factors (αi and βj), the interaction between factors (αβij), and the decomposition of sums of squares. It also includes examples of unbalanced data and adjusted degrees of freedom.

What you will learn

  • What is the structural model of a two-way factorial ANOVA without interaction?
  • What is the complete model of the two-way factorial ANOVA including interaction effects?
  • How can the total sum of squares be decomposed in a two-way ANOVA?

Data Analysis (draft) - Gabriel Baud-Bovy
Factorial design
The most common design for an n-way ANOVA is the factorial design. In a factorial design, there are two or more experimental factors, each with a given number of levels. Observations are made for each combination of the levels of the factors (see example).
In a completely randomized factorial design, each experimental unit is randomly assigned to one of the possible combinations of the levels of the experimental factors.
Example of a factorial design with two factors (A and B). Each factor has three levels. y_ijk represents the k-th observation in the condition defined by the i-th level of factor A and the j-th level of factor B.

           Factor B
Factor A   B1      B2      B3
  A1       y_11k   y_12k   y_13k
  A2       y_21k   y_22k   y_23k
  A3       y_31k   y_32k   y_33k
Advantages of the factorial design
A two-way design enables us to examine the joint (or interaction) effect of the independent variables on the dependent variable. An interaction means that the effect that one independent variable has on the dependent variable is not the same for all levels of the other independent variable. We cannot get this information by running separate one-way analyses.
A factorial design can also lead to more powerful tests by reducing the error (within-cell) variance. This point will appear clearly when we compare the results of one-way analyses or t-tests with the results of a two-way analysis.
Interaction plot
An interaction plot represents the mean value m_ij observed in each condition of a factorial design. The Y axis corresponds to the dependent (or criterion) variable. The levels of one of the two experimental factors are aligned on the X axis. The lines connect the mean values that correspond to the same level of the second experimental factor.
There is an interaction between the factors if the lines are not parallel, because the effect of one factor then depends on the level of the other factor.
If the lines are parallel, the effect of the second factor is independent of the level of the first factor. In other words, there is no interaction.
First table of means:

        B1       B2       B3
A1      µ11=6    µ12=10   µ13=20
A2      µ21=10   µ22=14   µ23=24

Second table of means:

        B1       B2       B3
A1      µ11=6    µ12=10   µ13=20
A2      µ21=10   µ22=10   µ23=4

Exercise. Make the interaction plots for the second table. Describe the interaction (if any).

[Interaction plots of the first table: Mean Y versus B with one line per level of A, and Mean Y versus A with one line per level of B.]
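The parallelism criterion can be checked numerically from the two tables of means above, by comparing the difference between rows A2 and A1 at each level of B:

```latex
% First table: the row difference is constant across the levels of B,
% so the lines of the interaction plot are parallel (no interaction):
\mu_{21}-\mu_{11} = 10-6 = 4,\quad
\mu_{22}-\mu_{12} = 14-10 = 4,\quad
\mu_{23}-\mu_{13} = 24-20 = 4.
% Second table: the difference changes across B and even changes sign,
% so the lines are not parallel (a strong interaction):
\mu_{21}-\mu_{11} = 10-6 = 4,\quad
\mu_{22}-\mu_{12} = 10-10 = 0,\quad
\mu_{23}-\mu_{13} = 4-20 = -16.
```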
Structural model (factorial ANOVA)
Let y_ijk be the k-th observation at the i-th level of factor A and the j-th level of factor B. Let µ_ij be the population mean for the i-th level of factor A and the j-th level of factor B (condition A_i B_j), let µ_i• be the population mean in condition A_i, let µ_•j be the population mean in condition B_j, and let µ be the grand mean.

        B1     B2     B3     Mean
A1      µ11    µ12    µ13    µ1•
A2      µ21    µ22    µ23    µ2•
Mean    µ•1    µ•2    µ•3    µ

By definition, α_i = µ_i• − µ is the effect of factor A and β_j = µ_•j − µ is the effect of factor B.
The structural model of a two-way factorial ANOVA without interaction is

  y_ijk = µ + α_i + β_j + ε_ijk

In the absence of interaction, the mean value µ_ij in condition (A_i B_j) depends in an additive manner on the effect of each factor:

  µ_ij = µ + α_i + β_j

The complete model of the two-way factorial ANOVA is

  y_ijk = µ + α_i + β_j + αβ_ij + ε_ijk

where

  αβ_ij = µ_ij − (α_i + β_j + µ) = µ_ij − µ_i• − µ_•j + µ

is the interaction effect. The interaction effect represents the fact that the contribution of one factor depends on the level of the other factor in a non-additive way.
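As a worked example, these definitions can be applied to the first table of means of the interaction-plot slide (µ11 = 6, µ12 = 10, µ13 = 20; µ21 = 10, µ22 = 14, µ23 = 24):

```latex
\mu = \tfrac{6+10+20+10+14+24}{6} = 14, \qquad
\mu_{1\bullet} = 12,\ \mu_{2\bullet} = 16, \qquad
\mu_{\bullet 1} = 8,\ \mu_{\bullet 2} = 12,\ \mu_{\bullet 3} = 22,
\\[4pt]
\alpha_1 = 12-14 = -2, \quad \alpha_2 = 2, \qquad
\beta_1 = -6, \quad \beta_2 = -2, \quad \beta_3 = 8,
\\[4pt]
\alpha\beta_{11} = \mu_{11} - \mu_{1\bullet} - \mu_{\bullet 1} + \mu
                 = 6 - 12 - 8 + 14 = 0.
```

Likewise every αβ_ij = 0 for this table: the cell means are purely additive, which is exactly the parallel-lines situation of the interaction plot.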





Exercise. Compute the main and interaction effects from the mean values (see the tables of mean values in the left column of the slide). Answer: see the tables of effects in the right column.

[The slide shows, for each of the two tables of means above, the table of cell means m_ij with marginal means m_i• and m_•j, next to the corresponding table of effects α_i, β_j, and αβ_ij.]

Sum of squares
Like in the one-way ANOVA, the total sum of squares (SST) can be decomposed into a between-groups sum of squares (the treatment effect, SStr) and a within-group sum of squares (SSE), which corresponds to the residual variance:

  SST = SStr + SSE

  SST  = Σ_ijk (y_ijk − m)²
  SStr = Σ_ijk (m_ij − m)²
  SSE  = Σ_ijk (y_ijk − m_ij)²

Note that the group (or experimental condition) in a factorial design is determined by the values of two or more experimental factors.
The between-group variation (SStr) can itself be decomposed further into the variation explained by factor A (SSA), the variation explained by factor B (SSB), and the variation explained by the interaction between both factors (SSAB):

  SStr = SSA + SSB + SSAB

  SSA  = Σ_ijk (m_i• − m)²
  SSB  = Σ_ijk (m_•j − m)²
  SSAB = Σ_ijk (m_ij − m_i• − m_•j + m)²
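As a sketch, the decomposition above can be verified numerically in R on a small hypothetical data set: a balanced 2 × 3 design with n = 2 observations per cell, constructed so that the cell means equal the first table of means used above (6, 10, 20 and 10, 14, 24, hence no interaction). All object names here are illustrative.

```r
# Hypothetical balanced 2 x 3 data set, n = 2 observations per cell
d <- expand.grid(k = 1:2, B = factor(1:3), A = factor(1:2))
d$y <- c(5, 7, 9, 11, 19, 21, 9, 11, 13, 15, 23, 25)

m   <- mean(d$y)           # grand mean m
mi  <- ave(d$y, d$A)       # marginal mean m_i., repeated per observation
mj  <- ave(d$y, d$B)       # marginal mean m_.j
mij <- ave(d$y, d$A, d$B)  # cell mean m_ij

SST  <- sum((d$y - m)^2)
SSA  <- sum((mi - m)^2)
SSB  <- sum((mj - m)^2)
SSAB <- sum((mij - mi - mj + m)^2)   # zero here: additive cell means
SSE  <- sum((d$y - mij)^2)

c(SST = SST, SSA = SSA, SSB = SSB, SSAB = SSAB, SSE = SSE)
stopifnot(isTRUE(all.equal(SST, SSA + SSB + SSAB + SSE)))
```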

F tests
A factorial design aims at answering three different questions:
1. Is there an effect of the first experimental factor? H0: α_i = 0 (y_ijk = µ + β_j + αβ_ij + ε_ijk)
2. Is there an effect of the second experimental factor? H0: β_j = 0 (y_ijk = µ + α_i + αβ_ij + ε_ijk)
3. Is there an interaction? H0: αβ_ij = 0 (y_ijk = µ + α_i + β_j + ε_ijk)
In all cases, the alternative hypothesis is the complete model: H1: y_ijk = µ + α_i + β_j + αβ_ij + ε_ijk.
The residual variance (within-group variance) for this model is

  MSE = SSE / (N − n_A n_B)

where n_A and n_B are the numbers of levels of factors A and B and N is the total number of observations.
In all cases, the F test is constructed by dividing the variance explained by the parameters of interest by the residual variance of the more complex model:

  F_A  = [SSA / (n_A − 1)] / MSE
  F_B  = [SSB / (n_B − 1)] / MSE
  F_AB = [SSAB / ((n_A − 1)(n_B − 1))] / MSE

If the null hypotheses are true, the F ratios follow a Fisher distribution with the corresponding degrees of freedom. In that case, it can be shown that the numerator is also an estimate of the residual variance.
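As a worked (hypothetical) illustration of the degrees of freedom in these F ratios, take n_A = 2 levels of A, n_B = 3 levels of B, and n = 2 observations per cell, so N = 12:

```latex
df_A = n_A - 1 = 1, \qquad
df_B = n_B - 1 = 2, \qquad
df_{AB} = (n_A - 1)(n_B - 1) = 2, \qquad
df_E = N - n_A n_B = 12 - 6 = 6.
```

Note that df_A + df_B + df_AB + df_E = 11 = N − 1, so the decomposition of the sums of squares is matched by a decomposition of the degrees of freedom.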

[R] Interaction plot

visits <- read.table("visits.dat", header=TRUE)
visits$age <- ordered(visits$age, c("20-29","30-39","40-49",">50"))
interaction.plot(visits$age, visits$disease, visits$duration,
    type="b", col=1:4, lty=1, lwd=2, pch=c(15,15,15,15), las=1,
    xlab="Age", ylab="Visit duration (min)", trace.label="Disease")

[Interaction plot: visit duration (min) versus age group, one line per disease (cancer, cerebrovascular, heart, tuberculosis).]

The lines are not parallel, which is the tell-tale sign of an interaction. This plot suggests that the visit time increases with the older age groups for the cancer and cerebrovascular diseases, while it remains constant for the heart and tuberculosis diseases.

Type I, Type II and Type III sums of squares
The analysis of unbalanced data sets (different numbers of observations in each group) presents special difficulties because there are different ways of computing the sums of squares. These different ways correspond to different hypotheses and, correspondingly, the F tests are different.
Type I (sequential): terms are entered sequentially in the model. Type I SS depend on the order in which the terms are entered. Type I SS add up to the total SS.
Type II (hierarchical): see textbook.
Type III (marginal): Type III SS correspond to the SS explained by a term after all other terms have already been included in the model. Type III SS do not add up.
The R function anova yields Type I sums of squares.
Most textbooks suggest using the Type III sums of squares, and many statistical softwares use Type III as a default, but many statisticians think it does not make sense when there are statistically significant interactions.
See Overall & Spiegel (1969), Psychol. Bull., 72:311-322, for a detailed discussion of factorial designs.

[R] Type III sum of squares
The function Anova in the library car computes Type II and Type III sums of squares:

> library(car)
> Anova(aov(duration ~ disease + age, v0), type="III")
Anova Table (Type III tests)
Response: duration
             Sum Sq Df F value   Pr(>F)
(Intercept) 29261.…  …       …  < 2.2e-…
age               …  …       …   9.34e-…
disease           …  …       …  < 2.2e-…
Residuals         …  …

Compare with the Type I sums of squares, which depend on the order in which the terms are entered:

> anova(aov(duration ~ age + disease, v0))
Analysis of Variance Table
Response: duration
          Df Sum Sq Mean Sq F value   Pr(>F)
age        3 1072.…       …  16.532 3.017e-…
disease    3 2928.…       …  45.142  < 2.2e-…
Residuals 71 1535.…

> anova(aov(duration ~ disease + age, v0))
Analysis of Variance Table
Response: duration
          Df Sum Sq Mean Sq F value   Pr(>F)
disease    3 2839.…       …  43.767 3.964e-…
age        3 1161.…       …  17.908 9.339e-…
Residuals 71 1535.…

Data Analysis (draft) - Gabriel Baud-Bovy

Repeated-measure designsThis example of one-wayrepeated measure ANOVAshows only small differencesbetween treatments and largedifference between subjects.In the repeated-measureANOVA, we neglect thevariations between subjectsand consider only thevariation for each treatmentwithin each subject.

-^

In repeated-measure designs, severalobservations are made on the same experimentalunits. For a example, one of the most commonresearch paradigm is that where subjects areobserved at several different point in time (e.g.,before and after treatment, longitudinal studies).

-^

In repeated measure design, it is important todistinguish between-subject and within-subjectfactors.

Within-subject factors

are variables (like

time or treatment or repetition) that identify thedifferences between conditions or treamentsthat have been assigned to each subject.-

Between-subject factors

are varables (like

age or sex or group) that identify differencesbetween the subjects.

Data Analysis (draft) - Gabriel Baud-Bovy

Statistical approaches
There are three approaches to repeated-measure designs:
1. The univariate approach: this approach uses the classic univariate F test of the ANOVA. However, the data must satisfy the so-called sphericity condition in addition to the usual assumptions for the test to be valid. It is possible to adjust the degrees of freedom to account for possible violations of the sphericity assumption.
2. The multivariate approach: the sphericity condition does not need to be satisfied. However, this approach requires a larger number of observations (the number of subjects must be larger than the number of experimental conditions) and, in general, is less powerful than the univariate approach.
3. The linear mixed model approach: this approach is probably the best from a theoretical point of view, but it is quite complex.
Reference: Keselman, H. J., Algina, J., & Kowalchuk, R. K. (2001). The analysis of repeated measures designs: a review. British Journal of Mathematical and Statistical Psychology, 54, 1-20.

The univariate approach
In the previous examples of ANOVA, we have assumed that the observations in different experimental conditions are uncorrelated (independent). This assumption is valid if different subjects are used in different experimental conditions. However, it is no longer valid if the same subjects are used in several (or all) experimental conditions, because subjects who perform better in one condition are also likely to perform better in the other conditions.
In the repeated-measure ANOVA, the data must also satisfy the so-called sphericity (or circularity) condition or the compound symmetry condition, in addition to the usual assumptions (independence, homogeneity of the variances, and normality).
The compound symmetry condition is a stronger assumption than the sphericity condition.
The sphericity condition needs to apply only to within-subject factors. It is automatically satisfied if the within-subject factor has only two levels.
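For reference, the two conditions can be stated in terms of the covariance matrix of the repeated measures (these are the standard definitions; the slide itself does not spell them out):

```latex
\text{Compound symmetry:}\quad
\operatorname{Var}(y_i) = \sigma^2, \qquad
\operatorname{Cov}(y_i, y_j) = \rho\,\sigma^2 \quad (i \neq j).
\\[4pt]
\text{Sphericity:}\quad
\operatorname{Var}(y_i - y_j) = \text{constant} \quad \text{for all } i \neq j.
```

Compound symmetry implies sphericity, since under compound symmetry Var(y_i − y_j) = 2σ²(1 − ρ) for every pair of conditions.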

Adjusting the degrees of freedom
While tests for sphericity or compound symmetry exist (e.g., Mauchly's test), they are not very reliable because they are quite sensitive to deviations from the normality assumption.
A better approach is to adjust the degrees of freedom in order to make the tests of the repeated-measure ANOVA more conservative. Several correction factors exist: Greenhouse-Geisser (1959), Huynh-Feldt (1990), and a lower-bound value which is the most conservative (see the relevant literature for more details). SPSS will automatically compute the value of these factors.
To adjust the F test, it is necessary to multiply the two degrees of freedom of the F distribution by the correction factor. Since the value of the correction factor is smaller than 1, this will decrease the degrees of freedom of the F distribution and make, in general, the test more conservative.
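For instance, with a hypothetical within-subject test on F(2, 28) and an estimated Greenhouse-Geisser correction of 0.6 (a value assumed here only for illustration):

```latex
df_1 = \hat{\varepsilon} \times 2 = 0.6 \times 2 = 1.2, \qquad
df_2 = \hat{\varepsilon} \times 28 = 0.6 \times 28 = 16.8,
```

so the p-value is read from an F(1.2, 16.8) distribution instead of F(2, 28), which yields a larger (more conservative) p-value for the same F statistic.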

Example. RQ data set
Mauchly's test is used to check whether sphericity is satisfied:

> mauchly.test(fit, idata=idata, X=~1)
        Mauchly's test of sphericity
        Contrasts orthogonal to ~1
data:  SSD matrix from aov(formula = cbind(rq.0, rq.3, rq.7) ~ …, data = rq.w)
W = …, p-value = …

Multivariate tests do not assume sphericity:

> anova(fit, idata=idata, X=~1, test="Pillai")
Analysis of Variance Table
        Contrasts orthogonal to ~1
            Df Pillai approx F num Df den Df Pr(>F)
(Intercept)  …
Residuals    …

The test argument gives access to alternative tests ("Wilks", "Hotelling-Lawley", "Roy", "Spherical").

Example. Threshold dataset
The dataset contains the absolute thresholds of 6 subjects (3 males and 3 females). Each subject performed 10 trials with one of two possible starting values (start = 0 and 10).
We want to test whether there is a difference between the thresholds of the two sexes and whether the threshold depends on the starting value.
There is one between-subject factor (sex) and one within-subject factor (start).

# define threshold dataset (wide format)
th.w <- data.frame(
  sex = factor(c("M","F","M","F","M","F")),
  y = matrix(c(5.4,3.9,5.8,4.9,5.2,3.9,3.9,6.3,5.9,5.3,
               4.2,5.5,4.8,5.5,5.0,6.1,5.4,6.0,4.0,7.7,
               3.6,3.1,4.5,4.9,5.1,5.1,6.2,4.9,5.4,4.5,
               4.6,6.1,5.6,2.8,5.9,4.5,5.5,3.0,5.7,8.1,
               4.0,4.9,4.9,4.5,5.5,3.7,6.2,5.2,3.6,6.3,
               4.3,6.1,4.3,5.3,4.3,4.9,5.9,4.9,6.0,7.6), nrow=6, byrow=TRUE))
# define within-subject factors
idata <- data.frame(
  half = factor(rep(1:2, each=5)),
  start = factor(rep(c(0,10), 5)))
# reshape into long format
th.l <- reshape(th.w,
  varying=paste("y", 1:10, sep="."), v.names="y",
  idvar="su", timevar="trial", direction="long")
# add start value
th.l$start <- ifelse(th.l$trial %% 2, 0, 10)
# reorder data
th.l <- th.l[order(th.l$su, th.l$trial), c("su","sex","trial","start","y")]
# define factors
th.l$su <- factor(th.l$su)
th.l$sex <- factor(th.l$sex)
th.l$start <- factor(th.l$start)

Example. Threshold data set
> summary(aov(y ~ sex*start + Error(su/start), th.l))
Error: su
          Df Sum Sq Mean Sq F value Pr(>F)
sex        …
Residuals  …
Error: su:start
          Df Sum Sq Mean Sq F value Pr(>F)
start      …
sex:start  …
Residuals  …
Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  …

# compute the means across repetitions for each starting value
th.l0 <- aggregate(th.l[,"y",drop=FALSE], th.l[,c("su","sex","start")], mean)
th.l0 <- th.l0[order(th.l0$su),]
# repeated measure ANOVA
> summary(aov(y ~ sex*start + Error(su/start), th.l0))
Error: su
          Df Sum Sq Mean Sq F value Pr(>F)
sex        …
Residuals  …
Error: su:start
          Df Sum Sq Mean Sq F value Pr(>F)
start      …
sex:start  …
Residuals  …

Within-group variability is not taken into account in the test of within-subject factors, so the ANOVA can also be done on the mean values. Adjustment of the degrees of freedom is necessary only if the number of degrees of freedom is greater than 1.

Example. The Elashoff dataset
The Elashoff dataset (Stevens, Table 13.10): two groups of eight subjects were given three different doses of two drugs. This experimental design has two within-subject factors (dose and drug) and one between-subject factor (group).
The questions of interest are: Will the drug be differentially effective for different groups? Is the effectiveness of the drugs dependent on the dose level? Is the effectiveness of the drug dependent on the dose level and the group?
The dataset is in long format (ela.l) and is balanced.

# read the data
ela.l <- read.table("elashof.dat", header=TRUE)
> head(ela.l)
  group su drug dose dv
  …
# define factors
ela.l$group <- factor(ela.l$group)
ela.l$su <- factor(ela.l$su)
ela.l$drug <- factor(ela.l$drug)
ela.l$dose <- factor(ela.l$dose)
# reshape into wide format (ela.w)
ela.w <- data.frame(
  group = rep(1:2, each=8),
  matrix(ela.l$dv, nrow=16, byrow=TRUE))
names(ela.w) <- c("group", outer(c("v1","v2"), 1:3, paste, sep=""))
ela.w$group <- factor(ela.w$group)
> head(ela.w)
  group su v11 v21 v31 v12 v22 v…
# define within-subject factors (idata)
idata <- data.frame(
  drug = factor(rep(1:2, each=3)),
  dose = factor(rep(1:3, 2)))

Example. The Elashoff dataset
> fit <- aov(dv ~ group*drug*dose + Error(su/(drug*dose)), ela.l)
> summary(fit)
Error: su
           Df Sum Sq Mean Sq F value Pr(>F)
group       …
Residuals   …
Error: su:drug
           Df Sum Sq Mean Sq F value Pr(>F)
drug        …
group:drug  …
Residuals   …
Error: su:dose
           Df Sum Sq Mean Sq F value    Pr(>F)
dose        … 379.39       … 36.5097 1.580e-08 ***
group:dose  …
Residuals   …
Error: su:drug:dose
                Df Sum Sq Mean Sq F value Pr(>F)
drug:dose        …
group:drug:dose  …
Residuals        …

Let us initially assume that the sphericity condition holds.
This analysis indicates a statistically significant effect of the drug (F(1,14)=13.001, P=0.003).
There is a significant difference in the responses of the two groups (F(1,14)=7.092, P=0.019).
The significant DRUG*GROUP interaction (F(1,14)=12.163, P=0.04) indicates that the effect of the drug is different for each group.
The only other significant effect is the dose main effect (F(2,28)=36.510, P<0.001).
The main effect of group is difficult to interpret because there is also a statistically significant GROUP*DRUG interaction.

Example. The Elashoff dataset
The first plot shows that the average value of the response increases with the dose; the absence of an interaction of DOSE with the DRUG or GROUP factor indicates that this dose effect is independent of these factors.
The interaction plot shows that the response to the second drug is much larger for the second group than for the first group.
Note that the main effect of group is misleading in this case. It is a side-effect of the observation that the response is much higher in the second group for the second drug.
[Plots: the dose main effect (mean response versus dose) and the group:drug interaction (mean response per group for each drug).]

Example. The Elashoff dataset
Tests corresponding to sequential (Type I) SS:

> anova(fit, idata=idata, M=~drug, X=~1, test="Spherical")
Contrasts orthogonal to ~1
Contrasts spanned by ~drug
Greenhouse-Geisser epsilon: …
Huynh-Feldt epsilon: …
            Df F num Df den Df Pr(>F) G-G Pr H-F Pr
(Intercept)  …
group        …
Residuals    …

> anova(fit, idata=idata, M=~drug+dose, X=~drug, test="Spherical")
Contrasts orthogonal to ~drug
Contrasts spanned by ~drug + dose
Greenhouse-Geisser epsilon: …
Huynh-Feldt epsilon: …
            Df F num Df den Df    Pr(>F)    G-G Pr    H-F Pr
(Intercept)  …             …   1.580e-…  9.785e-…  1.705e-…
group        …
Residuals    …

> anova(fit, idata=idata, M=~drug:dose, X=~drug+dose, test="Spherical")
Contrasts orthogonal to ~drug + dose
Contrasts spanned by ~drug:dose
Greenhouse-Geisser epsilon: …
Huynh-Feldt epsilon: …
            Df F num Df den Df Pr(>F) G-G Pr H-F Pr
(Intercept)  …
group        …
Residuals    …

With this dataset, there is no difference between Type I and Type III SS.