Statistics with Design of Experiments
Montgomery, D.C. (1997): Design and Analysis of Experiments (4th ed.), Wiley.
Example: Investigate tensile strength y of new synthetic fiber.
Known: y depends on the weight percent of cotton (which should range within 10% – 40%).
Decision: (a) test specimens at 5 levels of cotton weight: 15%, 20%, 25%, 30%, 35%. (b) test 5 specimens at each level of cotton content.
Single Factor Experiment with a = 5 levels and n = 5 Replicates.
=⇒ 25 runs.
Runs should be performed in random order (to guard against systematic effects such as machine warm-up).
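Generating such a randomized run order is straightforward in code. A minimal sketch (Python, used here purely for illustration; the levels and replicate counts are those of the design above):

```python
import random

levels = [15, 20, 25, 30, 35]   # cotton weight percent (a = 5 levels)
n = 5                           # replicates per level

# Full run list: every level repeated n times -> 25 runs
runs = [lvl for lvl in levels for _ in range(n)]

random.seed(42)                 # fixed seed so the order is reproducible
random.shuffle(runs)            # randomized run order guards against drift
print(runs)
```

In a real experiment the shuffled list would be the order in which specimens are produced and tested.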
boxplot(y~w); plot(as.numeric(w), y); points(tapply(y, w, mean), pch=20)
[Figure: boxplot (left) and scatterplot with treatment means (right) of Tensile Strength versus Cotton Weight Percent.]
We wish to test for differences between the mean strengths at all a = 5 levels of cotton weight percent ⇒ Analysis of Variance.
Use the Linear Regression Model
yij = μ + τi + εij

for treatment i = 1, ..., a, and replication j = 1, ..., n.

Observation yij (ith treatment, jth replication); parameter μ is common to all treatments (Overall Mean); parameter τi is unique to the ith treatment (ith Treatment Effect); random variable εij is the Random Error component.

Further assumption: εij iid ∼ N(0, σ²).
Our interest is in the treatment effects.
Treatment effects τi are usually defined as the deviations from the overall mean
μ := (1/a) ∑_{i=1}^{a} μi = (1/a) ∑_{i=1}^{a} (μ + τi) = μ + (1/a) ∑_{i=1}^{a} τi ,

Thus, we have a restriction on these effects, namely

∑_{i=1}^{a} τi = 0.
Here, μi = E(yij) is the mean of all observations yij in the ith treatment (row).
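As a quick numerical illustration of this restriction (Python sketch; the five treatment means are those of the tensile-strength example, with the fifth mean reconstructed from the contrast coefficients reported later in these notes):

```python
# Treatment means mu_i from the tensile-strength example
means = [9.8, 15.4, 17.6, 21.6, 10.8]

mu = sum(means) / len(means)     # overall mean: 15.04
tau = [m - mu for m in means]    # treatment effects tau_i = mu_i - mu

print(mu, tau)
# The estimated effects always satisfy the restriction sum(tau_i) = 0
assert abs(sum(tau)) < 1e-9
```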
ANOVA Decomposition
We are interested in testing the equality of the a treatment means
H0 : μ1 = μ2 = · · · = μa ⇐⇒ H0 : τ1 = τ2 = · · · = τa = 0
which is equivalent to testing the equality of all treatment effects.
The Sum of Squares decomposition from Regression is valid:

SST = SSR + SSE,

where SSR, the Sum of Squares due to the Regression model, is related only to the treatment effects τi. Hence, we have

∑_{i=1}^{a} ∑_{j=1}^{n} (yij − μ̂)² = ∑_{i=1}^{a} ∑_{j=1}^{n} (μ̂i − μ̂)² + ∑_{i=1}^{a} ∑_{j=1}^{n} (yij − μ̂i)²
Therefore, the total variability in the data can be partitioned into a sum of squares of the differences between the treatment averages and the grand average, plus a sum of squares of the differences of observations within treatments from the treatment average.
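This partition can be verified numerically. The sketch below (Python) uses the classical tensile-strength data from Montgomery; the raw observations are not listed in these notes, so they are reconstructed here, but they reproduce the treatment means and sums of squares reported in this section.

```python
# Tensile-strength data (Montgomery): a = 5 cotton-weight levels, n = 5 each.
# Reconstructed raw values; they reproduce the summary statistics in these
# notes (treatment means 9.8, 15.4, 17.6, 21.6, 10.8).
data = [
    [7, 7, 15, 11, 9],
    [12, 17, 12, 18, 18],
    [14, 18, 18, 19, 19],
    [19, 25, 22, 19, 23],
    [7, 10, 11, 15, 11],
]

a, n = len(data), len(data[0])
grand = sum(sum(row) for row in data) / (a * n)   # grand average
means = [sum(row) / n for row in data]            # treatment averages

sst = sum((y - grand) ** 2 for row in data for y in row)            # total
ssr = n * sum((m - grand) ** 2 for m in means)                      # between
sse = sum((y - m) ** 2 for row, m in zip(data, means) for y in row) # within

f = (ssr / (a - 1)) / (sse / (a * n - a))   # F statistic of the ANOVA table
print(sst, ssr, sse, f)
assert abs(sst - (ssr + sse)) < 1e-9        # the partition holds exactly
```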
ANOVA Table
Source of Variation         Sum of Squares   Degrees of Freedom   Mean Square   F
Between Treatments          SSR              a − 1                MSR           MSR/MSE
Error (within Treatments)   SSE              N − a                MSE
Total                       SST              N − 1
Tensile Strength Data: Test H0 : μ1 = μ2 = μ3 = μ4 = μ5 against H1 : some means are different.

Source of Variation         Sum of Squares   Degrees of Freedom   Mean Square   F4,20   p-value
Cotton Weight Percent       475.76           4                    118.94        14.76   < 0.001
Error (within Treatments)   161.20           20                   8.06
Total                       636.96           24
Thus, we reject H 0 and conclude that the treatment means differ!
summary(aov(y~w))
            Df Sum Sq Mean Sq F value    Pr(>F)
w            4 475.76  118.94  14.757 9.128e-06 ***
Residuals   20 161.20    8.06
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Moreover, MSE estimates σ², and the (1 − α) confidence interval for the ith treatment mean μi is

ȳi· ± t_{1−α/2, N−a} √(MSE/n)
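A sketch of this interval in Python, using only the standard library: since no statistics package is assumed, the Student-t quantile is obtained by numerically integrating the t density and inverting the CDF by bisection. The numbers (ȳ4· = 21.6, MSE = 161.20/20 = 8.06, n = 5, N − a = 20) are those of the tensile-strength example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    return c / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about 0."""
    z = abs(x)
    h = z / steps
    s = t_pdf(0, df) + t_pdf(z, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = s * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half

def t_quantile(p, df):
    """Invert the CDF by bisection (for p in (0.5, 1))."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 95% CI for the 4th treatment mean: ybar = 21.6, MSE = 8.06, n = 5, df = 20
tq = t_quantile(0.975, 20)              # ~2.086
half_width = tq * math.sqrt(8.06 / 5)
print(21.6 - half_width, 21.6 + half_width)
```

In practice one would of course call a library quantile function; the hand-rolled inversion is only there to keep the sketch self-contained.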
W <- C(w, treatment); coefficients(aov(y~W))  # default contrast for w
(Intercept)   W20   W25   W30   W35
        9.8   5.6   7.8  11.8   1.0
W <- C(w, sum); coefficients(aov(y~W))
(Intercept)    W1    W2    W3    W4
      15.04 -5.24  0.36  2.56  6.56
options(contrasts=c("contr.sum", "contr.poly"))  # for all factors
Bartlett’s Test for Equality of Variances: H0 : σ1² = σ2² = · · · = σa²

K² is based on the (pooled) sample variances and is approximately χ² distributed with a − 1 degrees of freedom under H0.
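Bartlett's statistic is easy to compute directly. A Python sketch, using the peak-discharge data from the example further below so that the result can be checked against the R output there:

```python
import math

# Peak-discharge data: 4 estimation methods, 6 observations each
groups = [
    [0.34, 0.12, 1.23, 0.70, 1.75, 0.12],
    [0.91, 2.94, 2.14, 2.36, 2.86, 4.55],
    [6.31, 8.37, 9.75, 6.09, 9.82, 7.24],
    [17.15, 11.82, 10.95, 17.20, 14.35, 16.82],
]

def var(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

a = len(groups)
ni = [len(g) for g in groups]
N = sum(ni)
s2 = [var(g) for g in groups]
sp2 = sum((n - 1) * v for n, v in zip(ni, s2)) / (N - a)   # pooled variance

# Bartlett's K^2, approximately chi-squared with a - 1 df under H0
num = (N - a) * math.log(sp2) - sum((n - 1) * math.log(v)
                                    for n, v in zip(ni, s2))
C = 1 + (sum(1 / (n - 1) for n in ni) - 1 / (N - a)) / (3 * (a - 1))
K2 = num / C
print(K2)   # ≈ 8.996, cf. the bartlett.test output below
```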
bartlett.test(y~W)
Bartlett test for homogeneity of variances
data:  y by W
Bartlett’s K-squared = 0.9331, df = 4, p-value = 0.9198
=⇒ H0 is not rejected: there is no evidence that the 5 variances differ.
This test is very sensitive to the normality assumption!
Relationship between σy and μ   α     λ = 1 − α   Transformation
σy ∝ const                      0     1           no transformation
σy ∝ μ^(1/2)                    1/2   1/2         Square Root
σy ∝ μ                          1     0           Log
σy ∝ μ^(3/2)                    3/2   −1/2        Reciprocal Square Root
σy ∝ μ^2                        2     −1          Reciprocal
Selection of the Power: If σyi ∝ μi^α, i.e. σyi = θ μi^α, then

log σyi = log θ + α log μi .

A plot of log σyi versus log μi is a straight line with slope α. Substitute σyi and μi by their estimates Si and ȳi· and guess the value of α from the plot.
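The graphical guess can be backed up by a small least-squares fit of log Si on log ȳi·. (Python sketch; the means and standard deviations are those of the peak-discharge example that follows, computed from its raw data.)

```python
import math

# Treatment means and standard deviations from the peak-discharge example
ybar = [0.710000, 2.626667, 7.930000, 14.715000]
s    = [0.661090, 1.192202, 1.647070, 2.800891]

x = [math.log(v) for v in ybar]   # log mu_i, estimated by log ybar_i
z = [math.log(v) for v in s]      # log sigma_yi, estimated by log S_i

# Least-squares slope of z on x estimates alpha
xm, zm = sum(x) / len(x), sum(z) / len(z)
alpha = (sum((u - xm) * (v - zm) for u, v in zip(x, z))
         / sum((u - xm) ** 2 for u in x))
lam = 1 - alpha
print(alpha, lam)   # alpha ≈ 0.45, so lambda ≈ 0.5: use the square root
```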
Example: 4 different estimation methods of the peak discharge applied to the same watershed.
Method   Discharge (cubic feet / second)            ȳi·     Si
1        0.34   0.12   1.23   0.70   1.75   0.12    0.71    0.66
2        0.91   2.94   2.14   2.36   2.86   4.55    2.63    1.19
3        6.31   8.37   9.75   6.09   9.82   7.24    7.93    1.65
4        17.15  11.82  10.95  17.20  14.35  16.82   14.72   2.80
y <- c(0.34, 0.12, ..., 16.82); m <- gl(4, 6, labels=c(1, 2, 3, 4))
tapply(y, m, mean); tapply(y, m, sd)
        1         2         3         4
 0.710000  2.626667  7.930000 14.715000
        1         2         3         4
 0.661090  1.192202  1.647070  2.800891
summary(aov(y~m))
            Df Sum Sq Mean Sq F value    Pr(>F)
m            3 708.35  236.12  76.067 4.111e-11 ***
Residuals   20  62.08    3.10
bartlett.test(y~m)
Bartlett test for homogeneity of variances
data:  y by m
Bartlett’s K-squared = 8.9958, df = 3, p-value = 0.02935
The Bartlett Test rejects Equality of Variances. Thus we analyze y* = √y.
ry <- sqrt(y); tapply(ry, m, sd)
        1         2         3         4
0.4044534 0.3857295 0.2929908 0.3734610
summary(aov(ry~m))
            Df Sum Sq Mean Sq F value    Pr(>F)
m            3 32.684  10.895  81.049 2.296e-11 ***
Residuals   20  2.688   0.134
To account for the use of the data to estimate α, we reduce the error degrees of freedom by one. This gives F = 76.99, again with p-value < 0.001.
r <- residuals(aov(ry~m)); f <- fitted(aov(ry~m)); plot(f, r)
library(MASS); boxcox(y~m)
[Figure: left panel, residuals versus fitted values for the transformed data; right panel, Box-Cox profile log-likelihood versus λ with 95% confidence interval for λ.]