
















































































R Companion to Montgomery’s Design and Analysis of Experiments (2005)
Christophe Lalanne
December 2006
The 6th edition of Montgomery’s book, Design and Analysis of Experiments, deals with the various kinds of experimental setups commonly used in biomedical research or industrial engineering, and with how to reach significant conclusions from the observed results. This is an art, and it is called the Design of Experiments (doe). The approach taken in the textbook differs from most related books in that it provides both a deep understanding of the underlying statistical theory and coverage of a broad range of experimental setups, e.g. balanced incomplete block designs, split-plot designs, or response surfaces. As these doe are rarely presented together in a unified statistical framework, this textbook provides valuable information about their common anchoring in the basic ANOVA model. Quoting Wiley’s website comments,
Douglas Montgomery arms readers with the most effective approach for learn- ing how to design, conduct, and analyze experiments that optimize perfor- mance in products and processes. He shows how to use statistically designed experiments to obtain information for characterization and optimization of systems, improve manufacturing processes, and design and develop new pro- cesses and products. You will also learn how to evaluate material alternatives in product design, improve the field performance, reliability, and manufactur- ing aspects of products, and conduct experiments effectively and efficiently.
Modern statistical software now offers increasing power and allows one to run computationally intensive procedures (bootstrap, jackknife, permutation tests, ...) without leaving the computer desktop for one night or more. Furthermore, multivariate exploratory statistics have brought novel and exciting graphical displays to highlight the relations between several variables at once. As they are part of results reporting, they nicely complement the statistical models tested against the observed data. We propose to analyze some of the data provided in this textbook with the open-source R statistical software. The official website, www.r-project.org, contains additional information and several handbooks written by international contributors. In my opinion, R has benefited from the earlier development of the S language as a statistical programming language, and as such offers a very flexible way to handle every kind of dataset. Its graphical capabilities, as well as its inferential engine, make it the most flexible statistical framework at this time. The R packages used throughout these chapters are listed below, in alphabetical order. A brief description is provided, but refer to the on-line help (help(package="xx")) for further indications on how to use certain functions.
Package listing. Since 2007, some packages are organized in what are called Task Views on the CRAN website. Good news: there is a Task View called ExperimentalDesign. By the time I started to write this textbook, there were really few available resources for creating complex designs like fractional factorials or latin hypercube designs, nor was there any in-depth coverage of doe analysis with R, except [?], who dedicated some attention to blocking and factorial designs, J. Faraway’s handbook, Practical Regression and Anova using R [?] (but see the CRAN contributed documentation^1), and G. Vikneswaran, who wrote An R companion to “Experimental Design”, which accompanies Berger and Maurer’s book [?].
car provides a set of useful functions for ANOVA designs and regression models;

lattice provides some graphical enhancements compared to traditional R graphics, as well as multivariate display capabilities; for Trellis displays, see http://stat.bell-labs.com/project/trellis/

lme4 the newer and enhanced version of the nlme package, for which additional data structures are available (nested or hierarchical models, ...);

nlme for handling mixed-effects models, developed by Pinheiro & Bates [?];

npmc implements two procedures for non-parametric multiple comparisons;
Further Readings. Additional references are given in each chapter, when necessary. However, there are plenty of other general textbooks on doe, e.g. [?, ?, ?] (English) and [?, ?, ?, ?] (French), among the most recent ones.
(^1) Faraway has now published two books on the analysis of (Non-)Linear Models, GLM, and Mixed-effects Models, see [?, ?].
set.seed(891)
The function set.seed is used to set the rng to a specified state, and it takes an integer as argument. Random value generation is part of statistical theory, and these techniques are widely used in simulation studies. Moreover, random numbers are at the core of several computationally intensive algorithms, like the bootstrap estimation procedure or Monte Carlo simulation designs. A very elegant introduction to this topic is provided in [?] (see Chapter 8 for some hints on the use of the R rng). R can be used to produce different kinds of graphical representations; it is probably one of the most capable statistical tools in that respect. Among them, dot diagrams and histograms are useful tools for visualizing continuous variables. Figure 2.1 has been created with the following commands:
y1 <- c(16.85,16.40,17.21,16.35,16.52,17.04,16.96,17.15,16.59,16.57)
y2 <- c(16.62,16.75,17.37,17.12,16.98,16.87,17.34,17.02,17.08,17.27)
y <- data.frame(Modified=y1,Unmodified=y2)
y.means <- as.numeric(apply(y,2,mean))
opar <- par(mfrow=c(2,1),mar=c(5,7,4,2),las=1)
stripchart(y,xlab=expression("Strength (kgf/cm^2)"),pch=19)
arrows(y.means,rep(1.5,2),y.means,c(1.1,1.9),length=.1)
text(y.means,c(1.2,1.8),round(y.means,2),pos=4,cex=.8)
rd <- rnorm(200,mean=70,sd=5)
hist(rd,xlab="quantile",ylab="Relative frequency",
     main="Random Normal Deviates\n N(70,5)")
par(opar)
As can be seen with this snippet of code, relatively few commands allow one to produce powerful graphics... Indeed, several books or websites are dedicated to exploratory multivariate graphics. Among others, the interested reader may have a look at:
and, of course, the must-have book that Venables & Ripley wrote on the S language, now in its fourth edition [?]. Histograms are naturally more appropriate when there are numerous observations (e.g. n > 20). It is not uncommon to express the data as a density plotted against the observed value, and to superimpose a normal density function with the corresponding

(^2) R code and figures can be found on http://dsarkar.fhcrc.org/lattice/book/figures.html.
(^3) R code and figures can be found on http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html.
Figure 2.1: Dot diagram for the tension bond strength data (upper panel) and histogram for 200 normal random deviates (lower panel).
Figure 2.2: Density estimate for the same 200 normal random deviates.
mean and SD. However, a better way to highlight the distribution of the variable under study, especially in its continuous aspect, is to draw a non-parametric density curve, as shown in Figure 2.2. We often get a clearer picture of the underlying distribution, while choosing the appropriate number of bins for a histogram is not always easy. But see [?] (pp. 126–130) for additional discussion on this topic. Another solution is to use a box-and-whisker plot, also called a boxplot. As illustrated in Figure 2.3, a lot of information can be found in a boxplot. First, the rectangular box displays half of the total observations, the median being shown inside as a horizontal segment. The upper side of the box is thus the third quartile, while the first quartile is located at the lower side. The extreme tickmarks correspond to the min and max values. However, when an observation lies more than 1.5 times the inter-quartile range beyond a quartile, it is explicitly drawn on the plot, and the extreme tickmarks then correspond to these reference values. This way of handling what could be considered “extreme values” in R is known as Tukey’s method. To get such a graphic, one uses the boxplot() function, which accepts either a formula or variable + factor inputs. Figure 2.3 is thus simply produced using

boxplot(y,ylab="Strength (kgf/cm^2)",las=1)

John Tukey (1915–2000) introduced modern techniques for the estimation of spectra of time series, notably the Fast Fourier Transform.
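Tukey’s rule can be verified by hand; the following sketch computes the fences for the combined tension bond data defined above, and compares them with R’s own boxplot statistics (note that boxplot.stats() uses Tukey’s hinges, which may differ slightly from the quartiles returned by quantile()):

```r
x <- c(y1, y2)                        # combined tension bond data from above
q <- quantile(x, c(.25, .75))
iqr <- diff(q)                        # inter-quartile range
fences <- c(q[1] - 1.5*iqr, q[2] + 1.5*iqr)
x[x < fences[1] | x > fences[2]]      # observations drawn individually by boxplot()
boxplot.stats(x)$out                  # R's own list of "outside" points
```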
An example of a Laplace-Gauss (or “normal”, for short) distribution, with mean 0 and SD 1, is shown in Figure 2.4. As it is a density function, its area equals 1, and any area comprised between two x-values can be calculated very easily using modern computer software. For instance, the shaded gray area, which is the probability P(1.2 ≤ y < 2.4), is estimated to be 0.107. With R, it is obtained using pnorm(2.4)-pnorm(1.2).
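The same area can also be recovered by numerical integration of the density, which provides a useful sanity check on the pnorm computation:

```r
pnorm(2.4) - pnorm(1.2)                        # P(1.2 <= y < 2.4), about 0.107
integrate(dnorm, lower=1.2, upper=2.4)$value   # numerical integration, same result
```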
Figure 2.4: The “normal” density function, $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$.
Here H0 denotes the null hypothesis of the absence of effect, while H1 (also denoted HA by some authors) is the logical negation of H0. This testing framework leads us to consider two kinds of potential errors: Type I error (α), when we reject the null while it is true in the real world, and Type II error (β), when the null is not rejected while it should have been. Formally, this is equivalent to
α = Pr(Type I error) = Pr(reject H0 | H0 is true)
β = Pr(Type II error) = Pr(fail to reject H0 | H0 is false)
Using this notation, α is generally referred to as the significance level, and it is what is reported by statistical software when running a given test. Both kinds of error are equally important, although Type II error tends to be neglected in many studies. Figure 2.5 highlights the relation between these two quantities, based on two hypothetical distributions. The script is taken from the CRAN website (but it is not very difficult to reproduce with a few commands).
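Both error rates can also be estimated empirically by simulation; a minimal sketch for a one-sample t-test, where the sample size and effect size are arbitrary illustrative choices:

```r
set.seed(101)
nsim <- 10000; n <- 20
# p-values under H0 (true mean 0) and under a specific H1 (true mean 0.5)
p.H0 <- replicate(nsim, t.test(rnorm(n, mean=0))$p.value)
p.H1 <- replicate(nsim, t.test(rnorm(n, mean=0.5))$p.value)
mean(p.H0 < .05)   # empirical alpha, close to the nominal 0.05
mean(p.H1 < .05)   # empirical power, i.e. 1 - beta for this alternative
```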
2.4 The two-sample t-test
Comparing two sets of observations on a response variable involves three steps: (1) constructing a test statistic, (2) deriving its sampling distribution, and (3) computing the associated p-value. As already said, the p-value represents the probability of observing a value at least as extreme as that observed using the present data. This is obviously a purely frequentist approach, but it proves to be sufficient in most cases.
Figure 2.5: Type I and II errors.
The test statistic is given by

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$$
where $\bar{y}_1$, $\bar{y}_2$ are the group means, $n_1$, $n_2$ the sample sizes, and $S_p$ an estimate of what is called the pooled variance. When $n_1 = n_2$, the design is said to be balanced. The pooled variance is a weighted average of the within-group variances, and is computed, in the general case, as

$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$$
The quantity $n_1 + n_2 - 2$ is called the degrees of freedom of the test statistic, that is, the number of observations free to vary independently. Here, we must distinguish two approaches to the inferential paradigm and the interpretation of the p-value. According to Neyman & Pearson’s view, the statistical test provides an answer to a purely binary decision (accept or reject the null hypothesis), and the value of p is not to be interpreted further than its position with respect to a criterion value, say 5%, defined before the start of the experiment^4. On the contrary, Fisher [?] defended the idea that the value of p itself provides an indication of the strength of the result against the null hypothesis. There are very long-standing debates on these two approaches and on the way statistical results can be interpreted. We will most of the time use the former approach (binary decision rule), but also provide the value of the resulting p, though it is generally computed based on asymptotic theoretical results.

Sir Ronald Aylmer Fisher (1890–1962) significantly contributed to the development of methods and sampling distributions suitable for small samples, and he is considered the father of analysis of variance.

(^4) The Neyman-Pearson criterion says that we should construct our decision rule so as to have maximum probability of detection while not allowing the probability of false alarm to exceed a certain value α. It can be shown that a likelihood ratio test that rejects H0 in favor of the alternative hypothesis is the most powerful test of size α, though in most cases this test is not used.
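The whole computation can be replicated step by step on the tension bond data (y1 and y2 as defined above) and checked against t.test():

```r
# Hand computation of the pooled two-sample t statistic
n1 <- length(y1); n2 <- length(y2)
Sp2 <- ((n1-1)*var(y1) + (n2-1)*var(y2)) / (n1+n2-2)   # pooled variance
t0  <- (mean(y1) - mean(y2)) / sqrt(Sp2*(1/n1 + 1/n2))
2*pt(-abs(t0), df=n1+n2-2)                   # two-sided p-value
t.test(y1, y2, var.equal=TRUE)$statistic     # should agree with t0
```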
2.5 Comparing a single mean to a criterion value
2.6 Application to paired samples
Another situation arises when the two samples are related in some way. For example, we can imagine an experiment where a number of specimens are tested by both tip 1 and tip 2. Data are in hardness.txt.
tmp <- scan("hardness.txt",sep=",")
hardness <- data.frame(y=tmp,tip=gl(2,10))
t.test(y~tip,data=hardness,paired=TRUE)
Here, we cannot conclude that there is a significant difference between the two tips (t(9) = −0.26, p = 0.798). If we look at a plot of the two evaluations (Fig. 2.6, left), we can see that both are distributed along a line with slope 1, suggesting a close agreement between Tip 1 and Tip 2. In this particular context, a more useful way of checking agreement is to plot the difference between Tip 1 and Tip 2 as a function of the sum of the two evaluations (Fig. 2.6, right). This was initially proposed for assessing biomedical agreement by [?].

with(hardness, plot(y[tip==1],y[tip==2],xlab="Tip 1",ylab="Tip 2"))
abline(0,1)
with(hardness, plot(y[tip==1]+y[tip==2],y[tip==1]-y[tip==2],
     xlab="Tip 1 + Tip 2",ylab="Tip 1 - Tip 2",ylim=c(-3,3)))
abline(h=0)
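Note that the paired t-test is equivalent to a one-sample t-test on the within-pair differences, which can be checked directly:

```r
# Same t and p-value as t.test(y~tip, data=hardness, paired=TRUE)
with(hardness, t.test(y[tip==1] - y[tip==2]))
```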
Figure 2.6: The Hardness testing experiment. Left: Tip 2 against Tip 1. Right: Tip 1 − Tip 2 against Tip 1 + Tip 2.
Let’s look at what we would get if we ignored the pairing:
t.test(y~tip,data=hardness,var.equal=TRUE)
As expected, the degrees of freedom are twice the previous ones (n1 + n2 − 2 = 2(n − 1) when n1 = n2 = n) and the t-value is larger, reflecting the extra variance not accounted for.
2.7 Non-parametric alternative
For two-sample comparisons, two non-parametric tests can be used, depending on the way data are collected. If both samples are independent, we use the Mann-Whitney-Wilcoxon rank sum test, while for paired samples the corresponding test is the Wilcoxon signed rank test. Both are called using the R function wilcox.test with the option paired={TRUE|FALSE}. For the previous examples, we get

wilcox.test(y1,y2)
wilcox.test(y~tip,data=hardness,paired=TRUE)
Figure 3.1: The Etch Rate data (observed etch rate, in Å/min, as a function of RF power, in W; crosses mark the group means).
707.0 Å/min at 220 W. Moreover, it seems that this increase occurs in a linear fashion, but we will return to this point later on. In its most basic formulation, the one-way ANOVA model is expressed as
yij = μi + εij i = 1,... , a; j = 1,... , n, (3.1)
where yij is the jth observation associated with treatment (or group) i, μi is the treatment mean, and εij is the so-called residual, assumed to be NIID (normally and independently distributed). Equation 3.1 is called the means model. If we consider the μi with respect to the overall mean, denoted μ, with μi = μ + τi, then we can rewrite Equation 3.1 as
yij = μ + τi + εij i = 1,... , a; j = 1,... , n. (3.2)
Now it can be seen that the τi represent the differences between the treatment means and the overall mean; they are called the effects, and thus we talk about an effects model.
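The two parametrizations can both be fitted in R; a small sketch, assuming the etch.rate data frame with RF converted to a factor as in the next section:

```r
# Means model: no intercept, coefficients are the treatment means mu_i
lm(rate ~ RF - 1, data=etch.rate)
# Effects model: with the default treatment contrasts, the intercept is mu_1
# and the remaining coefficients are differences from the first level
lm(rate ~ RF, data=etch.rate)
```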
3.3 Estimating Model parameters
The ANOVA table (Tab. 3.1) is produced using the next commands. The aov() and lm() functions are of particular significance when running any ANOVA model, but it is important to emphasize that the coding of variables is very important, especially when using the lm() command. In that particular case, categorical variables should be factors in the R terminology, otherwise a linear regression will be performed!
etch.rate$RF <- as.factor(etch.rate$RF)
etch.rate$run <- as.factor(etch.rate$run)
etch.rate.aov <- aov(rate~RF,etch.rate)
summary(etch.rate.aov)
             Df   Sum Sq  Mean Sq F value  Pr(>F)
RF            3 66870.55 22290.18   66.80  <0.001
Residuals    16  5339.20   333.70
Table 3.1: Results of the ANOVA model for the Etch Rate Experiment.
The RF mean square largely exceeds the residual (also called error) MS, which is computed simply as the mean of the within-treatment variances. In other words, there is much more between-treatment variability than within-treatment variability, and such a result is very unlikely to occur by chance alone. Note that R doesn’t produce a separate row for the Total SS as it doesn’t provide any additional information most of the time. Here, the Total SS simply equals 66870.55 + 5339.20 = 72209.75 (with 3 + 16 = 19 df). Turning back to the ANOVA result, we can estimate the overall mean and the treatment effects τ̂i = ȳi· − ȳ·· as follows:
(erate.mean <- mean(etch.rate$rate))
with(etch.rate, tapply(rate,RF,function(x) mean(x)-erate.mean))

This gives
RF    160     180     200    220
τ̂i   -66.55  -30.35   7.65  89.25
Another way to get the same result is to use the model.tables() function, which provides valuable effect size information for complex designs.

model.tables(etch.rate.aov)

Finally, to get 100(1−α)% confidence intervals for treatment effects, we may compute them by hand using the following formula
$$\bar{y}_{i\cdot} \pm t_{\alpha/2,N-a}\sqrt{\frac{MS_e}{n}}$$

whereas for the comparison of any two treatments the above formula becomes

$$(\bar{y}_{i\cdot}-\bar{y}_{j\cdot}) \pm t_{\alpha/2,N-a}\sqrt{\frac{2MS_e}{n}}$$
Using R directly, we only have to compute the pooled SD and get the value of the corresponding t quantile.
or reference level, here the first level of RF (i.e. 160 W). So, the difference ȳ4· − ȳ1· is estimated to lie between 131.3 and 180.3 95% of the time^1. We can check the correctness of the result using Equation 3.4, e.g. for the last row labeled RF220:

as.numeric(grp.means[4]-grp.means[1])+c(-1,1)*qt(.975,16)*sqrt(2*MSe/5)
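Similarly, the F statistic of Table 3.1 can be rebuilt from the sums of squares by hand:

```r
# Mean squares and F ratio from the ANOVA table
MStr <- 66870.55/3              # treatment mean square
MSe  <- 5339.20/16              # error mean square
F0   <- MStr/MSe                # about 66.8
pf(F0, 3, 16, lower.tail=FALSE) # the p-value reported by summary()
```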
3.4 Model checking
Model checking includes the verification of the following assumptions (in decreasing order of importance):

1. homogeneity of the within-treatment variances (homoscedasticity);
2. independence of the observations;
3. normality of the residuals.
In short, residual values, defined as eij = yij − ŷij, should be structureless and well balanced between treatments. Model checking can be done graphically, and this is often the recommended way, although there exists a formal test for each of the above hypotheses. Several diagnostic plots are proposed in Figure 3.2.

opar <- par(mfrow=c(2,2),cex=.8)
plot(etch.rate.aov)
par(opar)
When applied to the result of a call to aov(), the plot method produces different graphics which can be controlled with the which= option. R defaults to producing (see ?plot.lm): a plot of residuals against fitted values, a Scale-Location plot of the square root of the absolute standardized residuals against fitted values, a Normal Q-Q plot, and a plot of residuals against leverages. The first two subplots are very useful for checking any departure from the homoscedasticity and normality assumptions. The last plot (residuals vs. leverages) provides insight into possibly influential observations. This is a convenient wrapper function for diagnostic plots that would otherwise have to be plotted separately. For instance, plot(fitted(etch.rate.aov),residuals(etch.rate.aov)) produces the first plot (residuals against fitted values), while qqnorm(residuals(etch.rate.aov)); qqline(residuals(etch.rate.aov)) corresponds to a Normal Q-Q plot. It can be useful to use these commands when we are interested only in a derived plot or a subset of predictors. Standardized residuals may be used as a rough check for outliers. They are defined as

$$d_{ij} = \frac{e_{ij}}{\sqrt{MS_e}}$$
and it can be shown that they are distributed as N(0, 1) provided εij ∼ N(0, σ²). About 95% of the standardized residuals should fall within ±2.

(^1) Recall that the probability is attached to the confidence interval, which is random, not to the true (population) parameter.
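These standardized residuals are easily computed from the fitted model; a short sketch following the definition above, with MSe taken from Table 3.1:

```r
MSe <- 5339.20/16                       # error mean square from Table 3.1
d <- residuals(etch.rate.aov)/sqrt(MSe) # standardized residuals d_ij
mean(abs(d) < 2)                        # should be roughly 0.95
```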
Figure 3.2: Model checking for the ANOVA model (Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Factor Levels plots).
Independence of observations is largely a matter of the experimental design and the way data are collected. Perhaps the simplest graphical way to check for independence is to plot the residuals against run order or any time index (Fig. 3.3). This also allows one to check the homoscedasticity assumption, since any departure from constant variance would be reflected in localized subsets of observations differing in their mean response, or in any systematic pattern of outliers. Looking at the plot in Figure 3.3, no such pattern is visible, thus we have no reason to reject the independence hypothesis. A more formal, and historical, test is the Durbin-Watson test. This procedure aims at testing the serial autocorrelation of errors, and by default uses a constant lag of 1. It is readily available in the car and lmtest packages.

require(car)
durbin.watson(etch.rate.aov)
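A minimal version of the residuals-versus-run-order plot of Figure 3.3 (assuming the run column defined earlier) is:

```r
plot(as.numeric(etch.rate$run), residuals(etch.rate.aov),
     xlab="Run order", ylab="Residuals")
abline(h=0, lty=2)
```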
The assumption of constant variance, or homoscedasticity, is probably the most important in practice, since we compute a pooled variance estimate by averaging the within-treatment variances. Any departure from this hypothesis means that some of the groups have larger or smaller variance than others, and this causes our estimate to be somewhat inaccurate. The question of what should be considered as significantly