



Material Type: Notes; Professor: Zhou; Class: Statistical Inference II; Subject: Statistics; University: California State University-East Bay; Term: Spring 1993;
Minitab Notes for STAT 3503 Dept. of Statistics — CSU Hayward
An oil company tested four different blends of gasoline for fuel efficiency according to a Latin square design in order to control for the variability of four different drivers and four different models of cars. Fuel efficiency was measured in miles per gallon (mpg) after driving cars over a standard course.
Fuel Efficiencies (mpg) For 4 Blends of Gasoline (Latin Square Design: Blends Indicated by Letters A-D)
                Car Model
Driver    I         II        III       IV
  1       D 15.5    B 33.9    C 13.2    A 29.1
  2       B 16.3    C 26.6    A 19.4    D 22.8
  3       C 10.8    A 31.1    D 17.1    B 30.3
  4       A 14.7    D 34.0    B 19.7    C 21.6
These data are from Ott: Statistical Methods and Data Analysis, 4th ed., Duxbury, 1993, page 866. (Similar data are given in the 5th edition by Ott/Longnecker, in problem 15.10, page 889.)
The design is called a "Latin" Square design because Latin letters are often used to show how levels of one factor are assigned to combinations of levels of the other two factors. It is often the case that one of the effects is of principal interest (here, Blend), while the other two are "blocking" effects to control variability (here, Driver and Model). For our data, notice that each Driver tests each of the four Blends; also each Blend is tested in each Model of car. However, not all combinations of Blend-Model-Driver are present. For example, Driver 3 did not test Blend A in Model IV.
In a Latin square each of the three factors must have the same number t of levels, and there are n_T = t^2 observations altogether.
Problems
6.1.1. This Latin Square design has 4^2 = 16 observations. As noted just above, not all 3-way combinations of the three effects are present. How many observations would have been required in order to include all 3-way combinations of Blend, Model, and Driver?
6.1.2. Suppose that there were 5 Blends, 5 Drivers, and 5 Models. A full "factorial" design with one observation on each 3-way combination would require 125 observations. Make a table showing how you could assign Blends A-E to make a Latin Square design for this situation.
6.1.3. For your convenience, the 16 observations in the fuel efficiency study are listed below (reading across the rows of the table above):
15.5 33.9 13.2 29.1 16.3 26.6 19.4 22.8 10.8 31.1 17.1 30.3 14.7 34.0 19.7 21.6.
Put these observations into c1 of a Minitab worksheet. Use the patterned data procedure to put subscripts for Driver into c2 and Model into c3 (use 1 for I, 2 for II, etc.). The subscripts for Blend (use 1 for A, 2 for B, etc.) will have to be entered directly into the worksheet one at a time because the Latin Square pattern cannot be entered automatically by Minitab. (Use the original data table in this section above to do this. Use the printout below only to check your work.) Label the four columns appropriately: MPG, Driver, Model, Blend. When you are finished entering the data, print out the results. Here is what you should get (without the spaces between lines which have been introduced for easy reading):
MTB > print c1-c4
ROW    MPG  Driver  Model  Blend
  1   15.5       1      1      4
  2   33.9       1      2      2
  3   13.2       1      3      3
  4   29.1       1      4      1
  5   16.3       2      1      2
  6   26.6       2      2      3
  7   19.4       2      3      1
  8   22.8       2      4      4
  9   10.8       3      1      3
 10   31.1       3      2      1
 11   17.1       3      3      4
 12   30.3       3      4      2
 13   14.7       4      1      1
 14   34.0       4      2      4
 15   19.7       4      3      2
 16   21.6       4      4      3
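The worksheet can also be checked mechanically. The following is a minimal sketch in Python (not part of the Minitab session) that rebuilds the four columns and confirms that each Blend occurs exactly once for every Driver and once for every Model. The function name is_latin_square is a hypothetical helper introduced only for this illustration.

```python
# Worksheet columns as printed above (reading across the rows of the data table).
mpg = [15.5, 33.9, 13.2, 29.1, 16.3, 26.6, 19.4, 22.8,
       10.8, 31.1, 17.1, 30.3, 14.7, 34.0, 19.7, 21.6]
driver = [i for i in range(1, 5) for _ in range(4)]   # 1 1 1 1 2 2 2 2 ...
model = [j for _ in range(4) for j in range(1, 5)]    # 1 2 3 4 1 2 3 4 ...
blend = [4, 2, 3, 1, 2, 3, 1, 4, 3, 1, 4, 2, 1, 4, 2, 3]  # 1 = A, ..., 4 = D


def is_latin_square(rows, cols, letters, t):
    """Return True if each letter occurs exactly once in every row and column."""
    levels = set(range(1, t + 1))
    # There must be exactly one observation in each row-column cell.
    if len(letters) != t * t or len(set(zip(rows, cols))) != t * t:
        return False
    for level in levels:
        in_row = {l for r, l in zip(rows, letters) if r == level}
        in_col = {l for c, l in zip(cols, letters) if c == level}
        if in_row != levels or in_col != levels:
            return False
    return True


latin_ok = is_latin_square(driver, model, blend, 4)
# Number of distinct Driver-Model-Blend cells that contain an observation.
filled_cells = len(set(zip(driver, model, blend)))
print(latin_ok, filled_cells, 4 ** 3)   # prints: True 16 64
```

The last line reports that only 16 of the 4^3 = 64 possible three-way cells are filled, which is the sense in which the design is a carefully chosen subset of the full factorial.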
6.1.4. Because this experiment was done by an oil company it is reasonable to assume that the main issue is whether there are differences among the four Blends, and that Blend is a fixed effect because four particular blends are currently under study. Also, it may be reasonable to assume that Models of car and Drivers were chosen at random from among available models and drivers. Subject to these assumptions, write the model for this experiment. In specifying the range of the subscripts for Driver you can use "i = 1, 2, 3, 4." and for Model you can use "j = 1, 2, 3, 4." However, for Blend you need to say something like "the 4 values of the Blend subscript k are assigned to the 16 observations according to a Latin Square design," in order to make it clear that there are only 16 observations.
6.1.5. The design space of this Latin Square is really a very carefully chosen subset of a cube. There are 4^3 = 64 possible combinations of the levels of the three factors, of which only 16 are observed. The table at the beginning of this unit can be viewed as a two-dimensional representation of the cube with dimensions z = Driver (height), y = Model (width), and x = Blend (depth). Minitab makes three-dimensional scatterplots; in Minitab 14 they can be conveniently rotated.
GRAPH ➯ 3D ➯ Groups, z='Driver', y='Model', x='Blend', Group=Blend; Data view: symbols and lines.
TOOLS ➯ Toolbars ➯ 3D Graph, rotate about appropriate axes (Release 14)
The initial view is shown first, followed by the view that results when the plot is rotated so that its z-axis is perpendicular to the viewer (so that the connecting lines disappear behind the plotting symbols). This is like viewing the initial view from the top. In Minitab 14, use the tools to rotate such a plot so that, first, the y-axis is perpendicular to your view, and then the x-axis. Also, with some fussing, you should be able to orient the cube to match the data table. (If visible on your copy of this unit, the colors of the symbols for Blend match the colors of the letters in the data table in section 6.1.)
Although Latin square designs have the particular kind of "balance" described in the previous section, these designs are not balanced in the sense that each combination of the three factors occurs equally often. Some combinations occur once (e.g., Driver 1, Model I, and Blend D). But there are only 16 observations, and so most of the combinations do not occur at all (e.g., Driver 1, Model I, and Blend A). For this reason Minitab's "balanced ANOVA" procedure does not work for Latin square designs.
The word cube may be more descriptive of these designs than square. Imagine this particular design as a cube with 64 cells and try to visualize which 16 cells are filled (i.e., contain an observation) and which are not. What would the cube look like when viewed from each face?
In a Latin Square design all three effects must have the same number t of levels, called the order of the Latin Square. Here t = 4. Because the Latin Square design is not balanced as required by Minitab's "Balanced ANOVA" procedure, we must use the "General Linear Model" procedure in order to obtain an ANOVA table for the fuel efficiency data.
STAT ➯ ANOVA ➯ General linear model
MTB > glm MPG = Driver Model Blend;
SUBC> random Driver Model;
SUBC> resid c5;
SUBC> ems.
General Linear Model
Factor  Type    Levels  Values
Driver  random       4  1 2 3 4
Model   random       4  1 2 3 4
Blend   fixed        4  1 2 3 4
Analysis of Variance for MPG, using Adjusted SS for Tests
Source  DF   Seq SS   Adj SS   Adj MS      F    P
Driver   3    5.897    5.897    1.966   0.50   0.
Model    3  736.912  736.912  245.637  61.90   0.
Blend    3  108.982  108.982   36.327   9.15   0.
Error    6   23.809   23.809    3.968
Total   15  875.599
... ... ...
(Note: Expected Mean Squares, Error Terms, and Variance Components have been omitted here. See Problem 6.2.3.)
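As a cross-check on the sums of squares, mean squares, and F-ratios in the ANOVA table, here is a minimal sketch in Python using only the standard library. It assumes the usual hand-calculation formulas for a Latin square (each factor's sum of squares from its level totals, with the error sum of squares obtained by subtraction) and does not reproduce Minitab's GLM algorithm. P-values are not computed because the standard library has no F distribution. The helper name factor_ss is hypothetical.

```python
# Data and subscripts in worksheet order (Driver varies slowest, Model fastest).
mpg = [15.5, 33.9, 13.2, 29.1, 16.3, 26.6, 19.4, 22.8,
       10.8, 31.1, 17.1, 30.3, 14.7, 34.0, 19.7, 21.6]
driver = [i for i in range(1, 5) for _ in range(4)]
model = [j for _ in range(4) for j in range(1, 5)]
blend = [4, 2, 3, 1, 2, 3, 1, 4, 3, 1, 4, 2, 1, 4, 2, 3]  # 1 = A, ..., 4 = D


def factor_ss(y, labels):
    """Sum over levels of (level total)^2 / (level count), minus the correction term."""
    totals, counts = {}, {}
    for value, lab in zip(y, labels):
        totals[lab] = totals.get(lab, 0.0) + value
        counts[lab] = counts.get(lab, 0) + 1
    correction = sum(y) ** 2 / len(y)
    return sum(totals[k] ** 2 / counts[k] for k in totals) - correction


t = 4                                   # order of the Latin square
ss_total = sum(v * v for v in mpg) - sum(mpg) ** 2 / len(mpg)
ss = {"Driver": factor_ss(mpg, driver),
      "Model": factor_ss(mpg, model),
      "Blend": factor_ss(mpg, blend)}
ss_error = ss_total - sum(ss.values())  # valid because the three factors are orthogonal
df_factor = t - 1                       # 3 for each factor
df_error = (t - 1) * (t - 2)            # 6
ms_error = ss_error / df_error
f_ratio = {name: (value / df_factor) / ms_error for name, value in ss.items()}

for name, value in ss.items():
    print(f"{name:6s} {df_factor:2d} {value:8.3f} {value / df_factor:8.3f} {f_ratio[name]:6.2f}")
print(f"{'Error':6s} {df_error:2d} {ss_error:8.3f} {ms_error:8.3f}")
print(f"{'Total':6s} {t * t - 1:2d} {ss_total:8.3f}")
```

Every F-ratio here uses MS(Error) as its denominator, which is what the table above reports; whether that choice is justified by the expected mean squares is the subject of Problem 6.2.3.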
We conclude that the blends are significantly different at the 5% level, but not at the 1% level. It is not surprising that the fuel efficiencies vary among various models of cars (which might range from small compacts to large RVs). It does not appear, however, that there are any significant differences among our Drivers. (In the general population some drivers are easier on fuel than others; perhaps the drivers for this study have been carefully trained so that their driving styles are similar.)
Notes on orthogonality and GLM:
STAT ➯ Tables ➯ Cross tabulation, classification variables: 'Driver' 'Model', Summaries, associated variables 'MPG' 'Blend', checkbox, data
MTB > table 'Driver' 'Model';
SUBC> data 'MPG' 'Blend'.
Now, by hand, make a table similar to the original data table, except that the rows are Drivers, the columns are Blends, and the labels within cells are Roman numerals I-IV designating Models. Verify your result in Minitab with a procedure similar to the one shown just above. Repeat (both by hand and in Minitab) for a table in which Drivers are designated within the cells.
6.2.2. Try to use Minitab's balanced ANOVA procedure (menu path STAT ➯ ANOVA ➯ Balanced ANOVA or command ANOVA) to analyze this block design. What happens? (A Latin Square is "balanced" in the sense that the four subspaces corresponding to the three factors and error are orthogonal. In Minitab's "Balanced ANOVA" all combinations of treatment levels must have the same number of observations; here 16 frequencies are 1 and 64 - 16 = 48 are 0.)
6.2.3. Repeat the GLM procedure shown in this section to obtain the EMS table and components of variance, and make a column of residuals. (Notice that we have not used the restrict subcommand because it is not available with GLM; for a Latin Square design this causes no difficulty.) Answer the following questions:
(a) What mean square is in the denominator of each F-ratio? How do the EMSs lead you to the conclusion that this is correct?
(b) What is strange about the component of variance for Driver? How do you interpret this result in practice?
(c) What is the P-value of the Anderson-Darling test for normality of the residuals, and how do you interpret it?
6.2.4. Multiple comparison procedures for a Latin Square design. We have established that there are significant differences among the Blends. By hand, establish the pattern of significant differences using Fisher's LSD procedure and Tukey's HSD procedure (both at the 5% level). The formulas are similar to the ones for a one-way ANOVA, except that you must use MS(Error) from the ANOVA table as the variance estimate (i.e., instead of s_W^2) and n = t.
(The exact formula for LSD is given in Ott/Longnecker, page 1061.)
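To check a hand calculation of the two thresholds, the following is a minimal sketch in Python, assuming the standard forms LSD = t * sqrt(2 MS(Error) / n) and HSD = q * sqrt(MS(Error) / n). The function names fisher_lsd and tukey_hsd are hypothetical, and the two critical values are assumptions taken from standard tables (6 error degrees of freedom, 4 means, 5% level); verify them against your own tables before relying on the results.

```python
import math


def fisher_lsd(t_crit, ms_error, n):
    """Least significant difference: t_crit * sqrt(2 * MS(Error) / n)."""
    return t_crit * math.sqrt(2.0 * ms_error / n)


def tukey_hsd(q_crit, ms_error, n):
    """Honestly significant difference: q_crit * sqrt(MS(Error) / n)."""
    return q_crit * math.sqrt(ms_error / n)


MS_ERROR = 3.968    # MS(Error) = 23.809 / 6 from the Latin square ANOVA table
N_PER_MEAN = 4      # n = t = 4 observations behind each Blend mean
T_CRIT = 2.447      # ASSUMED table value: upper 2.5% point of t with 6 df
Q_CRIT = 4.90       # ASSUMED table value: upper 5% point of the studentized range, 4 means, 6 df

lsd = fisher_lsd(T_CRIT, MS_ERROR, N_PER_MEAN)
hsd = tukey_hsd(Q_CRIT, MS_ERROR, N_PER_MEAN)
print(f"LSD = {lsd:.3f}   HSD = {hsd:.3f}")
```

Two Blend means are declared different by a procedure when their absolute difference exceeds that procedure's threshold. Tukey's threshold is the larger of the two because it controls the error rate for the whole family of comparisons.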
Note: For Latin square designs, the Minitab GLM procedure performs Tukey's HSD and other multiple comparisons (but, in Releases 14 and earlier, not Fisher's LSD). The output format of GLM is somewhat different from the one Minitab uses for comparisons in one-way ANOVAs; both confidence intervals and tests are available. We suggest that you do the Fisher and Tukey comparisons using a calculator and then verify your answers for Tukey comparisons using GLM. (As we see in Section 6.3, you will get incorrect results if you try to use a one-way ANOVA procedure on data from a Latin Square design, so that is not a feasible procedure for getting LSDs for our data.)
INDIVIDUAL 95 PCT CI'S FOR MEAN BASED ON POOLED STDEV
LEVEL   N    MEAN   STDEV
    1   4  23.575   7.818
    2   4  25.050   8.388
    3   4  18.050   7.344
    4   4  22.350   8.375
POOLED STDEV = 7.993
Notice that the SS(Blend) here is the same as in the Latin square ANOVA table, and that SS(Error) here gets split into SS(Model), SS(Driver), and SS(Error) in the Latin square table; similarly for degrees of freedom. Finally, notice that the bogus one-way analysis fails to detect the Blend effect.
Problems
6.3.1. Perform two additional incorrect one-way ANOVAs, using first Driver and then Model as the single factor. If you were forced to use a spreadsheet program that will compute only one-way ANOVAs, how could you piece together results from this program to construct a correct ANOVA table for a Latin Square design? In other words, how can you combine information from three incorrect one-way ANOVAs to give the correct analysis of a Latin Square?
6.3.2. Perform an incorrect analysis on these data, treating them as if they came from a block design with Blends as the fixed effect and Models as the blocking effect (ignoring Drivers). Compare sums of squares and degrees of freedom in the resulting ANOVA table with the correct ones from the Latin Square analysis. If you were to use the F-ratios from this incorrect procedure to draw conclusions about the significance of Blend and Model effects, would your conclusions happen to be correct or incorrect? What about the conclusions drawn from an incorrect block design that ignores Models instead of Drivers? Comment.
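The relationship between the one-way error term and the Latin square table can be illustrated numerically. The following is a minimal sketch in Python that starts from the sums of squares printed in the Latin square ANOVA table (copied by hand, so subject to rounding in the third decimal) and pools everything except Blend into a single error term, as a one-way analysis on Blend does.

```python
# Sums of squares copied from the Latin square ANOVA table in Section 6.2.
ss_driver, ss_model, ss_blend, ss_error = 5.897, 736.912, 108.982, 23.809
t = 4                     # order of the Latin square
n_total = t * t           # 16 observations

# One-way analysis on Blend alone: Driver and Model variation lands in "Error".
ss_error_oneway = ss_driver + ss_model + ss_error
df_error_oneway = n_total - t                 # 12 = 3 + 3 + 6
ms_error_oneway = ss_error_oneway / df_error_oneway
pooled_sd = ms_error_oneway ** 0.5            # compare with POOLED STDEV above

ms_blend = ss_blend / (t - 1)
f_oneway = ms_blend / ms_error_oneway                  # inflated denominator
f_latin = ms_blend / (ss_error / ((t - 1) * (t - 2)))  # correct denominator

print(f"pooled SD = {pooled_sd:.3f}")
print(f"F (one-way) = {f_oneway:.2f}   F (Latin square) = {f_latin:.2f}")
```

The one-way F-ratio falls below 1 because the large Model sum of squares is absorbed into its error term, whereas the Latin square analysis removes Driver and Model variation before Blend is tested.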
The View From Problem 6.1.5 That Matches the Original Data Table
Minitab Notes for Statistics 3503 by Bruce E. Trumbo, Department of Statistics, CSU Hayward, Hayward CA, 94542, Email: btrumbo@csuhayward.edu. Comments and corrections welcome. Copyright (c) 1991, 1995, 1997, 2000, 2001, 2004 by Bruce E. Trumbo. All rights reserved. These notes are intended mainly for the use of students at California State University, Hayward. Please contact the author in advance if other use is contemplated. Preparation of earlier versions of this document was partially supported by NSF grant USE 9150433. Modified 1/
[Figure: 3D Scatterplot of Driver vs Model vs Blend]