Minitab Notes for STAT 3503: One-Factor ANOVA as a Generalization of the Two-Sample t Test, Study notes of Statistics

Minitab notes for Unit 1 of STAT 3503 at CSU Hayward, focusing on one-factor ANOVA as a generalization of the two-sample t test. They cover data preparation, descriptive methods, and statistical tests, and guide students through entering data, creating subscripts, and performing descriptive statistics and one-way ANOVA in Minitab.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-t13
Minitab Notes for STAT 3503
Dept. of Statistics — CSU Hayward
Unit 1: One-Factor ANOVA as a
Generalization of the Two-Sample t Test
1.1. Data and Worksheet Preparation
Consider two random samples of bottles of a particular drug. Bottles in Group 1 are chosen from
current production; those in Group 2 have been stored under regulated conditions for one year. There
are 10 bottles in each group. The potency of each bottle is assayed and recorded. The issue is whether
the mean potency of the population of year-old bottles is the same as that of the population of bottles
currently being made.
The potency data are as shown below:
Group 1:
10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6
Group 2:
9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9
These data are from Table 6.2 (page 269) of Ott and Longnecker, An Introduction to
Statistical Methods and Data Analysis, 5th ed., Duxbury, 2001.
One way to put these data into a Minitab worksheet is to "cut and paste" from the MS Word or
HTML version of this unit. Be sure Minitab commands are "enabled" before you start. The goal is to
make the Session Window look as shown below by using the bulleted instructions.
  • "Enable commands" in the Minitab Session Window using the EDITOR menu. (First, activate
the Session window by clicking anywhere within it; you cannot modify the Session window
when a Worksheet is active. Second, be sure to use the EDITOR menu, not EDIT.)
  • Type the first two lines below (the ones with the name and set commands). The DATA>
prompt should appear automatically at the beginning of the third line.
  • In the third line, instead of typing the data: in your browser, highlight the data for Group 1
and copy these 10 observations using CTRL-C. In the Minitab Session Window, make sure the
cursor follows the DATA> prompt and paste the data with CTRL-V. Then press ENTER. (It's
OK if the spacing differs a little from what you see below, but make sure that you captured all
10 observations.)
  • Similarly, copy and paste the data for Group 2 into the fourth line.
  • Finally, type end on the fifth line to signal that data entry for c1 is complete.
MTB > name c1 'Potency'
MTB > set c1
DATA> 10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6
DATA> 9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9
DATA> end
Now display the data in c1 using either the menu path or the command shown below:

DATA (MANIP in Releases 13 and earlier) ➯ Display Data
MTB > print c1

This produces a (horizontal) printout of the 20 observations in c1. Also look in the worksheet to see

the data there.

Next we need a column of "subscripts" in c2 to show which observations come from which group.

Name c2 'Group' either with a command or by typing the name directly into the worksheet. Then

enter the subscripts using either the menus or the set command.

Type Group atop column 2 in the Worksheet
CALC ➯ Patterned Data, Simple, values from 1 to 2, each individual value repeated 10 times
MTB > name c2 'Group'
MTB > set c2
DATA> 10(1:2)
DATA> end

This way of organizing data, with all observations in a single column and groups designated in a

separate column of subscripts, is called "stacked" format.
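For readers who want to see the stacked layout outside Minitab, here is a minimal Python sketch of the same two columns. The helper `patterned` and its `each`/`times` parameters are hypothetical names chosen for illustration; they only mimic the "Patterned Data, Simple" choices described above (values from 1 to 2, each value repeated 10 times) and are not a Minitab API.

```python
# Stacked format: all 20 responses in one list, group subscripts in a parallel list.
potency = [
    10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6,  # Group 1 (fresh)
    9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9,        # Group 2 (stored)
]


def patterned(first, last, each=1, times=1):
    """Hypothetical helper: the integers first..last, each value listed `each`
    times in a row, and the whole sequence listed `times` times."""
    one_pass = [value for value in range(first, last + 1) for _ in range(each)]
    return one_pass * times


group = patterned(1, 2, each=10)  # ten 1s followed by ten 2s

# Proofreading checks: one subscript per observation, ten observations per group.
assert len(group) == len(potency) == 20
assert group.count(1) == group.count(2) == 10

for obs, (y, g) in enumerate(zip(potency, group), start=1):
    print(f"{obs:2d}  {y:5.1f}  {g}")
```

Note that `patterned(1, 2, times=10)` would instead produce 1, 2, 1, 2, ..., which would attach the wrong label to half of the observations; the stacked layout requires repeating each value, not the whole list.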

For such a small dataset you could just type the 20 'Potency' determinations and the 20 'Group'

numbers directly into the worksheet. However, when using documents in DOC or HTML format, you

may find it convenient to learn (i) to cut and paste data into a worksheet and (ii) to use the "patterned
data" features of the set command. It is best to learn these two skills with relatively simple data
such as these.

Once you have entered data into a worksheet, you should always proofread your work before

continuing. You can do this either by printing the data to the Session window (using the print

command) or by looking directly at the Worksheet. Proofreading should become an automatic part of

data entry for you. Beyond the first few units these notes will not remind you to do this.

Problems

1.1.1. Here is an alternate way to prepare the worksheet. Follow through the steps, cutting and pasting

data where appropriate. What menu choices would produce the same results? [Look at the DATA

(MANIP) menu.] Explain what each command does. Compare c13 and c14 with c1 and c2.

MTB > name c11 'Fresh' c12 'Stored'
MTB > set c11
DATA> 10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6
DATA> end
MTB > set c12
DATA> 9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9
DATA> end
MTB > stack c11 c12 c13;
SUBC> subs c14.

1.1.2. In the process of working Problem 1.1.1 you put the data for each group into a separate column

(c11 and c12). Data in separate columns are said to be in "unstacked" format. Look at the DATA

(MANIP) menu and figure out how the stacked data in c1 can be put into unstacked format using the

subscripts in c2. (Use the column names c21 'New' and c22 'Old' for this.) What command/

subcommand combination could you use to unstack the data, without the help of the menus? (Minitab

is a command-based package. The menus are sometimes a convenient way to generate commands,

which appear in the Session window when the command language is active.)
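The logic of stacking and unstacking is simple enough to write out directly. This Python sketch uses two hypothetical helpers, `stack` and `unstack`, that mirror what the stack command (with its subs subcommand) and the corresponding unstacking step do to the columns: stacking concatenates the columns and records a subscript for each value, and unstacking splits the responses back out by subscript.

```python
fresh = [10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6]
stored = [9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9]


def stack(*columns):
    """Hypothetical helper: combine unstacked columns into one response list
    plus a parallel list of subscripts 1, 2, ... (one per source column)."""
    values, subscripts = [], []
    for label, column in enumerate(columns, start=1):
        values.extend(column)
        subscripts.extend([label] * len(column))
    return values, subscripts


def unstack(values, subscripts):
    """Hypothetical helper: split a stacked response list into one list per
    subscript, keeping the original order within each group."""
    groups = {}
    for value, sub in zip(values, subscripts):
        groups.setdefault(sub, []).append(value)
    return groups


stacked, subs = stack(fresh, stored)  # plays the role of c13 and c14
columns = unstack(stacked, subs)      # plays the role of c21 'New' and c22 'Old'
print(columns[1] == fresh, columns[2] == stored)  # True True
```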

They can be included as graphic images on the web and can be imported into word processing

and desk-top publishing documents. They greatly increase the file size of documents that

incorporate them. Minitab starts in professional graphics mode. To re-activate professional

graphics after using character graphics, use the command gpro.

Illustrate both types of graphics by making boxplots as follows:

MTB > gstd
MTB > boxp c1;
SUBC> by c2.

MTB > gpro
MTB > boxp c1 * c2
MTB > dotp c1 * c2

Comment on the results as follows:

(a) Do the boxplots show the differences between the two groups as clearly as do the

dotplots? More clearly? Defend your answer.

(b) Look at one of the dotplots above. Can you see exactly how many data points are

represented? Now look at one of the boxplots above. Can you see how many data

points are represented?

(c) Minitab's boxplots sometimes indicate the presence of outliers. Are outliers

indicated for either of our groups?

(d) What descriptive statistics are used in making box plots?

(e) Comment on the differences between standard-graphics and professional-graphics

boxplots.

(f) We have given several commands above. What menu choices can be used to

produce each style of boxplot?
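As a point of comparison for parts (c) and (d), here is a sketch of the five-number summary and the common 1.5 × IQR rule for flagging outliers. It assumes quartiles interpolated at positions (n + 1)p, which is what Python's `statistics.quantiles` does with `method="exclusive"` (Python 3.8 or later). We believe Minitab uses a similar convention, but quartile definitions vary between packages and releases, so treat this as an illustration rather than a reproduction of Minitab's plots.

```python
from statistics import quantiles


def box_stats(data):
    """Five-number summary plus values outside the 1.5 x IQR fences."""
    # method="exclusive" places the quartiles at positions (n + 1)p.
    q1, med, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(data), "q1": q1, "median": med, "q3": q3, "max": max(data),
        "outliers": [x for x in data if x < low or x > high],
    }


potency = {
    1: [10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6],
    2: [9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9],
}

for group, data in potency.items():
    print(group, box_stats(data))
```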

1.3. t Test and One-Factor ANOVA

The descriptive methods in Section 1.2 strongly suggest that fresh samples of the drug tend to be
more potent than stored ones. Now we look at several different ways to confirm this impression with
formal statistical tests. That is, we test H0: the two groups have equal mean potencies (μ1 = μ2) against
Ha: the two groups have different mean potencies (μ1 ≠ μ2).

The first of these is the two-tailed, pooled two-sample t test. The command for a two-sample t test on

stacked data is twot.

Minitab defaults for two-sample t tests:

  • The two-tailed alternative is the default; one-sided alternatives require the subcommand

alternative followed by either 1 (right-sided alternative) or -1 (left-sided).

  • The separate variances ("t-prime") test is the default. Pooling requires the subcommand pool.

Computer simulation results have established that the separate variances test is often

preferable for two-sample tests. Here we use the pooled test because it generalizes more

readily to the ANOVA methods of these notes.
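The two variance options above differ only in how the standard error and the degrees of freedom are computed. This sketch computes both versions for the potency data with Python's standard library; the separate-variances degrees of freedom use the Welch-Satterthwaite formula. We believe Minitab reports this df truncated to an integer, so compare the fractional value below with the DF in its output rather than expecting an exact match.

```python
from math import sqrt
from statistics import mean, variance

fresh = [10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6]
stored = [9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9]

n1, n2 = len(fresh), len(stored)
v1, v2 = variance(fresh), variance(stored)  # sample variances (divisor n - 1)
diff = mean(fresh) - mean(stored)

# Pooled test: one common variance estimate, df = n1 + n2 - 2.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Separate-variances ("t-prime", Welch) test with Welch-Satterthwaite df.
a, b = v1 / n1, v2 / n2
t_welch = diff / sqrt(a + b)
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"pooled:   t = {t_pooled:.2f}, df = {df_pooled}")
print(f"separate: t = {t_welch:.2f}, df = {df_welch:.2f}")
```

With equal group sizes, as here, the two t statistics are algebraically identical (both are 4.24); only the degrees of freedom, and therefore the P-value, differ slightly.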

Note on stacked vs. unstacked data: The command twosample would be used if
the potency measurements for the two groups had been entered into two separate
columns, one for Fresh and one for Stored. Such "unstacked" data are seldom used
for computer analysis outside of elementary statistics classes. Minitab is one of the
few serious computer packages that make direct use of unstacked data, and even
then only for a few elementary procedures.

STAT ➯ Basic statistics ➯ 2-sample t, one column, assume equal variances
MTB > twot c1 c2;
SUBC> pool.

Two-sample T for Potency

Group   N    Mean  StDev  SE Mean
1      10  10.370  0.323     0.10
2      10   9.830  0.241    0.076

Difference = mu (1) - mu (2)
Estimate for difference:  0.540000
95% CI for difference:  (0.272230, 0.807770)
T-Test of difference = 0 (vs not =): T-Value = 4.24  P-Value = 0.000  DF = 18
Both use Pooled StDev = 0.2850

We see (from the very small P-value) that the difference between the two groups is very highly

significant. This is what we guessed would be the case from looking at the dotplots above. Either the

Fresh samples were manufactured to have a higher potency or the potency of the Stored samples

deteriorated with a year of storage.
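The estimate and confidence interval in the printout can be reproduced from the group summaries. Python's standard library has no t quantile function, so this sketch hard-codes the upper 0.025 point of the t distribution with 18 df (2.100922, as found in a t table); that constant is the only outside input.

```python
from math import sqrt
from statistics import mean, variance

fresh = [10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6]
stored = [9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9]
n1, n2 = len(fresh), len(stored)

estimate = mean(fresh) - mean(stored)
sp = sqrt(((n1 - 1) * variance(fresh) + (n2 - 1) * variance(stored)) / (n1 + n2 - 2))
se = sp * sqrt(1 / n1 + 1 / n2)  # standard error of the difference in means

# Assumed constant: upper 0.025 point of t with 18 df, taken from a t table.
T_CRIT_18 = 2.100922

lower, upper = estimate - T_CRIT_18 * se, estimate + T_CRIT_18 * se
print(f"estimate = {estimate:.6f}, 95% CI = ({lower:.6f}, {upper:.6f}), pooled SD = {sp:.4f}")
```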

The one-factor or one-way ANOVA design (also sometimes called the "completely randomized
design") is a generalization of the two-sided, pooled two-sample t test that can handle more than two
groups. Thus, when it is applied to only two groups, its result should agree with that of the t test: the
ANOVA F statistic is the square of the pooled t statistic (here 4.24² ≈ 17.95), and the two P-values are identical.
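The agreement between the two procedures can be checked directly. This sketch builds the one-way ANOVA sums of squares from their definitions and computes the pooled t statistic using the error mean square as its pooled variance; the square of t then matches F up to rounding.

```python
from math import sqrt
from statistics import mean

groups = {
    1: [10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6],
    2: [9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9],
}
means = {g: mean(ys) for g, ys in groups.items()}
all_obs = [y for ys in groups.values() for y in ys]
grand = mean(all_obs)
N, k = len(all_obs), len(groups)

# One-way ANOVA table entries.
ss_group = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in groups.items())
ss_error = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
ms_group = ss_group / (k - 1)
ms_error = ss_error / (N - k)
F = ms_group / ms_error

# Pooled two-sample t, with MS Error serving as the pooled variance.
n1, n2 = len(groups[1]), len(groups[2])
t = (means[1] - means[2]) / sqrt(ms_error * (1 / n1 + 1 / n2))

print(f"SS(Group) = {ss_group:.4f}, SS(Error) = {ss_error:.4f}, F = {F:.2f}")
print(f"t = {t:.2f}, t squared = {t ** 2:.2f}")
```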

STAT ➯ ANOVA ➯ Oneway
MTB > oneway c1 c2
(Alternatively: MTB > onew 'Potency' 'Group')

One-way ANOVA: Potency versus Group

Source  DF      SS      MS      F      P
Group    1  1.4580  1.4580  17.95  0.000
Error   18  1.4620  0.0812
Total   19  2.9200

S = 0.2850 R-Sq = 49.93% R-Sq(adj) = 47.15%

                          Individual 95% CIs For Mean Based on
                          Pooled StDev
Level   N    Mean  StDev  ----+---------+---------+---------+-----
1      10  10.370  0.323                       (-------------)
2      10   9.830  0.241  (-------------)
                          ----+---------+---------+---------+-----
                            9.75      10.00     10.25     10.50

Pooled StDev = 0.2850

Note: Releases 13 and earlier omit some of the information shown in this Release 14 printout.

'Group'). Use of single quotes (apostrophes) around variable names is optional (unless the first

character of the name is a number or a symbol).

  • With Windows menus: you must select the response variable in one dialog box and the

subscript variables that specify the model in another. (For now, ignore the box for "random"

factors.)

For more complicated designs than the completely randomized design, ANOVA will handle only

balanced situations, i.e., only designs where each treatment (or treatment combination) has the same

number of replications. Because it is programmed to handle such a wide variety of ANOVA designs,

the general ANOVA procedure does not provide confidence intervals.

STAT ➯ ANOVA ➯ Balanced, select 'Potency' as Response, 'Group' as Model
MTB > anova Potency = Group

Factor  Type   Levels  Values
Group   fixed       2  1, 2

Analysis of Variance for Potency

Source  DF      SS      MS      F      P
Group    1  1.4580  1.4580  17.95  0.000
Error   18  1.4620  0.0812
Total   19  2.9200

S = 0.284995 R-Sq = 49.93% R-Sq(adj) = 47.15%

Finally, the GLM procedure (GLM stands for "general linear model") has the same syntax as ANOVA. It

requires more intensive computation and more computer memory (perhaps noticeable with large

datasets and complex designs), can handle unbalanced cases, uses a regression approach, and

automatically warns us about "unusual" observations. For more complex designs the two procedures

have somewhat different options and capabilities.

STAT ➯ ANOVA ➯ General linear model
MTB > glm Potency = Group

Factor  Levels  Values
Group        2  1 2

Analysis of Variance for Potency

Source  DF  Seq SS  Adj SS  Adj MS      F      P
Group    1  1.4580  1.4580  1.4580  17.95  0.000
Error   18  1.4620  1.4620  0.0812
Total   19  2.9200

S = 0.284995 R-Sq = 49.93% R-Sq(adj) = 47.15%

Unusual Observations for Potency

Obs.  Potency      Fit  Stdev.Fit  Residual  St.Resid
   5   9.8000  10.3700     0.0901   -0.5700    -2.11R

R denotes an obs. with a large st. resid.

Technical note: Because Group and Error correspond to orthogonal subspaces of the

20-dimensional vector space of observations, the Sequential and Adjusted Sums of

Squares are identical for our data.
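The geometry in the technical note can be checked numerically. In this sketch each observation's deviation from the grand mean is split into a between-group part (group mean minus grand mean) and a within-group part (the residual). The two 20-component vectors have inner product zero, so their squared lengths, SS Group and SS Error, add up to SS Total. The same sums of squares reproduce the R-Sq and R-Sq(adj) values in the printouts, using R-Sq(adj) = 1 - MS Error / (SS Total / (N - 1)).

```python
from statistics import mean

data = {
    1: [10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6],
    2: [9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9],
}
means = {g: mean(ys) for g, ys in data.items()}
obs = [(g, y) for g, ys in data.items() for y in ys]  # the 20 observations
grand = mean([y for _, y in obs])

# y - grand = (group mean - grand) + (y - group mean)
between = [means[g] - grand for g, _ in obs]
within = [y - means[g] for g, y in obs]

inner_product = sum(b * w for b, w in zip(between, within))  # zero up to rounding
ss_group = sum(b * b for b in between)
ss_error = sum(w * w for w in within)
ss_total = sum((y - grand) ** 2 for _, y in obs)

n, k = len(obs), len(data)
r_sq = ss_group / ss_total
r_sq_adj = 1 - (ss_error / (n - k)) / (ss_total / (n - 1))

print(f"inner product = {inner_product:.1e}")
print(f"SS: {ss_group:.4f} + {ss_error:.4f} = {ss_total:.4f}")
print(f"R-Sq = {100 * r_sq:.2f}%, R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```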

Problems:

1.4.1. The GLM procedure indicates that observation #5 is unusual. Minitab's criterion for calling an

observation unusual is based on Studentized residuals of absolute value greater than 2. So this

observation with its value of -2.11 is borderline. (We will not go into the computations involved in

finding Studentized residuals. Very roughly, the idea is that this observation is relatively far from the

mean of the rest of the observations in its group.)

In this ANOVA, the (ordinary) residual of an observation is its difference from its group mean.

Using menus, in the one-way ANOVA procedure select the option to store residuals. Verify the

values of the residuals for observations #1, #5, and #11 of the stacked data by hand. Make a box plot

of the residuals. Does it indicate any outliers?
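To check the hand computations in Problem 1.4.1, and to see where the -2.11 comes from, here is a sketch that computes fits, residuals, and standardized residuals for the stacked data. It assumes that Minitab's St.Resid (often called an internally Studentized residual) divides each residual by s·sqrt(1 - h), where s is the pooled standard deviation and the leverage h equals 1/n_i for an observation in a group of size n_i in a one-way layout. Under that assumption the sketch reproduces the Stdev.Fit (s/sqrt(n_i)) and St.Resid values in the GLM output.

```python
from math import sqrt
from statistics import mean

potency = [
    10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6,
    9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9,
]
group = [1] * 10 + [2] * 10

sizes = {g: group.count(g) for g in set(group)}
means = {g: mean([y for y, gg in zip(potency, group) if gg == g]) for g in sizes}
fits = [means[g] for g in group]
residuals = [y - f for y, f in zip(potency, fits)]

df_error = len(potency) - len(sizes)
s = sqrt(sum(r * r for r in residuals) / df_error)  # pooled standard deviation

# Assumed formula: residual / (s * sqrt(1 - h)), with leverage h = 1 / (group size).
st_resid = [r / (s * sqrt(1 - 1 / sizes[g])) for r, g in zip(residuals, group)]
stdev_fit = [s / sqrt(sizes[g]) for g in group]

for obs in (1, 5, 11):
    i = obs - 1
    print(f"obs {obs}: fit={fits[i]:.4f} residual={residuals[i]:+.4f} "
          f"st.resid={st_resid[i]:+.2f} stdev.fit={stdev_fit[i]:.4f}")

flagged = [i + 1 for i, z in enumerate(st_resid) if abs(z) > 2]
print("observations with |st.resid| > 2:", flagged)
```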

1.4.2. Use the menu path STAT ➯ Basic statistics ➯ Normality test to test the null

hypothesis that the residuals fit a normal distribution (against the alternative that they are not

normal). In the resulting normal probability plot, normal residuals should nearly fit a straight line. Do

ours? What is the P-value of the Anderson-Darling test of normality?

1.4.3. Test the hypothesis that the two groups come from populations with equal variances against

the two-sided alternative. Use the cdf command to find the P-value of this test. (The Fmax-test for t

treatment groups is equivalent to the F test if t = 2. Verify this for the Potency data. Tables of the

Fmax-distribution are available in Ott/Longnecker and in some other texts.)
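In Minitab the cdf command supplies the F-distribution probability. Python's standard library has no F distribution, so this sketch evaluates the F CDF through the regularized incomplete beta function, computed with the continued-fraction (modified Lentz) method described in Numerical Recipes. The helper names `reg_inc_beta` and `f_cdf` are our own; in practice a statistics library (for example SciPy's `scipy.stats.f`) would be the usual choice. The larger sample variance goes in the numerator, and the upper-tail area is doubled for the two-sided alternative.

```python
import math
from statistics import variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fast; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2):
    """P(F <= f) for an F distribution with df1 and df2 degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))


fresh = [10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6]
stored = [9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9]
v1, v2 = variance(fresh), variance(stored)

# Larger sample variance on top (for two groups this ratio is also Fmax).
if v1 >= v2:
    F, df_num, df_den = v1 / v2, len(fresh) - 1, len(stored) - 1
else:
    F, df_num, df_den = v2 / v1, len(stored) - 1, len(fresh) - 1

# Two-sided alternative: double the upper-tail area.
p_value = min(1.0, 2.0 * (1.0 - f_cdf(F, df_num, df_den)))
print(f"F = {F:.3f} on ({df_num}, {df_den}) df, two-sided P = {p_value:.3f}")
```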

1.5. Nonparametric Alternatives

Here we mention several nonparametric tests. You should read the descriptions of them in your

text. In Windows, all menu paths for Minitab's implementations of these tests begin with

STAT > Nonparametric.

  • The nonparametric alternative to the two-sample t test is the Mann-Whitney-Wilcoxon test

(command mann). It works only for unstacked data.

  • Both of the nonparametric alternatives to the general one-way ANOVA are programmed to be

used with stacked data: the Mood test (Minitab command mood) and the Kruskal-Wallis test

(Minitab command kruskal). The Kruskal-Wallis test is a generalization of the Mann-

Whitney-Wilcoxon test in the same sense that the one-way ANOVA is a generalization of a

pooled two-sample t test.
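The rank computations behind these tests are short. This sketch ranks all 20 potencies together (tied values share the average of the ranks they occupy), then computes the rank sum W of the first sample, which we believe is the statistic Minitab's Mann-Whitney output reports, and the Kruskal-Wallis statistic H, both unadjusted and with the usual correction for ties. P-values are not computed here. The sketch works from stacked data; unstacked columns such as Fresh and Stored would first be concatenated with a subscript list.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1, ..., end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


potency = [
    10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6,
    9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9,
]
group = [1] * 10 + [2] * 10

ranks = average_ranks(potency)
N = len(potency)
labels = sorted(set(group))
sizes = {g: group.count(g) for g in labels}
rank_sums = {g: sum(r for r, gg in zip(ranks, group) if gg == g) for g in labels}

# Mann-Whitney: W is the sum of the ranks of the first sample.
W = rank_sums[labels[0]]

# Kruskal-Wallis H, followed by the usual adjustment for ties.
H = 12 / (N * (N + 1)) * sum(rank_sums[g] ** 2 / sizes[g] for g in labels) - 3 * (N + 1)
tie_counts = {}
for y in potency:
    tie_counts[y] = tie_counts.get(y, 0) + 1
correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (N ** 3 - N)
H_adj = H / correction

print(f"rank sums = {rank_sums}, W = {W}")
print(f"H = {H:.2f}, H adjusted for ties = {H_adj:.2f}")
```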

Unlike the t test and ANOVA, none of these nonparametric tests assumes normal data. They all test
null hypotheses about equal population medians (rather than means).

Like their normal-theory counterparts, these nonparametric tests assume that:

  • The data are random samples from their respective populations,
  • The data for different levels (e.g., Fresh and Stored groups) are independent of one another,
  • The population dispersions are equal. For the normal tests, the specific form of the "equal

dispersion" assumption is that variances are equal. For the nonparametric tests, it is that all

population distributions are of the same shape, differing (if at all) only by a translation that

shifts the entire distribution along with the value of the median.