

Material Type: Notes; Class: Statistical Applications; Subject: Mathematics; University: Saint Mary's College; Term: Spring 2009;
Typology: Study notes
CHI SQUARE TESTS for goodness of fit and independence (Chapter 12)
The chi-square statistic can be used for tests on distributions, but it must be used with frequency counts [i.e. the number of observations that fall into certain categories]. We use f_i to represent the actual frequency for category i (the number of observations in the actual data that are in category i) and e_i to represent the expected frequency if H0 is true (the number of observations for category i predicted by H0 for a sample of this size).
Our test statistic is (in all cases)
$$\chi^2 = \sum_{i} \frac{(f_i - e_i)^2}{e_i} \quad \text{OR} \quad \chi^2 = \sum_{i,j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
(Total, over all categories, of (actual minus expected) squared over expected. Categories may be based on one variable or on two.)
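As a minimal sketch of the statistic described above, the sum can be computed directly from two lists of frequencies. The function name `chi_square_statistic` and the counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Total, over all categories, of (actual - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


# Hypothetical counts for three categories (illustration only).
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

stat = chi_square_statistic(observed, expected)
print(round(stat, 4))  # (4 + 4 + 0) / 20 = 0.4
```

For a two-variable table, the same function applies once the cells of the observed and expected tables are flattened into lists in the same order.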
Goodness of Fit [One variable — one row of categories]
The issue is to determine whether a particular probability distribution might reasonably describe the population from which the sample was drawn. Our test is always
H0: The data come from a population with the distribution stated
Ha: The data come from a population which does not fit that distribution
The test statistic is given by:
$$\text{sample } \chi^2 = \sum_{i} \frac{(f_i - e_i)^2}{e_i}$$
with df = #categories − 1 − (number of parameters estimated from data)
In general, the expected frequency for category i is P(X = i) × n (n = sample size), and it is not rounded to a whole number. (P(X = i) comes from the distribution we are testing for.)
Critical values for the distribution are given in table 3 on p.923 [the same table used for inference on σ^2], but we are only interested in small areas in the right tail [columns further to the right].
Decision method: We will reject H0 and conclude the proposed distribution does not fit if our sample χ^2 > χ^2_α with df = #categories − 1 − (number of parameters estimated from data).
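The goodness-of-fit procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the course's own method: the die-roll counts are hypothetical, the function name `goodness_of_fit` is invented here, and the critical value 11.070 is the standard chi-square table entry for α = 0.05 with df = 5.

```python
def goodness_of_fit(observed, probabilities, critical_value, params_estimated=0):
    """Return (sample chi-square, df, reject H0?) for one row of categories."""
    n = sum(observed)                              # sample size
    expected = [p * n for p in probabilities]      # e_i = P(X = i) * n, not rounded
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1 - params_estimated
    return stat, df, stat > critical_value


# Hypothetical data: 120 rolls of a die, H0: the die is fair.
rolls = [25, 17, 15, 23, 24, 16]
fair = [1 / 6] * 6

# Table value for alpha = 0.05, df = 5 (no parameters estimated from the data).
CRITICAL_05_DF5 = 11.070

stat, df, reject = goodness_of_fit(rolls, fair, CRITICAL_05_DF5)
print(round(stat, 3), df, reject)
```

Here every expected frequency is 20, the sample χ^2 is about 5.0, and since 5.0 is not greater than 11.070 we do not reject H0.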
Independence Test and Contingency Tables [Two variables or two populations making a table of categories]
Events A and B are independent if P(A|B) = P(A) [which is equivalent to P(A and B) = P(A)P(B)]. Two variables are independent if knowing the value for one does not change the probability distribution for the other. (All events that can be described with one are independent of all events that can be described with the other.)
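To make the definition concrete, here is a small sketch that checks whether two variables are independent given their joint probabilities, using the product rule P(A and B) = P(A)P(B) for every pair of values. The function name `is_independent` and both probability tables are hypothetical; exact fractions are used so the equality check is not affected by rounding.

```python
from fractions import Fraction as F


def is_independent(joint):
    """joint[i][j] = P(X = i and Y = j); True if every cell equals P(X = i) * P(Y = j)."""
    p_x = [sum(row, F(0)) for row in joint]          # marginal distribution of X
    p_y = [sum(col, F(0)) for col in zip(*joint)]    # marginal distribution of Y
    return all(
        joint[i][j] == p_x[i] * p_y[j]
        for i in range(len(p_x))
        for j in range(len(p_y))
    )


# Hypothetical joint distributions (each table sums to 1).
independent_case = [[F(1, 8), F(3, 8)],
                    [F(1, 8), F(3, 8)]]
dependent_case = [[F(1, 2), F(0)],
                  [F(0), F(1, 2)]]

print(is_independent(independent_case))  # True
print(is_independent(dependent_case))    # False
```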
In the contingency table (laying out all the possible combinations of values for the variables, i.e. all “contingencies”), independence means that the probability of any cell can be found as the product of marginal probabilities: P(X = A and Y = B) = P(X = A) × P(Y = B). That is, the probability of column one is the same for every row, the probability of column two is the same for every row, etc., and the probability of row 1 is the same for every column, etc. Thus the expected count e_ij for the cell in row i, column j is given by
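$$e_{ij} = P(\text{row } i) \times P(\text{column } j) \times \text{sample size} = \frac{\text{row } i \text{ total}}{\text{sample size}} \times \frac{\text{column } j \text{ total}}{\text{sample size}} \times \text{sample size} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{sample size}}$$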
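The expected counts under independence can be sketched as follows: each cell is (row total × column total) / sample size. The function name `expected_counts` and the observed table are hypothetical, for illustration only.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed table: 2 rows, 3 columns.
observed_table = [[10, 20, 30],
                  [20, 20, 20]]

expected_table = expected_counts(observed_table)
for row in expected_table:
    print(row)
```

With row totals 60 and 60, column totals 30, 40 and 50, and sample size 120, both rows of expected counts come out as 15, 20, 25, which reflects the statement that each column has the same probability in every row.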
The issue is to determine whether the two variables (determining the rows and columns, respectively) are independent. Test is always
H0: The two variables are independent
Ha: The two variables are not independent
The test statistic is
$$\chi^2 = \sum_{i,j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
df = (#rows − 1) × (#columns − 1)
Decision method: We will reject H0 and conclude the variables are not independent if our sample χ^2 > χ^2_α with df = (#rows − 1) × (#columns − 1). That is, we reject the null hypothesis only if the test statistic is “big”.
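The whole independence test can be sketched in one short function. This is an illustration under assumptions: the function name `independence_test` and the observed table are hypothetical, and 5.991 is the standard chi-square table value for α = 0.05 with df = 2.

```python
def independence_test(table, critical_value):
    """Return (sample chi-square, df, reject H0?) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected count under H0
            stat += (f_ij - e_ij) ** 2 / e_ij

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, stat > critical_value


# Hypothetical observed table: 2 rows, 3 columns, so df = (2 - 1) * (3 - 1) = 2.
table = [[10, 20, 30],
         [20, 20, 20]]
CRITICAL_05_DF2 = 5.991  # table value for alpha = 0.05, df = 2

stat, df, reject = independence_test(table, CRITICAL_05_DF2)
print(round(stat, 3), df, reject)
```

For this table the sample χ^2 is 16/3, about 5.333, which is not greater than 5.991, so H0 is not rejected at α = 0.05.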
MINITAB [for contingency table]: Enter the observed frequencies in adjacent columns, keeping the entries in order (so you copy the table of observed values). Choose Stat>Tables, then choose Chi-Square Test (Table in Worksheet), and enter the appropriate columns (containing the table) in the Columns Containing Table box.
Equality of proportions
The chi square test for equality of several proportions (which is the extension of the two-sample test on proportions) is most easily treated as a special case of the test of independence. We have two rows (for “yes” and “no”) and one column for each population. Test is always
H0: The proportions (of “yes”) are the same in all populations
Ha: The proportions are not the same in all populations
Test statistic
$$\chi^2 = \sum_{i,j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
df = #columns − 1 because #rows = 2.
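A sketch of the equality-of-proportions test as a special case of the independence test: build the two-row table of “yes” and “no” counts, one column per population, then compute the same statistic. The function names and the counts are hypothetical, and 5.991 is the standard chi-square table value for α = 0.05 with df = 2.

```python
def proportions_table(yes_counts, sample_sizes):
    """Two rows ("yes" and "no"), one column per population."""
    no_counts = [n - y for y, n in zip(yes_counts, sample_sizes)]
    return [list(yes_counts), no_counts]


def chi_square_for_table(table):
    """Return (sample chi-square, df) using e_ij = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical data: number of "yes" answers in three samples of 100 each.
yes_counts = [40, 50, 60]
sample_sizes = [100, 100, 100]

table = proportions_table(yes_counts, sample_sizes)
stat, df = chi_square_for_table(table)
reject = stat > 5.991  # table value for alpha = 0.05, df = 2
print(round(stat, 3), df, reject)
```

Every expected count here is 50, the sample χ^2 is about 8.0 with df = 3 − 1 = 2, and since 8.0 > 5.991 we reject H0 and conclude the proportions are not all the same.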
Examples