Section 5. Sums of Squares and ANOVA
We look at an alternative test, the analysis of variance (ANOVA) test for the slope parameter, H0: m = 0, of the simple linear model,

Y = b + mX + ε,

where, in particular, ε is N(0, σ²). The ANOVA table is

Source       Sum of Squares           Degrees of Freedom   Mean Squares
Regression   SSReg = Σ(ŷi − ȳ)²       1                    MSReg = SSReg / 1
Residual     SSRes = Σ(yi − ŷi)²      n − 2                MSRes = SSRes / (n − 2)
Total        SSTot = Σ(yi − ȳ)²       n − 1
where the test statistic is

f = MSReg / MSRes,

with corresponding critical value f_α(1, n − 2).

Figure 6.13: Types of deviation (total, explained, and unexplained)

Related to this, the average of the y variable, ȳ, is a kind of baseline and, since
(y − ȳ)  =  (ŷ − ȳ)  +  (y − ŷ)
total deviation = explained deviation + unexplained deviation,

then, taking the sum of squares over all data points,

Σ(y − ȳ)²  =  Σ(ŷ − ȳ)²  +  Σ(y − ŷ)²
total variation = explained variation + unexplained variation,
and so the coefficient of determination,

r² = Σ(ŷ − ȳ)² / Σ(y − ȳ)² = (SSTot − SSRes) / SSTot = SSReg / SSTot = explained variation / total variation,

measures the proportion of the total variation of the y-values about ȳ that is explained by the regression equation.
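As a concrete illustration, these sums of squares, the f statistic, and r² can be computed directly in base R; this is only a minimal sketch using the illumination/reading-ability data of Exercise 6.5 below, not the lecture's linear.regression.ANOVA function.

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)             # illumination
y <- c(70, 70, 75, 88, 91, 94, 100, 92, 90, 85)    # ability to read
fit    <- lm(y ~ x)                                # least-squares line
y.hat  <- fitted(fit)                              # predicted values
n      <- length(y)
SSReg  <- sum((y.hat - mean(y))^2)                 # explained variation
SSRes  <- sum((y - y.hat)^2)                       # unexplained variation
SSTot  <- sum((y - mean(y))^2)                     # total variation = SSReg + SSRes
MSReg  <- SSReg / 1
MSRes  <- SSRes / (n - 2)
f      <- MSReg / MSRes                            # ANOVA test statistic
f.crit <- qf(0.95, 1, n - 2)                       # critical value f_0.05(1, n - 2)
r2     <- SSReg / SSTot                            # coefficient of determination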
Exercise 6.5 (Sums of Squares and ANOVA)
illumination, x       1   2   3   4   5   6   7    8   9   10
ability to read, y   70  70  75  88  91  94  100  92  90   85
Use the ANOVA procedure to test if the slope m is zero at α = 0.05, compare test statistic with critical value; also, find r^2.
(a) Statement.
i. H0: m = 0 versus H1: m > 0
ii. H0: m = 0 versus H1: m < 0
iii. H0: m = 0 versus H1: m ≠ 0
(b) Test. The ANOVA table is given by
Source       Sum of Squares   Degrees of Freedom   Mean Squares
Regression   482.4            1                    482.4
Residual     490.1            8                    61.3
Total        972.5            9

and so the test statistic is
f = MSReg / MSRes = (i) 6.88 (ii) 7.88 (iii) 8.88,
and the critical value at α = 0.05, with 1 and 8 df, is (i) 5.32 (ii) 6.32 (iii) 7.32.

brightness <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
reading.ability <- c(70, 70, 75, 88, 91, 94, 100, 92, 90, 85)
linear.regression.ANOVA(brightness, reading.ability, 0.05)

             SS                 df   MS                 F
Regression   482.427272727273   1    482.427272727273   7.87519
Residual     490.072727272727   8    61.2590909090909
Total        972.5              9

intercept   slope     r^2       F crit value   F test stat   p value
72.20000    2.41818   0.49607   5.31766        7.87519       0.022
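The same ANOVA table can be cross-checked with R's built-in anova() applied to an lm fit; this is only a sketch, and it reproduces the Regression/Residual rows above up to rounding.

anova(lm(reading.ability ~ brightness))   # Df, Sum Sq, Mean Sq, F value, Pr(>F)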
(c) Conclusion. Since the p–value, 0.022, is smaller than the level of significance, 0.05, we (i) fail to reject (ii) reject the null hypothesis that the slope m is zero.
(d) Comment. Conclusions reached here using the F–distribution with the ANOVA procedure are (i) the same as (ii) different from the conclusions reached previously using the t–distribution.
Patient response to three drug dosage levels, five patients per dosage:

        10 mg        20 mg        30 mg
        5.90         5.51         5.01
        5.92         5.50         5.00
        5.91         5.50         4.99
        5.89         5.49         4.98
        5.88         5.50         5.02
mean    x̄1 ≈ 5.90    x̄2 ≈ 5.50    x̄3 ≈ 5.00
Use the ANOVA procedure to test if the slope m is zero at α = 0.05, compare test statistic with critical value; also, find r^2.
(a) Statement.
i. H0: m = 0 versus H1: m > 0
ii. H0: m = 0 versus H1: m < 0
iii. H0: m = 0 versus H1: m ≠ 0
(b) Test. The ANOVA table is given by
Source       Sum of Squares   Degrees of Freedom   Mean Squares
Regression   2.025            1                    2.025
Residual     0.0105           13                   0.00081
Total        2.0355           14

and so the test statistic is
f = MSReg / MSRes = (i) 2299.2 (ii) 2399.2 (iii) 2499.2,
and the critical value at α = 0.05, with 1 and 13 df, is (i) 4.67 (ii) 6.32 (iii) 7.32.

dosage <- c(10, 10, 10, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30, 30, 30)
response <- c(5.90, 5.92, 5.91, 5.89, 5.88, 5.51, 5.50, 5.50, 5.49, 5.50, 5.01, 5.00, 4.99, 4.98, 5.02)
linear.regression.ANOVA(dosage, response, 0.05)
             SS                   df   MS                F
Regression   2.025                1    2.025             2499.2
Residual     0.0105333333333334   13   0.000810256410256
Total        2.03553333333333     14

intercept    slope        r^2         F crit value   F test stat   p value
6.367e+00    -4.500e-02   9.948e-01   4.667e+00      2.499e+03     2.220e-16
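Again, a base-R cross-check of the table above (a sketch, not the lecture's linear.regression.ANOVA function):

fit <- lm(response ~ dosage)
anova(fit)                 # regression (dosage) and residual rows, F statistic, p-value
summary(fit)$r.squared     # r^2, approximately 0.99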
(c) Conclusion. Since the test statistic = 2499.2 > critical value = 4.67, (i) do not reject (ii) reject the null H0: m = 0. The data indicates the population slope (i) equals (ii) does not equal (iii) is greater than zero (0). In other words, response (i) is (ii) is not associated with dosage.
(d) Coefficient of Determination. r² = (i) 0.09 (ii) 0.10 (iii) 0.99; in other words, the regression explains (i) 9% (ii) 10% (iii) 99% of the total variation in the scatterplot.
(e) Comparing ANOVA of linear regression with ANOVA of means. Recall that fifteen different patients, chosen at random, were subjected to three different drugs. Test if at least one of the three mean patient responses (notice, the same data as above) to drug is different at α = 0.05.

        drug 1       drug 2       drug 3
        5.90         5.51         5.01
        5.92         5.50         5.00
        5.91         5.50         4.99
        5.89         5.49         4.98
        5.88         5.50         5.02
mean    x̄1 ≈ 5.90    x̄2 ≈ 5.50    x̄3 ≈ 5.00
The ANOVA of means table is

Source      Sum of Squares   Degrees of Freedom   Mean Squares
Treatment   2.033            2                    1.017
Residual    0.0022           12                   0.00018
Total       2.0355           14
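For comparison, the ANOVA-of-means table can be reproduced in base R by treating dosage as a factor with three levels (a sketch):

drug <- factor(dosage)            # three treatment groups: 10, 20, 30 mg
summary(aov(response ~ drug))     # Treatment (2 df) and Residuals (12 df) rows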
Section 6. Nonlinear Regression
x      1   2   3   4   5
x²     1   4   9  16  25
y     73  67  57  43  25
The nonlinear function y = 75 − 2x² is linearized by transforming the (i) x (ii) y axis.
Figure 6.15: Nonlinear and linear versions of y = 75 − 2x² (left panel: y versus x; right panel: y versus x²)
Using the 5 (x, y) data points, regress y on x² (rather than x), and "discover" intercept (i) −2 (ii) 75, slope (i) −2 (ii) 75, and r² = (i) 0 (ii) 1, because these points (i) perfectly (ii) imperfectly fit the linearized model y = 75 − 2x². Typically, linear models (i) do (ii) do not perfectly fit sampled (x, y) data.

x <- c(1, 2, 3, 4, 5)
y <- c(73, 67, 57, 43, 25)
linear.regression.ANOVA(x^2, y, 0.05)
             SS     df   MS     F
Regression   1496   1    1496   Inf
Residual     0      3    0
Total        1496   4

intercept   slope   r^2    F crit value   F test stat   p value
75.00       -2.00   1.00   10.13          Inf           0.00
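Equivalently, base R fits the linearized model by regressing y on x² directly (a sketch, not the lecture's helper):

x <- c(1, 2, 3, 4, 5)
y <- c(73, 67, 57, 43, 25)
fit <- lm(y ~ I(x^2))      # intercept 75, slope -2
summary(fit)$r.squared     # 1: a perfect fit to the linearized model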
illumination, x       1   2   3   4   5   6   7    8   9   10
ability to read, y   70  70  75  88  91  94  100  92  90   85
Apply various nonlinear models to the data, predict reading ability at x = 7.5, measure fit of each model by calculating r^2 of linearized versions of the nonlinear regressions.
brightness <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) reading.ability <- c(70, 70, 75, 88, 91, 94, 100, 92, 90, 85)
(a) Original linear model. Least-squares linear model is
Figure 6.16: Linear model, no transformation
i. y = 68.091 + 11.526 ln x
ii. y = 72.2 + 2.42x
iii. y = 68.091 − 11.526 ln x

and, at x = 7.5 for example, ŷ = 72.2 + 2.42(7.5) ≈ (i) 90.17 (ii) 91.31 (iii) 91.34 (iv) 92.55, but because r² = (i) 0.50 (ii) 0.52 (iii) 0.66 (iv) 0.69, only 50% of the variation is explained by the linear regression and so the prediction at x = 7.5 is (i) poor (ii) good.

linear.regression.predict(brightness, reading.ability, x.zero=7.5)
plot(d, pch=16, col="red", xlab="brightness", ylab="reading.ability",
  main="y = 72.2 + 2.42 x, r^2 = 0.50")   # original, linear model; d is the data frame of (brightness, reading.ability)
x0 <- seq(1, 10, 0.05)
y0 <- 72.2 + 2.42 * x0
points(x0, y0, pch=16, cex=0.2, col="black")
r2 <- cor(brightness, reading.ability)^2; r2

intercept    slope      x          y.predict(x)
72.200000    2.418182   7.500000   90.336364
r2 <- cor(brightness, reading.ability)^2; r2
[1] 0.496
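The same prediction and r² can be obtained with base R's predict(); a sketch (linear.regression.predict above is the lecture's helper function):

fit <- lm(reading.ability ~ brightness)
predict(fit, newdata = data.frame(brightness = 7.5))   # predicted reading ability at x = 7.5, about 90.3
summary(fit)$r.squared                                 # about 0.50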
(c) Nonlinear exponential model.
Figure 6.18: Exponential transformation
nonlinear.regression(brightness, reading.ability, 1, "exponential") transformation trans.intercept, a intercept, a slope, b r^ "exponential" "4.2767375112164" "72.0051404219156" "0.0299638959744328" "0.518078387957388" To fit the nonlinear exponential model
y = ae^(bx)
to the data, first convert to a linear equation:
ln y = ln a + bx, take ln on both sides
then take a least-squares approximation of this linear transformation,
i. y = 68.091 + 11.526 ln x
ii. ln y = 4.276 + 0.030x
iii. ln y = 4.226 + 0.143 ln x
iv. ln((101 − y)/y) = −0.961 − 0.191x
where r² = (i) 0.27 (ii) 0.52 (iii) 0.66 (iv) 0.69,
whereas the exponential regression itself is
i. y = 101 / (1 + e^(−0.961 − 0.191x))
ii. y = e^(4.276 + 0.030x) ≈ 72.0 e^(0.030x)
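A base-R sketch of the exponential fit via the log transformation (nonlinear.regression above is the lecture's helper; this only illustrates the same idea):

fit <- lm(log(reading.ability) ~ brightness)   # ln y = ln a + b x
a <- exp(coef(fit)[1])                         # about 72.0
b <- coef(fit)[2]                              # about 0.030
summary(fit)$r.squared                         # about 0.52, r^2 of the linearized model
# fitted exponential curve: y.hat = a * exp(b * brightness)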
(d) Nonlinear power model.
Figure 6.19: Power transformation
nonlinear.regression(brightness, reading.ability, 1, "power") transformation trans.intercept, a intercept, a slope, b r^ "power" "4.22624256172365" "68.4595158951469" "0.142538729202824" "0.687209998444701" To fit the nonlinear power model
y = ax^b
to the data, first convert to a linear equation:
ln y = ln a + b ln x, take ln on both sides
then take a least-squares approximation of this linear transformation,
i. y = 68.091 + 11.526 ln x
ii. ln y = 4.276 + 0.030x
iii. ln y = 4.226 + 0.143 ln x
iv. ln((101 − y)/y) = −0.961 − 0.191x
where r² = (i) 0.27 (ii) 0.52 (iii) 0.66 (iv) 0.69,
whereas the power regression itself is
i. y = 101 / (1 + e^(−0.961 − 0.191x))
ii. y = e^(4.226) x^(0.143) ≈ 68.5 x^(0.143)
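Similarly, a base-R sketch of the power fit via the log-log transformation:

fit <- lm(log(reading.ability) ~ log(brightness))   # ln y = ln a + b ln x
a <- exp(coef(fit)[1])                              # about 68.5
b <- coef(fit)[2]                                   # about 0.143
summary(fit)$r.squared                              # about 0.69
# fitted power curve: y.hat = a * brightness^b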
whereas the worst-fitting regression is (i) linear (ii) logarithmic (iii) exponential (iv) power (v) logistic
Figure 6.21: Comparing nonlinear transformations
(g) Why do nonlinear models involve natural log and exponential functions? The nonlinear models given here use the natural log, "ln", or exponential, "exp", because not only do they "bend" the regression to fit the data better, but also the important normal probability distribution,

f(x) = (1 / (σ√(2π))) e^(−(1/2)[(x − μ)/σ]²),

is defined with the exponential function. Consequently, it becomes easier to perform inference on the nonlinear regression, which often requires normal assumptions. (i) True (ii) False
brightness, x         9    7    11   16   21   19   23   29   31   33
ability to read, y   0.1  0.1  0.1  0.1  0.1  0.9  0.9  0.9  0.9  0.9
Figure 6.22: Logistic transformation for binary data
x <- c(9, 7, 11, 16, 21, 19, 23, 29, 31, 33) y <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9) nonlinear.regression(x, y, 1, "logistic")
transformation   trans.intercept, a    intercept, a          slope, b                r^2
"logistic"       "4.03753232581395"    "4.03753232581395"    "-0.202891071648942"    "0.655611913122643"
The least-squares approximation of the linear transformation of the logistic model is
(a) y = 68.091 + 11.526 ln x
(b) ln y = 4.226 + 0.030x
(c) ln y = 4.226 + 0.143 ln x
(d) ln((1 − y)/y) = 4.038 − 0.203x
where r² = (i) 0.27 (ii) 0.52 (iii) 0.66 (iv) 0.69,
whereas the logistic regression itself is
(a) y = 1 / (1 + e^(4.038 − 0.203x))
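A base-R sketch of the logistic fit via the logit transformation (this requires y strictly between 0 and 1, as here):

fit <- lm(log((1 - y) / y) ~ x)   # ln((1 - y)/y) = a + b x
a <- coef(fit)[1]                 # about 4.038
b <- coef(fit)[2]                 # about -0.203
summary(fit)$r.squared            # about 0.66
# fitted logistic curve: y.hat = 1 / (1 + exp(a + b * x))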
The critical value F*_(α; k, n−k−1) is associated with the given confidence level and (k, n − k − 1) degrees of freedom, and the critical value t*_(α/2, n−k−1) is associated with the given confidence level and n − k − 1 degrees of freedom.
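Both critical values are available in base R; a sketch with the illustrative values k = 2 predictors, n = 10 observations, and α = 0.05, matching the exercise below:

k <- 2; n <- 10; alpha <- 0.05
qf(1 - alpha, k, n - k - 1)      # F critical value with (k, n-k-1) degrees of freedom
qt(1 - alpha / 2, n - k - 1)     # two-sided t critical value with n-k-1 degrees of freedom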
Exercise 6.7 (Multiple Regression)
brightness, x1         9    7   11   16   21   19   23   29   31   33
noise, x2            100   93   85   76   61   58   46   32   24   12
ability to read, y    40   50   64   73   86   97  104  113  123  130
brightness <- c(9, 7, 11, 16, 21, 19, 23, 29, 31, 33) noise <- c(100, 93, 85, 76, 61, 58, 46, 32, 24, 12) reading.ability <- c(40, 50, 64, 73, 86, 97, 104, 113, 123, 130) d <- data.frame(brightness, noise, reading.ability)
(a) The linear regression of reading ability versus brightness (alone) is
i. ŷ = 23.5 + 3.24x1
ii. ŷ = 147.4 − 1.01x2
iii. ŷ = 164.0 − 0.44x1 − 1.15x2
On average, reading ability increases 3.24 units per unit increase in brightness.

lm(reading.ability ~ brightness, d)
(Intercept)   brightness
      23.53         3.24

The linear regression of reading ability versus noise (alone) is
i. ŷ = 23.5 + 3.24x1
ii. ŷ = 147.4 − 1.01x2
iii. ŷ = 164.0 − 0.44x1 − 1.15x2
On average, reading ability decreases 1.01 units per unit increase in noise.

lm(reading.ability ~ noise, d)
(Intercept)     noise
    147.392     -1.01

The figure shows two (simple) linear regressions, each with (i) one (ii) two (iii) three predictor(s).

par(mfrow=c(1,2))
plot(brightness, reading.ability, pch=16, col="red", xlab="Brightness, x1", ylab="Reading Ability, y")
model.reading <- lm(reading.ability ~ brightness); model.reading; abline(model.reading, col="black")
plot(noise, reading.ability, pch=16, col="red", xlab="Noise, x2", ylab="Reading Ability, y")
model.reading <- lm(reading.ability ~ noise); model.reading; abline(model.reading, col="black")
par(mfrow=c(1,1))
(b) The multiple linear regression is given by,
Figure 6.23: Scatter plots and two simple linear regressions
i. ŷ = 23.5 + 3.24x1
ii. ŷ = 147.4 − 1.01x2
iii. ŷ = 164.0 − 0.44x1 − 1.15x2
The y–intercept, b̂, is (i) 164.0 (ii) −0.44 (iii) −1.15. The slope in the x1 direction, m̂1, is (i) 164.0 (ii) −0.44 (iii) −1.15. The slope in the x2 direction, m̂2, is (i) 164.0 (ii) −0.44 (iii) −1.15.

lm(reading.ability ~ brightness + noise)
Coefficients:
(Intercept)   brightness        noise
   164.0466      -0.4416        -1.15
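A base-R sketch checking the fitted value and residual at the observed data point (x1, x2, y) = (19, 58, 97), discussed below:

fit <- lm(reading.ability ~ brightness + noise, data = d)
y.hat <- predict(fit, newdata = data.frame(brightness = 19, noise = 58))
y.hat          # roughly 164.0 - 0.44(19) - 1.15(58), i.e. about 89
97 - y.hat     # residual e = y - y.hat, about 8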
Figure 6.24: Scatter plot and multiple regression: regression function ŷ = 164.0 − 0.44x1 − 1.15x2 (reading ability versus brightness x1 and noise x2) and regression model y = 164.0 − 0.44x1 − 1.15x2 + e, where e is the residual
The multiple regression has (i) one (ii) two (iii) three predictors. The multiple regression is (i) linear (ii) quadratic in the xi. There are (i) 10 (ii) 20 (iii) 30 data points. One data point is (x1, x2, y) = (i) (19, 58) (ii) (19, 58, 97) (iii) (58, 97). Data point (x1, x2, y) = (19, 58, 97) means the individual exposed to brightness x1 = 19 and noise x2 = 58 had reading ability y = 97.
(k) If we sampled at random another ten individuals, we would get (i) the same (ii) a different scatter plot of points. The data is an example of a (i) sample (ii) population.
brightness, x1         9    7   11   16   21   19   23   29   31   33
noise, x2            100   93   85   76   61   58   46   32   24   12
ability to read, y    40   50   64   73   86   97  104  113  123  130
brightness <- c(9, 7, 11, 16, 21, 19, 23, 29, 31, 33) noise <- c(100, 93, 85, 76, 61, 58, 46, 32, 24, 12) reading.ability <- c(40, 50, 64, 73, 86, 97, 104, 113, 123, 130) d <- data.frame(brightness, noise, reading.ability)
(a) Identify all possible models for this data from the following.
i. ŷ = ȳ = 88
ii. ŷ = 23.5 + 3.24x1
iii. ŷ = 147.4 − 1.01x2
iv. ŷ = 164.0 − 0.44x1 − 1.15x2

lm(reading.ability ~ 1, d)
lm(reading.ability ~ brightness, d)
lm(reading.ability ~ noise, d)
lm(reading.ability ~ brightness + noise, d)

lm(formula = reading.ability ~ 1, data = d)
Coefficients:
(Intercept)
         88

lm(formula = reading.ability ~ brightness, data = d)
Coefficients:
(Intercept)   brightness
      23.53         3.24

lm(formula = reading.ability ~ noise, data = d)
Coefficients:
(Intercept)     noise
    147.392     -1.01

lm(formula = reading.ability ~ brightness + noise, data = d)
Coefficients:
(Intercept)   brightness        noise
   164.0466      -0.4416        -1.15
(b) Assess fit of model 1: reading ability regressed on intercept, yˆ = b = ¯y = 88.
A. Is intercept b = ¯y = 88 significant? Is b = ¯y = 88 a better predictor of reading ability than b = 0? Statement.
274 Chapter 6. Simple Regression (LECTURE NOTES 13)
i. H0: b = 0 versus H1: b > 0
ii. H0: b = 0 versus H1: b < 0
iii. H0: b = 0 versus H1: b ≠ 0
Test. The chance of |t| = 9.053 or more, if b = 0, is p–value = 2 · P(t ≥ 9.053) ≈ (i) 0.00 (ii) 0.01 (iii) 0.11; the level of significance is α = (i) 0.01 (ii) 0.05 (iii) 0.10.
Conclusion. Since p–value = 0.00 < α = 0.05, (i) do not reject (ii) reject the null H0: b = 0. The data indicates the intercept, b = ȳ = 88, (i) is smaller than (ii) equals (iii) does not equal zero (0), so, yes, b = ȳ = 88 is significant; that is, it is a better predictor than b = 0 of reading ability.
B. Is residual standard error, se, small? If se is small, the data is close to the model ŷ = b = ȳ = 88. se = (i) 10.74 (ii) 20.74 (iii) 30.74, which may or may not be "large" (since there is nothing to compare this number against), but it turns out to be large, and so the data is (i) close to (ii) far away from the model ŷ = ȳ = 88; this measure indicates the model does not fit the data very well.

summary(lm(reading.ability ~ 1, d))   # one possible model
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   88.000       9.721    9.053  8.14e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 30.74 on 9 degrees of freedom

(c) Model 2: reading ability regressed on brightness only, ŷ = 23.5 + 3.24x1.
A. Is intercept b = 23.5 significant? Since p–value = 0.004 < α = 0.05, (i) do not reject (ii) reject the null H0: b = 0. The data indicates the intercept, b = 23.5, (i) is smaller than (ii) equals (iii) does not equal zero (0), so, yes, b = 23.5 is significant.
B. Is slope m1 = 3.24 significant? Since p–value = 0.000 < α = 0.05, (i) do not reject (ii) reject the null H0: m1 = 0. The data indicates the slope m1 = 3.24 (i) is smaller than (ii) equals (iii) does not equal zero (0), so, yes, m1 = 3.24 is significant; in fact, it is "more" significant than the intercept b because of its smaller p-value.
C. Is residual standard error, se, small? se = (i) 10.74 (ii) 7.37 (iii) 30.74
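As before, a base-R sketch for assessing model 2 (reading ability regressed on brightness only):

summary(lm(reading.ability ~ brightness, data = d))
# the Coefficients table gives the t tests for the intercept (23.5) and slope (3.24),
# and "Residual standard error" gives s_e on n - 2 = 8 degrees of freedom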