Understanding Residuals in Simple Linear Regression, Study notes of Statistics

These notes cover the concept of residuals in simple linear regression (SLR). Residuals are the differences between observed and predicted values. The notes give examples and instructions on how to calculate residuals, interpret residual plots, and identify outliers or influential points. They also discuss the importance of residuals in assessing the appropriateness of the SLR model.

What you will learn

  • What is the definition of a residual in simple linear regression?
  • What can you learn from residual plots in simple linear regression?
  • How do you calculate residuals in simple linear regression?

Typology: Study notes

2021/2022

Uploaded on 09/12/2022

juhy
Section 5.4 - Residuals

A residual value is the difference between an actual observed y value and the corresponding predicted y value, ŷ. Residuals are just errors.

Residual error = observed value – predicted value

Example 1: A least-squares regression line was fitted to the weights (in pounds) versus age (in months) of a group of many young children. The equation of the line is ŷ = 16.6 + 0.65t, where ŷ is the predicted weight and t is the age of the child. A 20-month old child in this group has an actual weight of 25 pounds. What is the residual weight, in pounds, for this child?

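A worked sketch of Example 1 in R, using only the numbers given in the example. The function name predict_weight is a label chosen here for illustration and is not part of the original notes.

```r
# Fitted line from Example 1: predicted weight (pounds) at age t (months)
predict_weight <- function(t) 16.6 + 0.65 * t

age      <- 20   # months
observed <- 25   # pounds, the child's actual weight

predicted <- predict_weight(age)   # 16.6 + 0.65 * 20 = 29.6
residual  <- observed - predicted  # observed value - predicted value

predicted
residual
```

The predicted weight is 29.6 pounds, so the residual is 25 - 29.6 = -4.6 pounds: the child weighs 4.6 pounds less than the line predicts.
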
The plot of the residual values against the x values can tell us a lot about our least-squares regression line (LSRL) model. Plots of residuals may display patterns that give some idea about the appropriateness of the model. For a least-squares line fitted with an intercept, the sum of the residuals is always zero, so the residuals are always centered about the x-axis (the line where the residual equals 0).

  • If the functional form of the regression model is incorrect, the residual plots constructed by using the model will often display a pattern. The pattern can then be used to propose a more appropriate model.
  • When a residual plot shows no pattern, it indicates that the proposed model is a reasonable fit to a set of data.

Figure 1 shows a horn-shaped pattern (linear model is not a reasonable fit for the data). Figure 2 shows a quadratic pattern (linear model is not a reasonable fit for the data). Figure 3 has no pattern (linear model is a reasonable fit for the data).

[Figure 1] [Figure 2] [Figure 3]

R command: resid( )
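
A minimal sketch of how resid( ) might be used to build a residual plot. The data below are hypothetical, chosen only so that a straight-line fit leaves a curved pattern of the kind described for Figure 2; the variable names are illustrative.

```r
# Hypothetical data with a curved (quadratic) relationship, for illustration only
x <- 1:10
y <- x^2

fit <- lm(y ~ x)     # fit a straight line by least squares
res <- resid(fit)    # residuals: observed y minus predicted y

sum(res)             # zero, up to floating-point rounding

# Residual plot: residuals against x, with a dashed reference line at 0
plot(x, res, cex = 2, pch = 16, xlab = "x", ylab = "Residual")
abline(h = 0, lty = 2)
```

Here the residuals are positive at both ends of the x range and negative in the middle, which is the U-shaped pattern that suggests a straight line is the wrong functional form.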

Example 2: The following data was collected comparing score on a measure of test anxiety and exam score.

Measure of test anxiety: 23 14 14 0 7 20 20 15 21
Exam score: 43 59 48 77 50 52 46 51 51

a. Construct a scatterplot.
Commands:
anxiety=c(23,14,14,0,7,20,20,15,21)
score=c(43,59,48,77,50,52,46,51,51)
plot(anxiety,score,cex=2,pch=16)

Result:

b. Find the LSRL and fit it to the scatter plot. Commands: Results:

LSRL:
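
The commands for part b did not survive in these notes. One common way to find and draw the LSRL in R is with lm( ) and abline( ); the sketch below assumes that approach and is not necessarily what the original used.

```r
# Data from Example 2
anxiety <- c(23, 14, 14, 0, 7, 20, 20, 15, 21)
score   <- c(43, 59, 48, 77, 50, 52, 46, 51, 51)

# Least-squares regression of exam score on test anxiety
fit <- lm(score ~ anxiety)
coef(fit)   # intercept and slope of the LSRL

# Scatterplot with the fitted line drawn on top
plot(anxiety, score, cex = 2, pch = 16)
abline(fit)
```

The slope comes out negative, meaning that higher test anxiety goes with lower predicted exam scores in this data set.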

An outlier is a value that is well separated from the rest of the data set. An outlier will have a large absolute residual value.

An observation that causes the values of the slope and the intercept in the line of best fit to be considerably different from what they would be if the observation were removed from the data set is said to be influential. When an influential observation is removed, the LSRL usually fits the remaining data better.

Example 3: Johnny keeps track of his best swimming times for the 50 meter freestyle from each summer swim team season. Here is his data:

Age (years): 9 10 11 12 13 14 15 16
Time (sec): 34.8 34.2 32.9 29.1 28.4 22.4 25.2 24.9

a. Construct a scatterplot.
Commands:
age=c(9,10,11,12,13,14,15,16)
time=c(34.8,34.2,32.9,29.1,28.4,22.4,25.2,24.9)
plot(age,time,cex=2,pch=16)

Result:

b. Find the LSRL and fit it to the scatter plot. Commands: Results:

LSRL:
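
The commands for part b are missing here as well. A sketch of one way to do it, again assuming lm( ) and abline( ):

```r
# Data from Example 3
age  <- c(9, 10, 11, 12, 13, 14, 15, 16)
time <- c(34.8, 34.2, 32.9, 29.1, 28.4, 22.4, 25.2, 24.9)

# Least-squares regression of swim time on age
fit <- lm(time ~ age)
coef(fit)   # intercept and slope of the LSRL

# Scatterplot with the fitted line drawn on top
plot(age, time, cex = 2, pch = 16)
abline(fit)
```

The slope is negative: on average Johnny's best time drops by a little under two seconds per year of age.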

c. Find r and r^2. Commands: Answers:
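
For part c, the correlation coefficient can be computed with cor( ) and then squared. This is a sketch under the assumption that cor( ) is the intended command; the original commands are not shown.

```r
# Data from Example 3
age  <- c(9, 10, 11, 12, 13, 14, 15, 16)
time <- c(34.8, 34.2, 32.9, 29.1, 28.4, 22.4, 25.2, 24.9)

r <- cor(age, time)   # correlation coefficient between age and time
r
r^2                   # coefficient of determination
```

For a simple linear regression, r^2 computed this way matches the R-squared value reported by summary( ) on the fitted lm( ) model.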

d. Construct a residual plot then determine if the LSRL is a good model for his data. Command:

Result:

Commands:

Results:
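
A sketch of a residual plot for part d, combining resid( ) with the plotting style used in the scatterplot commands. The reference line at 0 and the check for the largest absolute residual are additions made here for illustration.

```r
# Data from Example 3
age  <- c(9, 10, 11, 12, 13, 14, 15, 16)
time <- c(34.8, 34.2, 32.9, 29.1, 28.4, 22.4, 25.2, 24.9)

fit <- lm(time ~ age)
res <- resid(fit)     # observed time minus predicted time

# Residual plot: residuals against age, with a dashed reference line at 0
plot(age, res, cex = 2, pch = 16, xlab = "age", ylab = "Residual")
abline(h = 0, lty = 2)

# Age of the observation with the largest absolute residual
age[which.max(abs(res))]
```

The observation at age 14 has by far the largest absolute residual, which makes it the natural candidate to examine in part e.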

e. Is there an influential point (i.e. a point that is an outlier and has a significant impact on the line of best fit)? If so, identify it, and remove it from your data.

i. Construct a scatterplot.
Commands:
age=c(9,10,11,12,13,15,16)
time=c(34.8,34.2,32.9,29.1,28.4,25.2,24.9)
plot(age,time,cex=2,pch=16)

iv. Construct a residual plot then determine if the LSRL is a good model for his data. Command:

Result:

Commands:

Results:
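
To see how much the point (14, 22.4) affects the fit, the two regressions can be compared side by side. This is an illustrative sketch; the data frame name swim and the object names fit_all and fit_drop are chosen here and do not come from the original notes.

```r
# Full data from Example 3
swim <- data.frame(
  age  = c(9, 10, 11, 12, 13, 14, 15, 16),
  time = c(34.8, 34.2, 32.9, 29.1, 28.4, 22.4, 25.2, 24.9)
)

fit_all  <- lm(time ~ age, data = swim)                   # all eight points
fit_drop <- lm(time ~ age, data = swim[swim$age != 14, ]) # without (14, 22.4)

# Intercept and slope for each fit
rbind(all = coef(fit_all), dropped = coef(fit_drop))

# R-squared for each fit
c(all     = summary(fit_all)$r.squared,
  dropped = summary(fit_drop)$r.squared)
```

Removing the point makes the slope less steep and raises R-squared, consistent with the description of an influential observation given earlier.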

There are many possible justifications for removing the point (14, 22.4) from the data that Johnny collected. The most likely reasons are a suspicion that the data point was recorded incorrectly, or outside factors such as the length of the pool being incorrectly measured or a defect in the timer used.