



These notes cover the concept of residuals in simple linear regression (SLR). Residuals are the differences between observed and predicted values. The notes give examples and instructions on how to calculate residuals, interpret residual plots, and identify outliers or influential points. They also discuss the importance of residuals in assessing the appropriateness of the SLR model.
Typology: Study notes
Section 5. Residuals
A residual value is the difference between an actual observed y value and the corresponding predicted y value, ŷ. Residuals are just errors.
Residual error = observed value – predicted value = y – ŷ
Example 1: A least-squares regression line was fitted to the weights (in pounds) versus age (in [...] has an actual weight of 25 pounds. What is the residual weight, in pounds, for this child?
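The calculation asked for in Example 1 follows the residual formula above. A minimal sketch, assuming a hypothetical fitted line: the coefficients b0 and b1 and the child's age below are made up for illustration and are not the values from the example. Only the observed weight of 25 pounds comes from the text.

```r
# Hypothetical fitted line: predicted weight = b0 + b1 * age
# (b0, b1 and age are illustrative assumptions, not from the example)
b0 <- 10
b1 <- 0.5
age <- 24

observed <- 25                     # observed weight, in pounds
predicted <- b0 + b1 * age         # predicted weight from the line
residual <- observed - predicted   # residual = observed - predicted
residual
```

A positive residual means the observed value lies above the fitted line; a negative residual means it lies below.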
A plot of the residual values against the x values can tell us a lot about our LSRL model. Patterns in a residual plot give some idea about the appropriateness of the model. For a least-squares regression line, the residuals always sum to zero, so they are always centered about the horizontal axis (residual = 0).
If the functional form of the regression model is incorrect, the residual plots constructed by using the model will often display a pattern. The pattern can then be used to propose a more appropriate model.
When a residual plot shows no pattern, it indicates that the proposed model is a reasonable fit to a set of data.
Figure 1 shows a horn-shaped pattern (linear model is not a reasonable fit for the data). Figure 2 shows a quadratic pattern (linear model is not a reasonable fit for the data). Figure 3 has no pattern (linear model is a reasonable fit for the data).
[Figures 1, 2, and 3: residual plots]
R command: resid()
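As a minimal sketch of how resid() is typically used, the block below fits a line with lm() and extracts the residuals. The x and y values are made up purely for illustration and do not come from these notes.

```r
# Illustrative data only (not from the notes)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)      # least-squares regression line
res <- resid(fit)     # residuals: observed y minus predicted y

# The same residuals computed by hand from the fitted values
by_hand <- y - fitted(fit)

# For a least-squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding)
sum(res)

# Residual plot: residuals against x, with a reference line at zero
plot(x, res, cex = 2, pch = 16, ylab = "residual")
abline(h = 0)
```

Because the residuals sum to zero, the points in the residual plot always scatter around the horizontal line at zero.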
Example 2: The following data were collected comparing score on a measure of test anxiety and exam score.
Measure of test anxiety: 23 14 14 0 7 20 20 15 21
Exam score: 43 59 48 77 50 52 46 51 51
a. Construct a scatterplot.
Commands:
anxiety=c(23,14,14,0,7,20,20,15,21)
score=c(43,59,48,77,50,52,46,51,51)
plot(anxiety,score,cex=2,pch=16)
Result:
b. Find the LSRL and fit it to the scatter plot. Commands: Results:
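One possible way to carry out part b is sketched below, using lm() to fit the LSRL and abline() to draw it on the scatterplot. The variable name fit is an assumption introduced here; the notes do not show the intended commands.

```r
anxiety <- c(23, 14, 14, 0, 7, 20, 20, 15, 21)
score <- c(43, 59, 48, 77, 50, 52, 46, 51, 51)

# Fit the least-squares regression line: response ~ explanatory
fit <- lm(score ~ anxiety)
coef(fit)   # intercept and slope of the LSRL

# Scatterplot with the fitted line drawn on top
plot(anxiety, score, cex = 2, pch = 16)
abline(fit)
```

With these data the fitted slope is negative (about -1.06), so higher test anxiety goes with lower predicted exam scores.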
An outlier is a value that is well separated from the rest of the data set. An outlier will have a large absolute residual value.
An observation that causes the values of the slope and the intercept in the line of best fit to be considerably different from what they would be if the observation were removed from the data set is said to be influential. When an influential point is removed, the LSRL fits the remaining data better.
Example 3: Johnny keeps track of his best swimming times for the 50 meter freestyle from each summer swim team season. Here are his data:
Age (years): 9 10 11 12 13 14 15 16
Time (sec): 34.8 34.2 32.9 29.1 28.4 22.4 25.2 24.9
a. Construct a scatterplot.
Commands:
age=c(9,10,11,12,13,14,15,16)
time=c(34.8,34.2,32.9,29.1,28.4,22.4,25.2,24.9)
plot(age,time,cex=2,pch=16)
Result:
b. Find the LSRL and fit it to the scatter plot. Commands: Results:
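A minimal sketch of part b for Johnny's data, again assuming lm() and abline() are the intended tools (the notes leave the commands blank, and the name fit is introduced here for illustration):

```r
age <- c(9, 10, 11, 12, 13, 14, 15, 16)
time <- c(34.8, 34.2, 32.9, 29.1, 28.4, 22.4, 25.2, 24.9)

# Fit the least-squares regression line of time on age
fit <- lm(time ~ age)
coef(fit)   # intercept and slope of the LSRL

# Scatterplot with the fitted line drawn on top
plot(age, time, cex = 2, pch = 16)
abline(fit)
```

The slope comes out near -1.74, meaning the predicted best time drops by roughly 1.7 seconds per year of age.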
c. Find r and r^2. Commands: Answers:
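For part c, one plausible approach is cor() for r and its square for r^2. The sketch below also reads r^2 from the fitted model as a cross-check; the variable names r, r2, and r2_from_fit are assumptions made for illustration.

```r
age <- c(9, 10, 11, 12, 13, 14, 15, 16)
time <- c(34.8, 34.2, 32.9, 29.1, 28.4, 22.4, 25.2, 24.9)

r <- cor(age, time)   # correlation coefficient
r2 <- r^2             # coefficient of determination

# In simple linear regression, r^2 equals the R-squared of the fitted line
r2_from_fit <- summary(lm(time ~ age))$r.squared

c(r = r, r2 = r2)
```

Here r is about -0.92 and r^2 is about 0.85, so the line accounts for roughly 85% of the variation in Johnny's times.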
d. Construct a residual plot then determine if the LSRL is a good model for his data. Command:
Result:
Commands:
Results:
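A residual plot for part d could be built along the following lines. This is a sketch, not the worksheet's own answer: the names fit, res, and worst are introduced here, and picking out the largest absolute residual is just one simple way to flag a possible outlier.

```r
age <- c(9, 10, 11, 12, 13, 14, 15, 16)
time <- c(34.8, 34.2, 32.9, 29.1, 28.4, 22.4, 25.2, 24.9)

fit <- lm(time ~ age)
res <- resid(fit)   # observed time minus predicted time

# Residual plot: residuals against age, with a reference line at zero
plot(age, res, cex = 2, pch = 16, ylab = "residual")
abline(h = 0)

# Age whose observation has the largest absolute residual
worst <- age[which.max(abs(res))]
worst
```

The observation at age 14 stands out, with a residual of about -3.97 seconds, well below the fitted line.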
e. Is there an influential point (i.e. a point that is an outlier and has a significant impact on the line of best fit)? If so, identify it, and remove it from your data.
i. Construct a scatterplot.
Commands:
age=c(9,10,11,12,13,15,16)
time=c(34.8,34.2,32.9,29.1,28.4,25.2,24.9)
plot(age,time,cex=2,pch=16)
iv. Construct a residual plot then determine if the LSRL is a good model for his data. Command:
Result:
Commands:
Results:
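To see how much the point at age 14 affects the fit, the line can be fitted with and without it and the two results compared. The sketch below is one way to do that; the names age2, time2, fit_all, fit_drop, slopes, and r2 are assumptions introduced for illustration.

```r
age <- c(9, 10, 11, 12, 13, 14, 15, 16)
time <- c(34.8, 34.2, 32.9, 29.1, 28.4, 22.4, 25.2, 24.9)

# Drop the observation at age 14
age2 <- age[age != 14]
time2 <- time[age != 14]

fit_all <- lm(time ~ age)       # all eight observations
fit_drop <- lm(time2 ~ age2)    # without the point (14, 22.4)

# Compare slope and r^2 before and after removing the point
slopes <- c(all = coef(fit_all)[["age"]],
            without_14 = coef(fit_drop)[["age2"]])
r2 <- c(all = summary(fit_all)$r.squared,
        without_14 = summary(fit_drop)$r.squared)
slopes
r2

# Residual plot for the reduced data set
plot(age2, resid(fit_drop), cex = 2, pch = 16, ylab = "residual")
abline(h = 0)
```

Removing the point changes the slope from about -1.74 to about -1.57 and raises r^2 from about 0.85 to about 0.96, which is the kind of shift the definition of an influential point describes.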
There are many possible justifications for removing the point (14, 22.4) from the data that Johnny collected. The most likely are a suspicion that the data point was recorded incorrectly, or outside factors such as the length of the pool being incorrectly measured or a defect in the timer used.