Lecture 5: Correlation and Linear Regression
3.5. (Pearson) correlation coefficient

The correlation coefficient measures the strength of the linear relationship between two variables. See Figure 3 on page 105. Given a bivariate data set of size n,

(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),

the sample covariance s_{x,y} is defined by

s_{x,y} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).
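As a quick numerical check, here is a minimal NumPy sketch of this definition (the data values x, y are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 9.0])
n = len(x)

# Sample covariance: s_xy = (1/(n-1)) * sum_i (x_i - xbar)(y_i - ybar)
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov with ddof=1 (its default) computes the same quantity
assert np.isclose(s_xy, np.cov(x, y)[0, 1])

# Special case: the covariance of x with itself is the sample variance s_x^2
s_xx = np.sum((x - x.mean()) ** 2) / (n - 1)
assert np.isclose(s_xx, np.var(x, ddof=1))
```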
Note that if x_i = y_i for all i = 1, \ldots, n, then s_{x,y} = s_x^2. The sample correlation coefficient r is defined by
r = \frac{s_{x,y}}{s_x s_y},

where s_x is the sample standard deviation of x_1, \ldots, x_n, i.e.

s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}.

To simplify calculation, we often use the following alternative formula:

r = \frac{S_{x,y}}{\sqrt{S_{x,x} S_{y,y}}},
where

S_{x,y} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}, \quad
S_{x,x} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \quad
S_{y,y} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}.
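The shortcut sums can be verified against NumPy's built-in correlation. A minimal sketch with made-up data x, y:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 9.0])
n = len(x)

# Shortcut sums: S_xy = sum(x_i y_i) - (sum x_i)(sum y_i)/n, and similarly
# S_xx, S_yy with squared terms
S_xy = np.sum(x * y) - x.sum() * y.sum() / n
S_xx = np.sum(x ** 2) - x.sum() ** 2 / n
S_yy = np.sum(y ** 2) - y.sum() ** 2 / n

# Correlation via the alternative formula
r = S_xy / np.sqrt(S_xx * S_yy)

# Agrees with the definitional formula r = s_xy / (s_x s_y), as computed
# by np.corrcoef
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```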
3.6. Prediction: Linear Regression

Objective: Assume two variables x and y are related: when x changes, the value of y also changes. Given a data set

(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)

and a new value x_{n+1}, can we predict the value of y_{n+1}? In this context, x is called the input variable or predictor, and y is called the output variable or response. Examples:
Method: Linear regression (fitting a straight line to the data). Question: Why do we only consider linear relationships? (Remember that correlation measures the strength and direction of the linear association between variables.)
Recall: A straight line is determined by two constants: its intercept and slope. In its equation

y = \beta_1 x + \beta_0,

\beta_0 is the intercept of this line with the y-axis and \beta_1 represents the slope of the line.

Finding the “best-fitting” line
i. Measure how well a candidate line y = \beta_1 x + \beta_0 fits the data by the sum of squared vertical deviations \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)^2 (the least squares criterion).

ii. The values minimizing this sum are \hat{\beta}_1 = S_{x,y} / S_{x,x} and \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.

iii. The fitted regression line is \hat{y} = \hat{\beta}_1 x + \hat{\beta}_0.
Predicted values

For a given value of the x-variable, we compute the predicted value by plugging the value into the least squares line equation.
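The whole procedure, from the least squares estimates to a predicted value, can be sketched as follows (the data x, y and the new input x_new are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 9.0])
n = len(x)

# Least squares estimates: beta1_hat = S_xy / S_xx, beta0_hat = ybar - beta1_hat * xbar
S_xy = np.sum(x * y) - x.sum() * y.sum() / n
S_xx = np.sum(x ** 2) - x.sum() ** 2 / n
beta1 = S_xy / S_xx
beta0 = y.mean() - beta1 * x.mean()

# Predicted value: plug the new input into the fitted line y_hat = beta1*x + beta0
x_new = 3.0
y_hat = beta1 * x_new + beta0

# np.polyfit with degree 1 fits the same least squares line
slope, intercept = np.polyfit(x, y, 1)
assert np.isclose(beta1, slope) and np.isclose(beta0, intercept)
```

Note that a prediction at x_{n+1} far outside the range of the observed x_i (extrapolation) is unreliable, since the linear relationship is only supported by the data within that range.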
Example 7. See page 117.
Example 8. Exercise 3.44.