Correlation and Linear Regression: Measuring Linear Relationships and Making Predictions, Schemes and Mind Maps of Statistics

This document covers the concepts of correlation and linear regression, including the Pearson correlation coefficient, the calculation of sample covariance and correlation, and the use of linear regression for prediction. It also discusses the difference between correlation and causation and the potential influence of lurking variables on observed relationships.

Typology: Schemes and Mind Maps

2021/2022

Uploaded on 09/27/2022

anandit 🇺🇸


Lecture 5: Correlation and Linear Regression

3.5. (Pearson) correlation coefficient

The correlation coefficient measures the strength of the linear relationship between two variables.

  • The correlation is always between −1 and 1.
  • Points that fall on a straight line with positive slope have a correlation of 1.
  • Points that fall on a straight line with negative slope have a correlation of −1.
  • Points that are not linearly related have a correlation of 0.
  • The farther the correlation is from 0, the stronger the linear relationship.
  • The correlation does not change if we change units of measurement.

See Figure 3 on page 105. Given a bivariate data set of size n,

$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$

the sample covariance $s_{x,y}$ is defined by

$$s_{x,y} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

Note that if $x_i = y_i$ for all $i = 1, \ldots, n$, then $s_{x,y} = s_x^2$. The sample correlation coefficient $r$ is defined by

$$r = \frac{s_{x,y}}{s_x s_y},$$

where $s_x$ is the sample standard deviation of $x_1, \ldots, x_n$, i.e.

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}.$$

To simplify calculation, we often use the following alternative formula:

$$r = \frac{S_{x,y}}{\sqrt{S_{x,x}}\,\sqrt{S_{y,y}}},$$

where

$$S_{x,y} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}, \qquad S_{x,x} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad S_{y,y} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}.$$
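The definitions above translate directly into a short computation. Here is a minimal sketch in plain Python (the function names `sample_covariance` and `sample_correlation` are my own, not from the lecture):

```python
import math

def sample_covariance(x, y):
    # s_{x,y} = (1/(n-1)) * sum of (x_i - xbar)(y_i - ybar)
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    # Shortcut formula: r = S_{x,y} / (sqrt(S_{x,x}) * sqrt(S_{y,y}))
    n = len(x)
    sx, sy = sum(x), sum(y)
    Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    Sxx = sum(xi * xi for xi in x) - sx * sx / n
    Syy = sum(yi * yi for yi in y) - sy * sy / n
    return Sxy / (math.sqrt(Sxx) * math.sqrt(Syy))

# Points falling on a straight line with positive slope have correlation 1.
x = [1, 2, 3, 4, 5]
y = [2 * xi + 1 for xi in x]
print(sample_correlation(x, y))  # approximately 1.0 (up to floating-point error)
```

Note that `sample_covariance(x, x)` returns the sample variance $s_x^2$, matching the remark above.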

3.6. Prediction: Linear Regression

Objective: Assume two variables x and y are related: when x changes, the value of y also changes. Given a data set

$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

and a value $x_{n+1}$, can we predict the value of $y_{n+1}$? In this context, x is called the input variable or predictor, and y is called the output variable or response. Examples:

  • Having known the price change history of IBM stock, can we predict its price for tomorrow?
  • Based on your first quiz, predict your final score.
  • Survey consumers' demand for a certain product and make a recommendation for the number of items to be produced.

Method: Linear regression (fitting a straight line to the data).

Question: Why do we only consider linear relationships? (Remember that correlation measures the strength and direction of the linear association between variables.)

  • Linear relationships are easy to understand and analyze.
  • Linear relationships are common.
  • Variables with nonlinear relationships can sometimes be transformed so that the relationships are linear. (See Lab 4 for an example.)
  • Nonlinear relationships can sometimes be closely approximated by linear relationships.
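The lecture points to Lab 4 for a transformation example. As a generic illustration (the data here is invented), an exponential relationship $y = c\,e^{bx}$ becomes linear after taking logarithms, since $\ln y = \ln c + bx$:

```python
import math

# Hypothetical exponential data: y = 2 * exp(0.5 * x), so ln(y) = ln(2) + 0.5 * x.
x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]

# After the log transform, ln(y) is exactly linear in x:
log_y = [math.log(yi) for yi in y]
slopes = [(log_y[i + 1] - log_y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]
print(slopes)  # every consecutive slope equals 0.5, confirming linearity
```

The transformed pairs $(x_i, \ln y_i)$ can then be fitted with an ordinary straight line.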

Recall: A straight line is determined by two constants: its intercept and slope. In its equation

$$y = \beta_1 x + \beta_0,$$

$\beta_0$ is the intercept of this line with the y-axis and $\beta_1$ represents the slope of the line.

Finding the "best-fitting" line

  • Idea: Draw a line that seems to fit well and then find its equation.
  • Problems:

iii. The fitted regression line is $\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$.

Predicted values

For a given value of the x-variable, we compute the predicted value by plugging the value into the least squares line equation.
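The standard least-squares estimates can be computed from the shortcut sums of Section 3.5: $\hat{\beta}_1 = S_{x,y}/S_{x,x}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. A sketch using invented data that lies exactly on a line (function names are mine):

```python
def fit_least_squares(x, y):
    # Slope: beta1_hat = S_{x,y} / S_{x,x}; intercept: beta0_hat = ybar - beta1_hat * xbar.
    n = len(x)
    sx, sy = sum(x), sum(y)
    Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    Sxx = sum(xi * xi for xi in x) - sx * sx / n
    beta1 = Sxy / Sxx
    beta0 = sy / n - beta1 * sx / n
    return beta1, beta0

def predict(beta1, beta0, x_new):
    # Plug the new x-value into the fitted line y_hat = beta1 * x + beta0.
    return beta1 * x_new + beta0

# Invented data lying exactly on y = 3x + 2.
x = [1, 2, 3, 4]
y = [5, 8, 11, 14]
b1, b0 = fit_least_squares(x, y)
print(b1, b0)              # 3.0 2.0
print(predict(b1, b0, 5))  # 17.0
```

Because the data here is exactly linear, the fit recovers the true slope and intercept; with real data the fitted line minimizes the sum of squared vertical deviations.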

Example 7. See page 117.

Example 8. Exercise 3.44.