
Lecture Notes on Simple Linear Regression | MATH 241, Study notes of Mathematics

Material Type: Notes; Class: Statistical Applications; Subject: Mathematics; University: Saint Mary's College; Term: Unknown 1990;

Uploaded on 08/05/2009

Simple Linear Regression
Regression expresses a relation used to predict one variable, called the response variable (the "dependent" variable, often called y), from other variables, called predictors (the "independent" variables, often called x1, x2, ..., xk), and provides us with an equation to make this prediction. The regression equation that we calculate is descriptive of the sample (like the sample mean and sample standard deviation); we use various inference methods to see the extent to which we believe the description carries over to the population from which the sample is drawn.
A sample question
This table represents a sample of ten trucks; for each, we have the age in years and the annual maintenance cost. We want to find a linear equation, using this information, which most closely describes ("predicts") the [average] maintenance cost of a truck, based on its age. [Notice how the language tells us the "predictor" is age and the "response" (the "predicted" variable) is cost.]
Truck    Age      Cost
number   (years)  ($ thousands)
 1        1        3.50
 2        2        3.70
 3        2        4.80
 4        2        5.20
 5        2        5.90
 6        3        5.50
 7        4        7.50
 8        4        8.00
 9        5        7.90
10        5        9.50
For this course, we will focus on linear regression, in which we look for the linear equation y = b0 + b1x1 + b2x2 + ... + bkxk which best fits our data, and will begin with "simple" linear regression, using one predictor.
Thus we will have data in pairs: two different variables (a value of the predictor and a value of the response) observed on the same individual (or sampling unit), and we will be looking for the equation y = b0 + b1x which best describes the relation; later we will look at tests to decide whether we have evidence for a similar relation in the population, and at how to use the relation to make predictions.
The linear regression model (the theory we are using)
Our calculations (and our decision of what is a "better" or "worse" fit) are based on the following model (assumptions about the population): There are two random variables X and Y. For each possible value x of X there is a probability distribution of values of Y (written Y|x) which fits the following conditions:
1. The mean of the Y's for a given x (called E(Y|x) or µY|x) is given by a linear equation E(Y|x) = β0 + β1x.
2. For each value of X, the values of Y|x are approximately normally distributed.
3. The standard deviations of the variables Y|x are all the same (for all values x of X). (This is the assumption of "homoscedasticity"; notice it is like the assumption in analysis of variance that all the populations have the same variance.)
Another way of saying all of this is that the values of Y are given by y = β0 + β1x + ε, where ε (the "error of prediction") is a random variable (representing the "random variation" of individuals) which is normally distributed and independent of X. [We will come back to these ideas.]
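To make the model concrete, here is a small Python sketch (not part of the original notes; the population values β0 = 2, β1 = 1.3, and σ = 0.75 are made up for illustration). It draws many values of Y|x for a fixed x and checks that their mean is close to β0 + β1x:

```python
import random

# Hypothetical population values, made up for illustration
# (the "true" beta0, beta1, and sigma are never known in practice).
beta0, beta1, sigma = 2.0, 1.3, 0.75

random.seed(1)

def draw_y(x):
    """One observation from Y|x: the line's mean plus a normal error."""
    return beta0 + beta1 * x + random.gauss(0, sigma)

# For a fixed x, the Y|x values scatter around E(Y|x) = beta0 + beta1 * x
sample = [draw_y(3) for _ in range(10000)]
mean_at_3 = sum(sample) / len(sample)
print(mean_at_3)  # close to beta0 + beta1 * 3 = 5.9
```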
The regression equation
For any linear equation y = b0 + b1x, each data point (xi, yi) gives a "predicted value" ŷi = b0 + b1xi, and there is a residual yi − ŷi which gives the error (the difference between the actual value for that point and the prediction for that point). The "line of best fit in the sense of least squares", or the "regression line for predicting y based on x", or the "OLS [ordinary least squares] line for y based on x", is the line ŷ = b0 + b1x for which the total of the squares of the residuals
(y1 − ŷ1)² + (y2 − ŷ2)² + ··· + (yn − ŷn)²  [ = (y1 − (b0 + b1x1))² + (y2 − (b0 + b1x2))² + ··· + (yn − (b0 + b1xn))² ]
is smallest. [If our model is correct, this is the method that will most often bring us closest to the "real" population line.]
Fortunately, some work with calculus (already done for us by some nice people years ago) gives us the following equations (the "normal equations") for b0 and b1:
Σyi = n·b0 + b1·Σxi
Σxiyi = b0·Σxi + b1·Σxi²
which we can rewrite [after some clever algebra] as
slope = b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = (Σxiyi − n·x̄·ȳ) / Σ(xi − x̄)²,  intercept = b0 = ȳ − b1·x̄
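As a quick check (a sketch, not part of the original notes), the slope and intercept formulas above can be applied to the truck data with a few lines of Python:

```python
# Truck data from the table above: age (years) and annual cost ($ thousands)
x = [1, 2, 2, 2, 2, 3, 4, 4, 5, 5]
y = [3.50, 3.70, 4.80, 5.20, 5.90, 5.50, 7.50, 8.00, 7.90, 9.50]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Slope and intercept from the formulas above
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

print(b1, b0)  # roughly 1.3167 and 2.2, so cost-hat = 2.2 + 1.3167 * age
```

So each additional year of age is associated with roughly 1.32 thousand dollars more in predicted annual maintenance cost.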


We would not calculate these by hand but would use a statistics package (in Minitab: Stat > Regression > Regression; see the online Minitab handbook; "Response" and "Predictor" are columns containing values of those variables, y and x respectively) or a calculator (Stat > Calc > LinReg(a+bx) or Stat > Calc > LinReg(ax+b), it doesn't matter which, on the TI-8x family, with the list holding the predictor first and the list holding the response second; see the online "Using your calculator for statistics" pages).
The intercept formula tells us that the regression line always goes through the point (x̄, ȳ), which seems reasonable and is useful as a "cheap check". The slope formula can be rewritten (once we know about the correlation coefficient r) in the form b1 = r·(sy/sx), which fits nicely with the idea of slope as change in y over change in x.
It is a fact (which we shall not prove) that if our model is correct, b0 and b1 are unbiased, consistent estimators of β0 and β1 (averaging the b0 values obtained from all possible samples of size n gives β0; similarly for b1). This will be very useful when we want to carry out tests and make estimates for the population values of slope and intercept.
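The unbiasedness claim can be illustrated by simulation (a hypothetical sketch, not part of the original notes: the population values β0 = 2, β1 = 1.3, σ = 0.75 are made up, and the truck ages are reused as a fixed design):

```python
import random

random.seed(2)
# Hypothetical population values (made up) and a fixed set of x values.
beta0, beta1, sigma = 2.0, 1.3, 0.75
xs = [1, 2, 2, 2, 2, 3, 4, 4, 5, 5]
xbar = sum(xs) / len(xs)
sxx = sum((xi - xbar) ** 2 for xi in xs)

def fit_slope():
    """Draw one sample from the model and return its fitted slope b1."""
    ys = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in xs]
    ybar = sum(ys) / len(ys)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / sxx

# Averaging b1 over many samples recovers beta1 (unbiasedness)
b1s = [fit_slope() for _ in range(5000)]
print(sum(b1s) / len(b1s))  # close to beta1 = 1.3
```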

Coefficient of determination r²: Looking at the sum of squares (of deviations from the mean) for y (that is, SSy = Σ(yi − ȳ)², also called SST), we have the same division into pieces that we used for ANOVA, but SSR, the sum of squares for the regression [the sum of squared deviations of the predictions from the mean], takes the place of SSTR (the sum of squares between groups; here the "groups" are defined by the x values), and SSE is the sum of the squares of the residuals (the residuals are the "error of prediction" values). So SST = SSE + SSR, that is,
Σ(yi − ȳ)² = Σ(yi − ŷi)² + Σ(ŷi − ȳ)²
The coefficient of determination measures the proportion of SST (the variation in y) that corresponds to (is explained by) the relation to x:
r² = SSR/SST = 1 − SSE/SST
The value of r² tells us how well the data (from the sample) fit a linear model. We need the appropriate hypothesis test structure (described below under "Testing the regression coefficients") to decide whether this is good enough to be convincing about the population.
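For the truck data, the decomposition SST = SSE + SSR and the resulting r² can be verified numerically (a sketch, not part of the original notes):

```python
x = [1, 2, 2, 2, 2, 3, 4, 4, 5, 5]
y = [3.50, 3.70, 4.80, 5.20, 5.90, 5.50, 7.50, 8.00, 7.90, 9.50]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
      / sum((a - xbar) ** 2 for a in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]  # predicted values

sst = sum((b - ybar) ** 2 for b in y)             # total variation in y
sse = sum((b - h) ** 2 for b, h in zip(y, yhat))  # residual (error) variation
ssr = sum((h - ybar) ** 2 for h in yhat)          # variation explained by x

print(abs(sst - (sse + ssr)) < 1e-9)  # the decomposition holds
print(ssr / sst)                      # r^2, about 0.87
```

So about 87% of the sample variation in maintenance cost corresponds to the linear relation with age.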

Correlation:
The [sample] correlation coefficient r is given by r = (sign of b1)·√r²  (computation from the data: r = (1/(n − 1))·Σ[((xi − x̄)/sx)·((yi − ȳ)/sy)]).
It measures the extent to which the data points follow a straight line and will have a value ranging from −1 (perfect match to a line with negative slope) through 0 (the points do not follow a line, though they might follow some other curve nicely) to +1 (perfect match to a line with positive slope). Like the variance, it is hard to interpret; the coefficient of determination r² has a more concrete meaning [but loses the information about sign]. The correlation in the population is called ρ.
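Continuing the truck example (a sketch, not part of the original notes), the standardized-products formula gives r directly, and r·(sy/sx) reproduces the slope b1 as claimed above:

```python
import math

x = [1, 2, 2, 2, 2, 3, 4, 4, 5, 5]
y = [3.50, 3.70, 4.80, 5.20, 5.90, 5.50, 7.50, 8.00, 7.90, 9.50]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
sy = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))

# r from the standardized-products formula
r = sum(((a - xbar) / sx) * ((b - ybar) / sy) for a, b in zip(x, y)) / (n - 1)
print(r)            # about 0.934
print(r * sy / sx)  # about 1.3167, the slope b1 again
```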

Standard error of the residuals ("σ" for the regression): σ is estimated (from the sample) by
se = √( SSE / (n − 2) ) = √( Σ(yi − ŷi)² / (n − 2) ) = √( Σ(yi − (b0 + b1xi))² / (n − 2) )
This is the value Minitab and your text call "s". It is the [sample] standard deviation of the residuals; its square, se², is the MSE. Note the "n − 2" denominator: there are n − 2 (not n − 1) degrees of freedom because we are using both b0 and b1 in calculating the residuals.
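For the truck data, se works out as follows (a sketch, not part of the original notes):

```python
import math

x = [1, 2, 2, 2, 2, 3, 4, 4, 5, 5]
y = [3.50, 3.70, 4.80, 5.20, 5.90, 5.50, 7.50, 8.00, 7.90, 9.50]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
      / sum((a - xbar) ** 2 for a in x))
b0 = ybar - b1 * xbar

# Sum of squared residuals, then divide by n - 2 degrees of freedom
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
se = math.sqrt(sse / (n - 2))
print(se)  # about 0.755 (thousand dollars)
```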

Repeat of the assumptions of linear regression [these will matter for tests, confidence intervals, etc.]:

  1. The y values for each x are normally distributed.
  2. The point (x, µY|x) lies on the "true" regression line for each value x of X.
  3. The standard deviation of Y|x is the same (σ) for all x's.
  4. Successive observations are independent.

Testing the regression coefficients: is there evidence that there is a linear relation? If there is no linear relation between X and Y, then the [population] regression coefficient β1 is 0. Thus, to decide "Is there a linear relation?" our test is

H0: β1 = 0 [no linear relationship between the variables; values of X are not useful for linear prediction of values of Y]
Ha: β1 ≠ 0 [some linear relation between X and Y]
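A sketch of the resulting test statistic for the truck data (not part of the original notes; the slope's standard error formula se/√Σ(xi − x̄)² is the standard one, though it is not derived here):

```python
import math

x = [1, 2, 2, 2, 2, 3, 4, 4, 5, 5]
y = [3.50, 3.70, 4.80, 5.20, 5.90, 5.50, 7.50, 8.00, 7.90, 9.50]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
se = math.sqrt(sse / (n - 2))
se_b1 = se / math.sqrt(sxx)  # standard error of the slope estimate
t = (b1 - 0) / se_b1         # test statistic for H0: beta1 = 0
print(t)  # about 7.4, compared to a t distribution with n - 2 = 8 df
```

A t value of about 7.4 is far beyond any common critical value for 8 degrees of freedom, so data like these would give strong evidence of a linear relation.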