



The Usefulness of the R² Statistic
Introduction

Almost every Actuarial Department uses least squares regression to fit frequency, severity, or pure premium data to determine loss trends. Many actuaries use the R² statistic to measure the goodness-of-fit of the trend. Actually, the R² statistic measures how significantly the slope of the fitted line differs from zero, which is not the same as a good fit.
In the Fall 1991 Casualty Actuarial Society Forum, D. Lee Barclay wrote A Statistical Note on Trend Factors: The Meaning of R-Squared. Through simple graphical examples, Barclay showed that the coefficient of determination (R²) is, by itself, a poor measure of goodness-of-fit. Barclay's numerical examples provide additional support for this argument, but his paper did not analyze the formulas used in regression analysis.
By understanding the formulas and what they describe, we can further understand why the R² statistic is not a reliable measure of a good fit. This paper will analyze three formulas important to regression analysis: (1) the basic linear regression model, (2) the Analysis of Variance sum of squares formulas, and (3) the R² formula in terms of the sums of squares. With an understanding of these formulas and what they measure, actuaries can properly use the R² value to best determine the forecasted trend.
Formulas

The Analysis of Variance (ANOVA) approach to regression analysis is based on partitioning the Total Sum of Squares into the Error Sum of Squares and the Regression Sum of Squares.
(1) The basic linear regression model is stated as: Yᵢ = B₀ + B₁Xᵢ, where
Yᵢ = the observed dependent variable in the ith trial
Xᵢ = the independent variable in the ith trial
Ŷᵢ = the fitted dependent variable for the independent variable Xᵢ
Ȳ = mean of the Yᵢ = Σ Yᵢ / n
(2) Analysis of Variance (ANOVA) Approach to Regression Analysis
SSTO = Total Sum of Squares = Σ (Yᵢ - Ȳ)² = measure of the variation of the observed values around the mean.
SSE = Error Sum of Squares = Σ (Yᵢ - Ŷᵢ)² = measure of the variation of the observed values around the regression line.
SSR = Regression Sum of Squares = Σ (Ŷᵢ - Ȳ)² = measure of the variation of the fitted regression values around the mean = SSTO - SSE = difference between the Total and Error Sums of Squares.
(3) Coefficient of Determination: R² = (SSTO - SSE) / SSTO = SSR / SSTO.
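These definitions translate directly into a few lines of code. The sketch below is a minimal illustration of my own (not taken from Barclay's paper; the function name fit_and_anova is an assumption): it fits the one-variable least squares line and computes SSTO, SSE, SSR, and R² exactly as defined above.

```python
# Minimal sketch: one-variable least squares plus the ANOVA sums of squares.
# The function name and structure are illustrative, not from the paper.

def fit_and_anova(x, y):
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares estimates: B1 = Sxy / Sxx, B0 = Ybar - B1 * Xbar
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    y_hat = [b0 + b1 * xi for xi in x]                      # fitted values
    ssto = sum((yi - y_bar) ** 2 for yi in y)               # variation around the mean
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # variation around the fitted line
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # variation explained by the line
    r2 = (ssto - sse) / ssto                                # = SSR / SSTO

    return b0, b1, ssto, sse, ssr, r2
```

For any data set, ssr + sse reproduces ssto up to rounding, which is exactly the partition the ANOVA approach relies on.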
The ANOVA formulas have these properties for a regression fit with a slope close to zero (Barclay's Example #1):
(1) Ŷᵢ ≈ Ȳ; note that the values in the fitted column (Ŷᵢ) are not far from Ȳ = 49.590.
(2) SSE ≈ SSTO. The analysis of variance sums of squares are: SSTO = Σ (Yᵢ - Ȳ)² = 4.571, SSE = Σ (Yᵢ - Ŷᵢ)² = 4.460, SSR = Σ (Ŷᵢ - Ȳ)² = 0.111. The variation around the regression line (SSE) is not much better (smaller) than the total variation (SSTO).
(3) R² = (SSTO - SSE) / SSTO = SSR / SSTO = (4.571 - 4.460) / 4.571 = 0.111 / 4.571 = 0.024.
Because the SSE is not much less than the SSTO, the R² value is close to 0. For SSR to be large, there needs to be a lot of variation of the fitted values around the mean. So any time the fitted values do not vary much around the mean (that is, the slope is close to zero), R² will be close to 0. While this means that not much additional variation is explained by the fitted model, the "fit" may still reasonably represent the data, and projecting with a slope of zero may be an appropriate forecast. Of course, you don't need regression to project a slope of zero; you can just forecast the mean.
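As a small, hedged illustration (the data below are made up for this note, not Barclay's), the following lines reuse fit_and_anova from the sketch above on a nearly flat series: the residuals are small, R² is close to zero, and the regression forecast is essentially the same as simply forecasting the mean.

```python
# Illustrative, made-up data with essentially no trend (not Barclay's Example #1).
x = list(range(1, 11))
y = [49.6, 49.4, 49.7, 49.5, 49.6, 49.5, 49.7, 49.4, 49.6, 49.5]

b0, b1, ssto, sse, ssr, r2 = fit_and_anova(x, y)
print(round(b1, 4), round(r2, 4))                         # slope and R^2 are both near zero
print(round(sum(y) / len(y), 3), round(b0 + b1 * 11, 3))  # mean forecast vs. regression forecast at x = 11
```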
In Example #2, Barclay adds 0 to the first observed Y, one to the second, two to the third, and so on. The line of best fit has B₀ = 48.38813 and B₁ = 1.036667. This provides an interesting example for comparing the fit and the numerical values in the ANOVA formulas.
  i       Yᵢ        Ŷᵢ      Yᵢ - Ŷᵢ    Yᵢ - Ȳ     Ŷᵢ - Ȳ
  1     48.746    49.425    -0.679     -5.344     -4.665
  2     50.914    50.461     0.453     -3.176     -3.628
  3     51.246    51.498    -0.252     -2.844     -2.592
  4     53.297    52.535     0.762     -0.793     -1.555
  ...
 10     58.084    58.755    -0.671      3.994      4.665
Sum    540.898   540.898     0.000      0.000      0.000
Mean    54.090    54.090

Sums of Squares:  SSE = 4.460    SSTO = 93.121    SSR = 88.661    R² = 0.952
The interesting part of this example is that the residuals (Yᵢ - Ŷᵢ) are exactly the same as in Example #1, so the SSE is the same. Recall that linear regression minimizes the sum of the squared residuals. Should the lines in Example #1 and Example #2 have the same fit?
Let's look at the ANOVA formulas to see the properties of a "good fit" as measured by R² close to 1:
(1) Ŷᵢ ≈ Yᵢ; the fitted values (the Ŷᵢ column) are close to the observed values (the Yᵢ column), a "good fit." Here we decide that Ŷᵢ ≈ Yᵢ rather than Ŷᵢ ≈ Ȳ, because there is more variation of the observations from the mean. We choose Ŷᵢ ≈ Yᵢ even though we have the same values for the residuals as in Example #1.
(2) SSE ≈ 0 relative to SSTO. The analysis of variance sums of squares are: SSTO = Σ (Yᵢ - Ȳ)² = 93.121, SSE = Σ (Yᵢ - Ŷᵢ)² = 4.460, SSR = Σ (Ŷᵢ - Ȳ)² = 88.661. The variation around the regression line (SSE) is much better (smaller) than the total variation (SSTO).
(3) R² = (SSTO - SSE) / SSTO = SSR / SSTO = (93.121 - 4.460) / 93.121 = 88.661 / 93.121 = 0.952.
The SSE is much less than the SSTO, so a large proportion of the variation of the actual observations around the mean is being explained by the fitted line. With the SSE close to zero relative to the SSTO, most of the observations lie near the fitted line. However, you will note that this is relative, because we have the same SSE as in Example #1. It is because a large proportion of the SSTO is explained by the fitted line that we decide there is a good fit.
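Barclay's construction in Example #2 (adding 0 to the first observation, 1 to the second, 2 to the third, and so on) can be mimicked on the illustrative series from the earlier sketch (again, these are not Barclay's numbers). The residuals, and therefore the SSE, are unchanged, while SSTO and SSR grow, so R² moves from near 0 to near 1 even though the scatter around the fitted line is identical.

```python
# Mimic the construction described above: add 0, 1, 2, ... to successive observations.
# Adding an exactly linear component shifts the fitted slope but leaves the residuals unchanged.
y2 = [yi + i for i, yi in enumerate(y)]

_, b1_a, ssto_a, sse_a, _, r2_a = fit_and_anova(x, y)
_, b1_b, ssto_b, sse_b, _, r2_b = fit_and_anova(x, y2)

print(round(sse_a, 6) == round(sse_b, 6))   # True: identical error sum of squares
print(round(r2_a, 3), round(r2_b, 3))       # R^2 near 0 versus R^2 near 1
```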
What does the R² statistic measure? The R² statistic is a useful tool for determining whether or not B₁ = 0, for in regression, if B₁ = 0, there is no good reason to use the fitted line. As actuaries, we are often trying to forecast. If the slope is zero (B₁ = 0), then we can use the mean to forecast the fitted value.
In fact, the formula for B₁ can be written as a function of R²:

B₁ = [Σ (Yᵢ - Ȳ)² / Σ (Xᵢ - X̄)²]^(1/2) · r, where r = ±√R², with the sign the same as the slope.

So when B₁ = 0, then R² = 0; and when R² = 0, then B₁ = 0.
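A quick numeric check of this identity, using the illustrative trended series y2 from the previous sketch (not Barclay's data), confirms that the fitted slope equals ±√R² times √(SSTO / Σ(Xᵢ - X̄)²).

```python
# Numeric check of B1 = r * sqrt( SSTO / sum((Xi - Xbar)^2) ), with r = +/- sqrt(R^2),
# using the illustrative trended series y2 from the previous sketch.
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b0, b1, ssto, sse, ssr, r2 = fit_and_anova(x, y2)
r = r2 ** 0.5 if b1 >= 0 else -(r2 ** 0.5)   # the sign of r follows the sign of the slope

print(round(b1, 6), round(r * (ssto / s_xx) ** 0.5, 6))   # the two sides agree
```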
Both Example #1 and Example #2 have the same residuals, and hence the same SSE. From one perspective, each line has the same fit. The reason for the difference between the R² values is that in Example #2 the fitted slope differs substantially from zero and explains proportionally more of the larger variation measured by the SSTO.