



The Usefulness of the R² Statistic
Introduction

Almost every Actuarial Department uses least squares regression to fit frequency, severity, or pure premium data to determine loss trends. Many actuaries use the R² statistic to measure the goodness-of-fit of the trend. Actually, the R² statistic measures how significantly the slope of the fitted line differs from zero, which is not the same as a good fit.
In the Fall 1991 Casualty Actuarial Society Forum, D. Lee Barclay wrote A Statistical Note on Trend Factors: The Meaning of R-Squared. Through simple graphical examples, Barclay showed that the coefficient of determination (R²) is, by itself, a poor measure of goodness-of-fit. Barclay's numerical examples provide additional support for this argument, but his paper did not analyze the formulas used in regression analysis.
By understanding the formulas and what they describe, we can further understand why the R² statistic is not a reliable measure of a good fit. This paper will analyze three formulas important to regression analysis: (1) the basic linear regression model, (2) the Analysis of Variance sum of squares formulas, and (3) the R² formula in terms of the sums of squares. With an understanding of these formulas and what they measure, actuaries can properly use the R² value to best determine the forecasted trend.
Formulas

The Analysis of Variance (ANOVA) approach to regression analysis is based on partitioning the Total Sum of Squares into the Error Sum of Squares and the Regression Sum of Squares.
(1) The basic linear regression model is stated as: Yᵢ = B₀ + B₁Xᵢ, where
Yᵢ = the observed dependent variable in the ith trial
Xᵢ = the independent variable in the ith trial
Ŷᵢ = the fitted dependent variable for the independent variable Xᵢ
Ȳ = mean of the Yᵢ = Σ Yᵢ / n
(2) Analysis of Variance (ANOVA) Approach to Regression Analysis
SSTO = Total Sum of Squares = Σ (Yᵢ - Ȳ)² = measure of the variation of the observed values around the mean.
SSE = Error Sum of Squares = Σ (Yᵢ - Ŷᵢ)² = measure of the variation of the observed values around the regression line.
SSR = Regression Sum of Squares = Σ (Ŷᵢ - Ȳ)² = measure of the variation of the fitted regression values around the mean = SSTO - SSE = difference between the Total and Error Sums of Squares.
(3) Coefficient of Determination: R² = (SSTO - SSE) / SSTO = SSR / SSTO.
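These definitions translate directly into a few lines of code. The sketch below is a minimal illustration of my own (not taken from Barclay's paper; the function name fit_and_anova is an assumption): it fits the one-variable least squares line and computes SSTO, SSE, SSR, and R² exactly as defined above.

```python
# Minimal sketch: one-variable least squares plus the ANOVA sums of squares.
# The function name and structure are illustrative, not from the paper.

def fit_and_anova(x, y):
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares estimates: B1 = Sxy / Sxx, B0 = Ybar - B1 * Xbar
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    y_hat = [b0 + b1 * xi for xi in x]                      # fitted values
    ssto = sum((yi - y_bar) ** 2 for yi in y)               # variation around the mean
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # variation around the fitted line
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # variation explained by the line
    r2 = (ssto - sse) / ssto                                # = SSR / SSTO

    return b0, b1, ssto, sse, ssr, r2
```

For any data set, ssr + sse reproduces ssto up to rounding, which is exactly the partition the ANOVA approach relies on.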
The ANOVA formulas have these properties for a regression fit with a slope close to zero (Barclay's Example #1):
(1) Ŷᵢ ≈ Ȳ; note that the values in the fitted column (Ŷᵢ) are not far from Ȳ = 49.590.
(2) SSE ≈ SSTO. The analysis of variance sums of squares are: SSTO = Σ (Yᵢ - Ȳ)² = 4.571, SSE = Σ (Yᵢ - Ŷᵢ)² = 4.460, SSR = Σ (Ŷᵢ - Ȳ)² = 0.111. The variation around the regression line (SSE) is not much better (smaller) than the total variation (SSTO).
(3) R² = (SSTO - SSE) / SSTO = SSR / SSTO = (4.571 - 4.460) / 4.571 = 0.111 / 4.571 = 0.024.
Because the SSE is not much less than the SSTO, the R² value is close to 0. For SSR to be large, there needs to be a lot of variation of the fitted values around the mean. So any time the fitted values do not vary much around the mean (that is, the slope is close to zero), R² will be close to 0. While this means that not much additional variation is explained by the fitted model, the "fit" may still reasonably represent the data, and projecting with a slope of zero may be an appropriate forecast. Of course, you don't need regression to project a slope of zero; you can just forecast the mean.
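As a small, hedged illustration (the data below are made up for this note, not Barclay's), the following lines reuse fit_and_anova from the sketch above on a nearly flat series: the residuals are small, R² is close to zero, and the regression forecast is essentially the same as simply forecasting the mean.

```python
# Illustrative, made-up data with essentially no trend (not Barclay's Example #1).
x = list(range(1, 11))
y = [49.6, 49.4, 49.7, 49.5, 49.6, 49.5, 49.7, 49.4, 49.6, 49.5]

b0, b1, ssto, sse, ssr, r2 = fit_and_anova(x, y)
print(round(b1, 4), round(r2, 4))                         # slope and R^2 are both near zero
print(round(sum(y) / len(y), 3), round(b0 + b1 * 11, 3))  # mean forecast vs. regression forecast at x = 11
```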
In Example #2, Barclay adds 0 to the first observed Y, one to the second, two to the third, and so on. The line of best fit has B₀ = 48.38813 and B₁ = 1.036667. This provides an interesting example for comparing the fit and the numerical values in the ANOVA formulas.
  i       Yᵢ        Ŷᵢ      Yᵢ - Ŷᵢ    Yᵢ - Ȳ     Ŷᵢ - Ȳ
  1     48.746    49.425    -0.679     -5.344     -4.665
  2     50.914    50.461     0.453     -3.176     -3.628
  3     51.246    51.498    -0.252     -2.844     -2.592
  4     53.297    52.535     0.762     -0.793     -1.555
  ...
 10     58.084    58.755    -0.671      3.994      4.665
Sum    540.898   540.898     0.000      0.000      0.000
Mean    54.090    54.090

Sums of Squares:  SSE = 4.460    SSTO = 93.121    SSR = 88.661    R² = 0.952
The interesting part of this example is that the residuals (Yᵢ - Ŷᵢ) are exactly the same as in Example #1, so the SSE is the same. Recall that linear regression minimizes the sum of the squared residuals. Should the lines in Example #1 and Example #2 have the same fit?
Let's look at the ANOVA formulas to see the properties of a "good fit" as measured by R² close to 1:
(1) Ŷᵢ ≈ Yᵢ; the fitted values (the Ŷᵢ column) are close to the observed values (the Yᵢ column), a "good fit." Here we decide that Ŷᵢ ≈ Yᵢ rather than Ŷᵢ ≈ Ȳ, because there is more variation of the observations from the mean. We choose Ŷᵢ ≈ Yᵢ even though we have the same values for the residuals as in Example #1.
(2) SSE ≈ 0 relative to SSTO. The analysis of variance sums of squares are: SSTO = Σ (Yᵢ - Ȳ)² = 93.121, SSE = Σ (Yᵢ - Ŷᵢ)² = 4.460, SSR = Σ (Ŷᵢ - Ȳ)² = 88.661. The variation around the regression line (SSE) is much better (smaller) than the total variation (SSTO).
(3) R² = (SSTO - SSE) / SSTO = SSR / SSTO = (93.121 - 4.460) / 93.121 = 88.661 / 93.121 = 0.952.
The SSE is much less than the SSTO, so a large proportion of the variation of the actual observations around the mean is being explained by the fitted line. With the SSE close to zero relative to the SSTO, most of the observations lie near the fitted line. However, you will note that this is relative, because we have the same SSE as in Example #1. It is because a large proportion of the SSTO is explained by the fitted line that we decide there is a good fit.
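Barclay's construction in Example #2 (adding 0 to the first observation, 1 to the second, 2 to the third, and so on) can be mimicked on the illustrative series from the earlier sketch (again, these are not Barclay's numbers). The residuals, and therefore the SSE, are unchanged, while SSTO and SSR grow, so R² moves from near 0 to near 1 even though the scatter around the fitted line is identical.

```python
# Mimic the construction described above: add 0, 1, 2, ... to successive observations.
# Adding an exactly linear component shifts the fitted slope but leaves the residuals unchanged.
y2 = [yi + i for i, yi in enumerate(y)]

_, b1_a, ssto_a, sse_a, _, r2_a = fit_and_anova(x, y)
_, b1_b, ssto_b, sse_b, _, r2_b = fit_and_anova(x, y2)

print(round(sse_a, 6) == round(sse_b, 6))   # True: identical error sum of squares
print(round(r2_a, 3), round(r2_b, 3))       # R^2 near 0 versus R^2 near 1
```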
What does the R² statistic measure? The R² statistic is a useful tool for determining whether or not B₁ = 0, for in regression, if B₁ = 0, there is no good reason to use the fitted line. As actuaries, we are often trying to forecast. If the slope is zero (B₁ = 0), then we can use the mean to forecast the fitted value.
In fact, the formula for B₁ can be written as a function of R²:

B₁ = [Σ (Yᵢ - Ȳ)² / Σ (Xᵢ - X̄)²]^(1/2) · r, where r = ±√R², with the sign the same as the slope.

So when B₁ = 0, then R² = 0; and when R² = 0, then B₁ = 0.
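A quick numeric check of this identity, using the illustrative trended series y2 from the previous sketch (not Barclay's data), confirms that the fitted slope equals ±√R² times √(SSTO / Σ(Xᵢ - X̄)²).

```python
# Numeric check of B1 = r * sqrt( SSTO / sum((Xi - Xbar)^2) ), with r = +/- sqrt(R^2),
# using the illustrative trended series y2 from the previous sketch.
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b0, b1, ssto, sse, ssr, r2 = fit_and_anova(x, y2)
r = r2 ** 0.5 if b1 >= 0 else -(r2 ** 0.5)   # the sign of r follows the sign of the slope

print(round(b1, 6), round(r * (ssto / s_xx) ** 0.5, 6))   # the two sides agree
```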
Both Example #1 and Example #2 have the same residuals, and hence the same SSE. From one perspective, each line has the same fit. The reason for the difference between the R² values is that in Example #2 the fitted slope differs substantially from zero and explains proportionally more of the larger variation measured by the SSTO.