






This handout shows how to find a linear model for given data, make predictions using the model, and measure how closely the model fits the data. The example uses population data for Spalding County, Georgia, from 1960 to 2000 to illustrate the concepts. The document also covers the sum of squares of errors (SSE) and the average error as measures of fit.
Typology: Study notes
Supplement to Unit 9B
MATH 1001

In this handout we will learn how to find a linear model for given data and use it to make predictions. We will also learn how to measure how closely the model "fits" the given data, and how to find the linear model that best fits a set of given data.

Finding a Linear Model for Data and Making Predictions

First, we consider the following table, which gives the population of Spalding County, Georgia, from 1960 to 2000. (Source: U.S. Census Bureau)

Year    Pop. (thousand)    Change
1960    35.4
1970    39.5               4.1
1980    47.9               8.4
1990    54.5               6.6
2000    58.4               3.9

The third column of this table shows (for each decade year) the change in population during the preceding decade. We see that the population of Spalding County increased by about 4 thousand people in the 1950s and in the 1990s. In the 1970s and 1980s the population increased by roughly 7 to 8 thousand people per decade. We might wonder whether this qualifies as almost linear population growth, so we plot the data and look. We plot the year on the x-axis and the population on the y-axis.

Figure 1: Population (y-axis) plotted against year (x-axis).

Figure 1 suggests that these points lie on or near some straight line. But how can we find a straight line that passes through or near each data point? One way is simply to pass a straight line through the first and the last data points. To make this easier, we let t be the number of years after 1960. Thus, our data will look like:
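Fitting Linear Models to Data

t (years since 1960)    P (in thousands)
0                       35.4
10                      39.5
20                      47.9
30                      54.5
40                      58.4

To find the slope between the first data point (0, 35.4) and the last data point (40, 58.4), we use the formula for the slope:

\[ m = \frac{P_2 - P_1}{t_2 - t_1} = \frac{58.4 - 35.4}{40 - 0} = \frac{23}{40} = 0.575 \]

Because the line passes through (0, 35.4), its intercept is 35.4, so the model is P(t) = 0.575t + 35.4.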
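As a rough illustration, the two-point calculation could be sketched in Python as follows. The function name `two_point_model` is our own and does not come from the handout; it simply returns the slope and intercept of the line through two data points.

```python
# Spalding County data from the handout: t = years since 1960, P in thousands.
t_values = [0, 10, 20, 30, 40]
populations = [35.4, 39.5, 47.9, 54.5, 58.4]


def two_point_model(t1, p1, t2, p2):
    """Return (slope, intercept) of the line through (t1, p1) and (t2, p2)."""
    slope = (p2 - p1) / (t2 - t1)
    intercept = p1 - slope * t1
    return slope, intercept


# Line through the first and last data points.
slope, intercept = two_point_model(
    t_values[0], populations[0], t_values[-1], populations[-1]
)
print(f"P(t) = {slope:.3f} t + {intercept:.1f}")
```

With these data the slope works out to 23/40 = 0.575 thousand people per year, and the intercept is the 1960 population, 35.4 thousand.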
Definition: The phrase "Sum of Squares of Errors" is so common in data modeling that it is abbreviated SSE. Thus, the SSE associated with data modeling based on n data points is defined by

\[ \mathrm{SSE} = E_1^2 + E_2^2 + E_3^2 + \cdots + E_n^2 \]

To find the SSE, we begin by finding the squares of the errors.

t     P (Actual)    P(t) (Predicted)    Error, E_i = P − P(t)    E_i^2
0     35.4          35.4                0                        0
10    39.5          41.15               −1.65                    2.7225
20    47.9          46.9                1                        1
30    54.5          52.65               1.85                     3.4225
40    58.4          58.4                0                        0

Thus, the SSE is

SSE = 0 + 2.7225 + 1 + 3.4225 + 0 = 7.145
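The SSE calculation for the first-and-last-point model can be sketched in Python. This is only an illustration: the slope (58.4 − 35.4)/40 = 0.575 and intercept 35.4 follow from the two endpoints, and the function name `model` is our own.

```python
# Spalding County data: t = years since 1960, P in thousands.
t_values = [0, 10, 20, 30, 40]
populations = [35.4, 39.5, 47.9, 54.5, 58.4]


def model(t):
    """Line through the first and last data points: slope 0.575, intercept 35.4."""
    return 0.575 * t + 35.4


# Error for each data point: actual population minus predicted population.
errors = [p - model(t) for t, p in zip(t_values, populations)]

# Sum of squares of errors.
sse = sum(e ** 2 for e in errors)
print(f"SSE = {sse:.3f}")
```

The errors at t = 0 and t = 40 are zero (up to floating-point rounding) because the line was built to pass through those two points, so only the three interior points contribute to the SSE.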
Example:
(a) Find a linear model for the population of Spalding County using the first and fourth data points; that is, (0, 35.4) and (30, 54.5).
(b) Use your model to predict the Spalding County population in the years 1995 and 2010.
(c) Find the SSE and average error. Use these to determine whether the model found in this example or the previous model is a better fit for the data.

(a)

(b)
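One way to check your work on this example is a short Python sketch like the one below. It is not the handout's worked solution. The helper names are our own, and the average error is assumed to be the square root of SSE divided by the number of data points, which is consistent with the value the handout reports for the best-fit model.

```python
import math

# Spalding County data: t = years since 1960, P in thousands.
t_values = [0, 10, 20, 30, 40]
populations = [35.4, 39.5, 47.9, 54.5, 58.4]


def line_through(point_a, point_b):
    """Return (slope, intercept) of the line through two (t, P) points."""
    (t1, p1), (t2, p2) = point_a, point_b
    slope = (p2 - p1) / (t2 - t1)
    return slope, p1 - slope * t1


def sse(slope, intercept):
    """Sum of squared errors of P(t) = slope * t + intercept on the data."""
    return sum(
        (p - (slope * t + intercept)) ** 2
        for t, p in zip(t_values, populations)
    )


def average_error(slope, intercept):
    """Assumed definition: sqrt(SSE / n)."""
    return math.sqrt(sse(slope, intercept) / len(t_values))


# (a) Model through the first and fourth data points.
m, b = line_through((0, 35.4), (30, 54.5))

# (b) Predictions for 1995 and 2010 (t = years since 1960).
for year in (1995, 2010):
    print(year, round(m * (year - 1960) + b, 1))

# (c) Compare with the first-and-last-point model P(t) = 0.575 t + 35.4.
print("this model:    ", round(sse(m, b), 3), round(average_error(m, b), 3))
print("previous model:", round(sse(0.575, 35.4), 3), round(average_error(0.575, 35.4), 3))
```

The model with the smaller SSE (and hence the smaller average error) is the better fit under this measure.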
Example:
(a) Find the best-fit linear model for the population data for Spalding County.
(b) Use your model to predict the population in 1995 and 2010.
(c) Find the SSE and the average error for the model.

(a) P(t) = 0.61t + 34.94

(b) P(35) = 0.61 × 35 + 34.94 ≈ 56.3 thousand
    P(50) = 0.61 × 50 + 34.94 ≈ 65.4 thousand

The population of Spalding County was about 56.3 thousand in 1995 and will be about 65.4 thousand in 2010.

(c)

t     P (Actual)    P(t) (Predicted)    Error, E_i = P − P(t)    E_i^2
0     35.4          34.94               0.46                     0.2116
10    39.5          41.04               −1.54                    2.3716
20    47.9          47.14               0.76                     0.5776
30    54.5          53.24               1.26                     1.5876
40    58.4          59.34               −0.94                    0.8836

SSE = 0.2116 + 2.3716 + 0.5776 + 1.5876 + 0.8836 = 5.632

\[ \text{average error} = \sqrt{\frac{5.632}{5}} \approx 1.061 \]
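The handout does not say how the best-fit line is obtained. A minimal sketch, assuming "best fit" means the ordinary least-squares line (the line that minimizes the SSE), is shown below. The function name `least_squares_line` is our own, and the average error is assumed to be the square root of SSE divided by the number of data points.

```python
import math

# Spalding County data: t = years since 1960, P in thousands.
t_values = [0, 10, 20, 30, 40]
populations = [35.4, 39.5, 47.9, 54.5, 58.4]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = least_squares_line(t_values, populations)

# Errors, SSE, and (assumed) average error = sqrt(SSE / n).
errors = [p - (slope * t + intercept) for t, p in zip(t_values, populations)]
sse = sum(e ** 2 for e in errors)
average_error = math.sqrt(sse / len(errors))

print(f"P(t) = {slope:.2f} t + {intercept:.2f}")
print(f"SSE = {sse:.3f}, average error = {average_error:.3f}")
```

Under that assumption the sketch reproduces the model P(t) = 0.61t + 34.94 and the SSE of 5.632 given in the example.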
Exercises:

In each of problems 1 and 2 the population census data for a U.S. city is given.
(a) Find a linear model for the data using the first and last data points. Let t = 0 in the year 1950. Use it to predict the population in 2000. Calculate the average error of the model.
(b) Find the linear model that best fits this census data. Let t be 0 in the year 1950. Use it to predict the population in 2000. Calculate the average error of the model.
1. San Diego, California:

Year            1950    1960    1970    1980    1990
Pop. (thous)    334     573     697     876     1111

2. Riverside, California:

Year            1950    1960    1970    1980    1990
Pop. (thous)    47      84      140     171     227

In each of problems 3 and 4 the population census data for a U.S. city is given.
(a) Find a linear model for the data using the second and fourth data points. Let t = 0 in the year 1950. Use it to predict the population in 2000. Calculate the average error of the model.
(b) Find the linear model that best fits this census data. Let t be 0 in the year 1950. Use it to predict the population in 2000. Calculate the average error of the model.
Source: The World Almanac and Book of Facts 1998.
(a) Find the linear model S(t) = mt + b that best fits this data. Let t = 0 in 1988.
(b) Compare the model's prediction for the year 1995 with the actual 1995 CD sales of 722.9 million.
(c) Use the model to predict the CD sales for the year 2002.
(d) Which prediction, the one for 1995 or the one for 2002, is likely to be closer to actual sales? Why?
Source: Statistical Abstracts of the United States.
(a) Find the best-fit linear model for the data. Let t = 0 in 1940.
(b) Use your model to predict the number of passenger cars in the year 2000 and in the year 2010.
Answers: