












































































This document introduces the core principles of econometrics, focusing on its application in analyzing economic data. It covers key data types like cross-sectional, time series, and panel data, emphasizing the challenges of establishing causal relationships in social sciences. Real-world examples, such as crime economics and wage estimation, demonstrate the application of econometrics. The document also discusses the limitations of econometrics and the importance of integrating statistical analysis with economic reasoning. It explores the application of econometrics in analyzing student data, including factors influencing education, birth weight, and consumption patterns. The document delves into regression analysis, including the derivation of the OLS estimator, interpretation of regression results, and comparisons between OLS and regression through the origin estimators. It concludes with a discussion on variable transformations and the interpretation of the intercept in linear regression.
Econometrics is the application of statistical and mathematical methods to the analysis of economic data in order to give empirical content to economic relationships. It involves using data to estimate economic relationships, test economic theories, and evaluate and implement public policies.
The main types of economic data discussed in the text are:
Cross-sectional data : Data on one or more variables collected at the same point in time for a sample of economic units, such as individuals, households, firms, cities, states, or countries.
Time series data : Data on one or more variables collected over time for a single economic unit.
Panel data : Data with both a cross-sectional and a time series dimension, such as multiple individuals or firms observed over multiple time periods.
Drawing causal inferences in the social sciences is challenging due to the presence of confounding factors that can lead to spurious correlations. The text discusses several examples to illustrate this point:
Agricultural yield example : The observed positive correlation between fertilizer use and crop yield may be due to other factors, such as weather conditions, that affect both variables.
Return to education example : The observed positive correlation between education and earnings may be due to ability or family background factors that affect both education and earnings.
Crime example : The observed positive correlation between poverty and crime may be due to other factors, such as lack of economic opportunities, that affect both poverty and crime.
The text emphasizes that experimental data, where the explanatory variables are randomly assigned, is the ideal way to establish causality. However, in many cases, researchers must rely on observational (non- experimental) data, which requires more sophisticated econometric methods to address the issue of confounding factors.
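The confounding problem can be illustrated with a small simulation. This is a hypothetical sketch (the variable names, coefficients, and data-generating process are all invented for illustration): unobserved land quality raises both fertilizer use and yield, so the observational regression slope overstates the causal effect of fertilizer, while random assignment recovers it.

```python
import random

random.seed(42)

def simulate(n, randomized):
    """Hypothetical fertilizer/yield data. Land quality is an unobserved
    confounder: it raises yield directly and, in the observational case,
    also drives how much fertilizer gets applied."""
    data = []
    for _ in range(n):
        quality = random.gauss(0, 1)
        if randomized:
            fert = random.gauss(0, 1)                  # assigned at random
        else:
            fert = 0.9 * quality + random.gauss(0, 0.5)
        yld = 1.0 * fert + 2.0 * quality + random.gauss(0, 0.5)
        data.append((fert, yld))
    return data

def ols_slope(data):
    n = len(data)
    xbar = sum(x for x, _ in data) / n
    ybar = sum(y for _, y in data) / n
    return (sum((x - xbar) * (y - ybar) for x, y in data)
            / sum((x - xbar) ** 2 for x, _ in data))

slope_obs = ols_slope(simulate(20000, randomized=False))  # biased well above 1.0
slope_exp = ols_slope(simulate(20000, randomized=True))   # close to the true 1.0
```

The observational slope absorbs the quality effect and lands far above the true coefficient of 1.0, while the "experimental" slope does not; this is exactly why randomly assigned explanatory variables are the ideal.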
The text provides two motivating examples to illustrate the use of econometrics:
Economics of crime : Examining the relationship between crime rates and various economic and demographic factors.
Wage example : Estimating the effect of education, experience, and other factors on individual wages.
These examples demonstrate how econometrics can be used to test economic theories and inform policy decisions.
The text acknowledges that econometrics has limitations, as the social sciences deal with complex human behavior that is difficult to fully capture with statistical models. Econometric analysis must be combined with economic reasoning and an understanding of the institutional details of the problem being studied.
Rational Behavior and Student Choices
Economists assume that students choose a mix of studying, working, attending class, leisure, and sleeping based on rational behavior, such as maximizing utility subject to the constraint of 168 hours in a week. Statistical methods, such as regression analysis, can be used to measure the association between studying and working, but these methods do not imply that one variable "causes" the other, as both are choice variables of the student.
Average Education and Wages : The average years of education in the sample is about 12.6, with 2 people reporting 0 years and 19 people reporting 18 years. The average hourly wage in the sample is about $5.90, which seems low by today's standards because it is measured in 1976 dollars. To convert the 1976 wage to 2003 dollars, the ratio of the 2003 Consumer Price Index (CPI) to the 1976 CPI is used, resulting in an average hourly wage of approximately $19.06 in 2003 dollars.
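The conversion is simple index arithmetic. The sketch below assumes CPI levels of roughly 56.9 for 1976 and 184.0 for 2003 (these index values are an assumption, not stated in the text); small differences in the index vintage used explain rounding differences around $19.06.

```python
# Assumed CPI index levels (not stated in the text): ~56.9 in 1976, ~184.0 in 2003.
wage_1976 = 5.90
cpi_1976, cpi_2003 = 56.9, 184.0
ratio = cpi_2003 / cpi_1976        # ≈ 3.23
wage_2003 = wage_1976 * ratio      # ≈ $19, matching the figure in the text
```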
The sample contains 252 women and 274 men.
Cigarette Smoking During Pregnancy :
The sample contains 1,388 observations, and 212 women have a positive number of cigarettes smoked (cigs > 0).
The fitted values and residuals are provided, and the R-squared from the regression is about 57.7%, indicating that ACT explains a substantial portion of the variation in GPA.
Regression of Birth Weight on Cigarette Smoking :
The predicted birth weight decreases by about 8.6% when the number of cigarettes smoked increases from 0 to 20. However, there are many other factors that can affect birth weight, and using only cigarette smoking to predict birth weight has limitations.
The large number of women who did not smoke during pregnancy (84.7%) means that the predicted birth weight at zero cigarettes is roughly in the middle of the observed birth weights for nonsmokers, leading to underprediction of high birth weights.
Consumption Function :
The intercept of the consumption function implies that when income (inc) is 0, consumption (cons) is predicted to be negative $124.84, which is not plausible. Plugging in an income of $30,000 results in a predicted consumption of $25,465.16.
The graph shows the Marginal Propensity to Consume (MPC) and the Average Propensity to Consume (APC) over the range of incomes, with the APC being positive even at the lowest income level.
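The fitted consumption function implies these numbers directly. The slope of .853 below is derived from the two figures reported above (since −124.84 + .853 × 30,000 = 25,465.16) and is otherwise an inferred value:

```python
def cons_hat(inc):
    """Fitted consumption function: intercept -124.84, slope (MPC) .853.
    The slope is inferred from the predicted value reported at inc = 30,000."""
    return -124.84 + 0.853 * inc

mpc = 0.853                      # marginal propensity to consume (constant)

def apc(inc):
    return cons_hat(inc) / inc   # average propensity to consume

pred_30k = cons_hat(30000)       # ≈ 25,465.16
pred_0 = cons_hat(0)             # -124.84: the implausible intercept
```

Because the intercept is negative, APC = .853 − 124.84/inc lies below the MPC at every positive income level and rises toward it as income grows, matching the graph described above.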
Housing Prices and Distance from an Incinerator :
If living closer to an incinerator depresses housing prices, then being farther away increases housing prices. However, if the city chose to locate the incinerator in an area away from more expensive neighborhoods, then the distance variable [log(dist)] would be positively correlated with housing quality, violating the OLS assumption SLR.4.
Other factors, such as size of the house, number of bathrooms, age of the home, and neighborhood quality, could also be correlated with the distance variable.
Conditional Expectations and Variances :
When conditioning on income (inc) in computing expectations and variances, inc becomes a constant; the conditional expectation of the error term is 0, and the conditional variance of the error term is the error variance σ²ₑ.
The wider variability in saving among higher-income families can be explained by the fact that they have more discretion in their spending decisions compared to low-income families, who must spend on necessities.
Comparing OLS and the Regression through the Origin Estimator :
The bias in the regression-through-the-origin estimator β̃₁ is zero when the intercept β₀ is 0 or when the sample mean of the explanatory variable, x̄, is 0. The variance of β̃₁ is less than or equal to the variance of the OLS estimator β̂₁, but the bias in β̃₁ increases as x̄ increases. Whether β̃₁ or β̂₁ is preferred on a mean squared error basis depends on the sizes of β₀, x̄, and the sample size n, as well as on Σᵢ₌₁ⁿ xᵢ².
Regression with Transformed Variables : When regressing c₁yᵢ on c₂xᵢ, the slope coefficient is (c₁/c₂)β̂₁ and the intercept is c₁β̂₀, where β̂₁ and β̂₀ are the OLS estimates from regressing yᵢ on xᵢ. When regressing (c₁ + yᵢ) on (c₂ + xᵢ), the constants drop out of the slope formula, so the slope coefficient is still β̂₁; the intercept, however, becomes β̂₀ + c₁ − c₂β̂₁.
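These transformation results can be verified numerically on made-up data. (For the shifted regression, the algebra gives intercept β̂₀ + c₁ − c₂β̂₁, since only the slope is unaffected by adding constants.)

```python
# Made-up data; ols() is an ordinary simple-regression routine.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar          # intercept formula: b0 = ybar - b1*xbar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = ols(x, y)

c1, c2 = 10.0, 4.0
# Scaling: regress c1*y on c2*x -> slope (c1/c2)*b1, intercept c1*b0.
s0, s1 = ols([c2 * xi for xi in x], [c1 * yi for yi in y])
# Shifting: regress (c1 + y) on (c2 + x) -> slope b1, intercept b0 + c1 - c2*b1.
t0, t1 = ols([c2 + xi for xi in x], [c1 + yi for yi in y])
```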
The Intercept in Linear Regression
The intercept in a linear regression model is given by the formula:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
where $\bar{y}$ is the sample mean of the dependent variable, $\bar{x}$ is the sample mean of the independent variable, and $\hat{\beta}_1$ is the estimated slope coefficient.
The intercept $\hat{\beta}_0$ is the predicted value of the dependent variable when the independent variable equals zero.
If the model is specified in logarithmic form, such that $\log(y_i) = \beta_0 + \beta_1 \log(x_i) + u_i$, then the intercept $\beta_0$ can be interpreted as the expected value of $\log(y)$ when $\log(x) = 0$, or equivalently, when $x = 1$.
The simple regression slope coefficient generally differs from the estimated slope coefficient $\hat{\beta}_1$ in the multiple regression model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i$, even if the true slope coefficients are the same.
The multiple regression slope coefficient $\hat{\beta}_1$ represents the change in the dependent variable $y$ associated with a one-unit increase in $x_1$, holding all other independent variables constant.
The Omitted Variable Bias
If an important variable $z$ is omitted from the regression model, the estimated coefficient $\hat{\beta}_1$ will be biased. The bias is given by $\beta_z \gamma_1$, where $\beta_z$ is the true coefficient on $z$ and $\gamma_1$ is the slope coefficient from the regression of $z$ on $x_1$.
If the omitted variable $z$ is correlated with the included independent variable $x_1$, and $z$ also has an effect on the dependent variable $y$, then the estimated coefficient on $x_1$ will be biased because it will capture some of the effect of $z$.
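A minimal simulation (invented coefficients and data-generating process) showing the omitted-variable-bias formula at work: the short-regression slope is approximately $\beta_1$ plus $\beta_z$ times the slope from regressing $z$ on $x_1$.

```python
import random

random.seed(1)

# Invented population: y = 1.0 + 0.5*x1 + 2.0*z + u, with z correlated with x1.
n = 50000
beta1, beta_z = 0.5, 2.0
x1 = [random.gauss(0, 1) for _ in range(n)]
z = [0.6 * xi + random.gauss(0, 1) for xi in x1]
y = [1.0 + beta1 * xi + beta_z * zi + random.gauss(0, 1)
     for xi, zi in zip(x1, z)]

def slope(xs, ys):
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    return (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
            / sum((a - xbar) ** 2 for a in xs))

delta1 = slope(x1, z)   # slope from regressing z on x1 (≈ 0.6)
short = slope(x1, y)    # short regression omitting z: ≈ beta1 + beta_z * delta1
```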
The Gauss-Markov Theorem
The Gauss-Markov Theorem states that under assumptions MLR.1 through MLR.5 (linearity in parameters, random sampling, no perfect collinearity, zero conditional mean of the errors, and homoskedasticity), the OLS estimators are the best linear unbiased estimators (BLUE) of the population parameters.
Multicollinearity
Multicollinearity refers to the situation where the independent variables in a multiple regression model are highly correlated with each other. While multicollinearity does not violate the Gauss-Markov assumptions, it can make it difficult to precisely estimate the individual effects of the highly correlated variables.
Elaboration of the Given Text
(i) The rank of a law school is inversely related to its prestige: a higher rank number (i.e., lower prestige) leads to lower starting salaries for its graduates. For example, a rank of 100 means the school is considered the 100th best.
(ii) Both the Law School Admission Test (LSAT) score and the Grade Point Average (GPA) of the entering class are measures of the quality of the students. Regardless of where the better students attend law school, they are expected to earn higher salaries on average.
The number of volumes in the law library and the tuition cost are both indicators of the school's quality. Tuition cost reflects the quality of the faculty, physical facilities, and other aspects of the school.
(iii) The coefficient on GPA, multiplied by 100, implies that a one-point increase in median GPA is associated with a 24.8% increase in predicted median starting salary.
(iv) A 1% increase in the number of library volumes is associated with a 0.095% increase in the predicted median starting salary, holding other factors constant.
(v) It is better to attend a law school with a better (numerically lower) rank. If law school A is ranked 20 places better than law school B, the predicted starting salary for law school A is 6.6% higher.
(i) If we change the "study" variable, we must change at least one of the other variables (sleep, work, or leisure) to maintain the identity that study + sleep + work + leisure = 168 hours per week.
(ii) Since study can be written as a linear function of the other independent variables, the multiple linear regression assumption of MLR.3 (no perfect collinearity) is violated.
(iii) To avoid the issue in part (ii), we can drop one of the independent variables, such as leisure. The interpretation of the coefficients then changes: the coefficient on study represents the change in GPA when study increases by one hour, holding sleep and work constant (and implicitly reducing leisure by one hour).
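The perfect-collinearity problem here is purely mechanical, as a tiny check with hypothetical weekly schedules shows: leisure is an exact linear function of the constant and the other three variables, so no regression including all four regressors plus an intercept can be estimated.

```python
# Hypothetical weekly time-use records (hours): study, sleep, work, leisure.
rows = [
    (30.0, 56.0, 20.0, 62.0),
    (25.0, 49.0, 40.0, 54.0),
    (40.0, 60.0, 10.0, 58.0),
]

# Every row satisfies the identity, so for every observation
# leisure = 168 - study - sleep - work: a perfect linear combination of the
# intercept and the other three regressors, which violates MLR.3.
identity_holds = all(s + sl + w + l == 168.0 for (s, sl, w, l) in rows)
leisure_is_combination = all(l == 168.0 - s - sl - w for (s, sl, w, l) in rows)
```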
Conditioning on the outcomes of the explanatory variables, the expected value of the estimator β̃₁ is equal to the sum of the true coefficients, β₁ + β₂.
Only omitting an important variable that is correlated with the included explanatory variables can cause bias in the OLS estimators. The homoskedasticity assumption (MLR.5) and the degree of collinearity among the explanatory variables do not affect the unbiasedness of the OLS estimators.
Conditional Expectation and Variance of the OLS Estimator
Conditional on all sample values on x1, x2, and x3, only the last term is random due to its dependence on ui. But E(ui) = 0, and so:
E(β̂₁) = β₁ + β₃ (Σᵢ r̂ᵢ₁xᵢ₃) / (Σᵢ r̂ᵢ₁²),
which is what we wanted to show. Notice that the term multiplying β₃ is the regression coefficient from the simple regression of xᵢ₃ on r̂ᵢ₁.
The shares, by definition, add to one. If we do not omit one of the shares then the equation would suffer from perfect multicollinearity. The parameters would not have a ceteris paribus interpretation, as it is impossible to change one share while holding all of the other shares fixed.
Because each share is a proportion (and can be at most one, when all other shares are zero), it makes little sense to increase sharep by one unit. If sharep increases by .01 – which is equivalent to a one percentage point increase in the share of property taxes in total revenue – holding shareI, shareS, and the other factors fixed, then growth increases by β1 (.01). With the other shares fixed, the excluded share, shareF, must fall by .01 when sharep increases by .01.
For notational simplicity, define s_zx = (1/n) Σ(zᵢ − z̄)(xᵢ − x̄); this is not quite the sample covariance between z and x because we do not divide by n − 1, but we are only using it to simplify notation. Then we can write the proposed estimator β̃₁ as:
β̃₁ = [(1/n) Σ(zᵢ − z̄)yᵢ] / s_zx.
This is clearly a linear function of the yᵢ: take the weights to be wᵢ = (zᵢ − z̄)/(n·s_zx). To show unbiasedness, as usual we plug yᵢ = β₀ + β₁xᵢ + uᵢ into this equation and simplify:
β̃₁ = [(1/n) Σ(zᵢ − z̄)(β₀ + β₁xᵢ + uᵢ)] / s_zx
    = [β₀(1/n) Σ(zᵢ − z̄) + β₁(1/n) Σ(zᵢ − z̄)xᵢ + (1/n) Σ(zᵢ − z̄)uᵢ] / s_zx
    = β₁ + [(1/n) Σ(zᵢ − z̄)uᵢ] / s_zx,
where we use the facts that Σ(zᵢ − z̄) = 0 always and that (1/n) Σ(zᵢ − z̄)xᵢ = s_zx. Now s_zx is a function of the zᵢ and xᵢ, and the expected value of each uᵢ is zero conditional on all zᵢ and xᵢ in the sample. Therefore, conditional on these values, E(β̃₁) = β₁.
From the last expression in part (i) we have (again conditional on the zᵢ and xᵢ in the sample):
Var(β̃₁) = (1/n²) Σ(zᵢ − z̄)² Var(uᵢ) / s_zx² = σ²(1/n²) Σ(zᵢ − z̄)² / s_zx²,
because of the homoskedasticity assumption [Var(uᵢ) = σ² for all i].
We know that the OLS estimator satisfies Var(β̂₁) = σ²/Σ(xᵢ − x̄)². Rearranging the inequality in the hint gives (1/n²) Σ(zᵢ − z̄)²/s_zx² ≥ 1/Σ(xᵢ − x̄)². When we multiply through by σ² we get Var(β̃₁) ≥ Var(β̂₁), which is what we wanted to show.
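A small numerical check of this derivation, on made-up data: compute the estimator and the two variance factors (with the common σ² divided out) and confirm the Cauchy-Schwarz ordering.

```python
# Made-up sample values for x, z, and y.
x = [1.0, 2.0, 4.0, 5.0, 7.0]
z = [2.0, 1.0, 5.0, 4.0, 9.0]
y = [1.3, 2.9, 4.1, 5.2, 7.8]
n = len(x)
xbar, zbar = sum(x) / n, sum(z) / n

# s_zx as defined in the text (divide by n, not n - 1).
szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x)) / n

# The estimator: beta_tilde = [(1/n) * sum (z - zbar) * y] / s_zx.
beta_tilde = (sum((zi - zbar) * yi for zi, yi in zip(z, y)) / n) / szx

# Variance factors with the common sigma^2 divided out:
var_tilde_factor = sum((zi - zbar) ** 2 for zi in z) / (n ** 2 * szx ** 2)
var_ols_factor = 1.0 / sum((xi - xbar) ** 2 for xi in x)
# Cauchy-Schwarz guarantees var_tilde_factor >= var_ols_factor on any sample.
```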
Solutions to Computer Exercises
Probably β2 > 0, as more income typically means better nutrition for the mother and better prenatal care.
On the one hand, an increase in income generally increases the consumption of a good, and cigs and faminc could be positively correlated. On the other, family incomes are also higher for families with more education, and more education and cigarette smoking tend to be negatively correlated. The sample correlation between cigs and faminc is about −.173, indicating a negative correlation.
The regressions without and with faminc are:
bwght = 119.77 − .514 cigs, n = 1,388
and
bwght = 116.97 − .463 cigs + .093 faminc, n = 1,388
The effect of cigarette smoking is slightly smaller when faminc is added to the regression, but the difference is not great. This is because cigs and faminc are not very correlated, and the coefficient on faminc is practically small. (The variable faminc is measured in thousands, so $10,000 more in 1988 income increases predicted birth weight by only .93 ounces.)
The estimated equation is:
price = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R² = .632
Holding square footage constant, Δprice/Δbdrms = 15.20, and so price increases by $15,200.
Now Δprice = .128Δsqrft + 15.20Δbdrms = .128(140) + 15.20 = 33.12, or $33,120. Because the size of the house is also increasing, this is a much larger effect than in (ii).
About 63.2%.
The intercept means that for a student with zero prior GPA and ACT score, the predicted attendance rate is 75.7%, which is not an interesting segment of the population. The coefficient on priGPA means that a one-point increase in prior GPA is associated with a 17.3 percentage point increase in attendance rate, holding ACT fixed.
The negative coefficient on ACT suggests that students with higher potential (higher ACT scores) think they can get by with missing lectures.
Regression of Education on Experience and Tenure :
The regression equation is educ = 13.57 − 0.074 exper + 0.048 tenure + r̂, with n = 526 and R² = 0.101, where r̂ denotes the residual.
When log(wage) is regressed on the residuals r̂ from this regression, the coefficient on r̂ is identical to the coefficient on educ in the multiple regression of log(wage) on educ, exper, and tenure.
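This is the "partialling out" (Frisch-Waugh) property of OLS, and it can be verified on synthetic data. The data-generating process below is invented and only the mechanics match the text: the slope from regressing log(wage) on the first-stage residuals equals the educ coefficient from the full multiple regression.

```python
import random

random.seed(7)

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for small systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS via the normal equations X'X b = X'y; X includes a constant column."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic worker data (coefficients invented for illustration).
n = 500
exper = [random.uniform(0, 30) for _ in range(n)]
tenure = [random.uniform(0, 15) for _ in range(n)]
educ = [12 - 0.07 * e + 0.05 * t + random.gauss(0, 2)
        for e, t in zip(exper, tenure)]
lwage = [1.0 + 0.09 * ed + 0.004 * e + 0.02 * t + random.gauss(0, 0.3)
         for ed, e, t in zip(educ, exper, tenure)]

# Step 1: residuals from regressing educ on exper and tenure.
g = ols([[1.0, e, t] for e, t in zip(exper, tenure)], educ)
r = [ed - (g[0] + g[1] * e + g[2] * t)
     for ed, e, t in zip(educ, exper, tenure)]

# Step 2: simple regression of log(wage) on the residuals.
b_resid = ols([[1.0, ri] for ri in r], lwage)[1]

# Step 3: full multiple regression; the educ coefficients coincide.
b_full = ols([[1.0, ed, e, t] for ed, e, t in zip(educ, exper, tenure)],
             lwage)[1]
```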
Relationship between IQ, Education, and Wages :
The slope coefficient from the regression of IQ on education is 3.53383. The slope coefficient from the regression of log(wage) on education is 0.05984. The slope coefficients from the regression of log(wage) on education and IQ are 0.03912 and 0.00586, respectively.
The combined effect of education and IQ on log(wage) is approximately equal to the coefficient from the regression of log(wage) on education alone.
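The numbers satisfy the standard decomposition: the simple-regression slope equals the multiple-regression educ coefficient plus the IQ coefficient times the slope of IQ on educ. A one-line check:

```python
# Figures reported above (rounded as in the text):
b_simple = 0.05984                 # log(wage) on educ alone
b_educ, b_iq = 0.03912, 0.00586    # log(wage) on educ and IQ
delta = 3.53383                    # slope of IQ on educ

implied_simple = b_educ + b_iq * delta   # matches b_simple to rounding
```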
Regression of Math Pass Rates on Spending and Poverty :
The regression equation is math10 = −20.36 + 6.23 log(expend) − 0.305 lnchprg, with n = 408 and R² = 0.180. The results imply that more spending increases the pass rate (holding poverty fixed) and a higher poverty rate decreases the pass rate (holding spending fixed). The simple regression of math10 on log(expend) has a larger estimated spending effect, but a lower R-squared.
The negative correlation between log(expend) and lnchprg leads to an overestimate of the effect of spending in the simple regression.
Regression of Soda Prices on Proportion Black and Income :
The regression equation is psoda = 0.956 + 0.115prpblck + 0.0000016income, with n = 401 and R^2 = 0.064. If the proportion black increases by 10 percentage points, the price of soda is estimated to increase by about 1.2 cents.
The simple regression estimate of the effect of prpblck is lower than the multiple regression estimate, due to the negative correlation between prpblck and income. A constant elasticity specification, log(psoda) = −0.794 + 0.122 prpblck + …, can also be estimated; in that model a 10 percentage point increase in prpblck raises psoda by about 1.2 percent.
The text provides various examples and evidence to illustrate the concepts of regression analysis and hypothesis testing, covering topics such as the interpretation of regression coefficients, the impact of multicollinearity, the differences between simple and multiple regression, and the use of alternative model specifications.
Hypothesis Testing and Regression Analysis
(i) The standard error on the variable hrsemp has not changed, but the magnitude of the coefficient has increased by half. The t-statistic on hrsemp has gone from about -1.47 to -2.21, so now the coefficient is statistically less than zero at the 5% level.
(ii) If we add and subtract β2 log(employ) from the right-hand side and collect terms, we have:
log(scrap) = β0 + β1 hrsemp + β2 log(sales/employ) + θ3 log(employ) + u
where θ3 ≡ β2 + β3.
(iii) The null hypothesis that the size of the firm, as measured by employees, does not matter once we control for training and sales per employee (in a logarithmic functional form) cannot be rejected. The t-statistic on log(employ) is very small (0.2), indicating that the size of the firm does not have a significant effect on the dependent variable.
(iv) The null hypothesis in the model from part (ii) is H0: β2 = −1. The t-statistic is very small (0.132), and we fail to reject the null hypothesis whether we specify a one- or two-sided alternative.
(i) We use Property VAR.3 from Appendix B to compute the standard error of β1 - 3β2.
(iii) We can write β1 = θ1 + 3β2 and plug this into the population model to get:
y = β0 + θ1 x1 + β2 (3x1 + x2) + β3 x3 + u
Elaboration of the Given Text
(i) The sample standard deviation of phsrank is about 24.
(ii) Adding phsrank to the regression makes the t-statistic on jc even smaller in absolute value, around 1.33. However, the coefficient magnitude is similar to the previous estimate of 4.26. Therefore, the base point remains unchanged: the return to a junior college is estimated to be somewhat smaller, but the difference is not significant at standard significance levels.
(iii) The variable id is a worker identification number, which should be randomly assigned (at least roughly). Therefore, id should not be correlated with any variable in the regression equation. In fact, its t-statistic is about 0.54.
(i) There are 2,017 single people in the sample of 9,275.
(ii) The estimated equation for single people is:
nettfa = −43.04 + .799 inc + .843 age
         (4.08)   (.060)     (.092)
n = 2,017, R² = 0.119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa. The coefficient on age means that, holding income fixed, another year of age is predicted to increase nettfa by about $843.
(iii) The intercept of -43.04 is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0 , which is clearly outside the relevant population.
(iv) The t-statistic for the coefficient on age is approximately -1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about 0.044. Therefore, we can reject H0: β2 = 1 at the 5% significance level.
(v) The slope coefficient on inc in the simple regression is about 0.821, which is not very different from the 0.799 obtained in part (ii). The correlation between inc and age in the sample of single people is only about 0.039, which helps explain why the simple and multiple regression estimates are not very different.
(i) The OLS regression results are: log(psoda) = −1.46 + 0.073 prpblck + …, where the equation also includes log(income) and prppov (discussed in part (ii)).
(ii) The correlation between log(income) and prppov is about −0.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t-statistic for log(income) is about 5.1 and that for prppov is about 2.86 (two-sided p-value = 0.004).
(iii) The OLS regression results when log(hseval) is added are:
log(psoda) = −0.84 + 0.098 prpblck − 0.053 log(income) + 0.052 prppov + 0.121 log(hseval)
             (0.29)  (0.029)         (0.038)             (0.134)        (0.018)
n = 401, R² = 0.184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about 0.12 percent. The two-sided p-value is essentially zero.
(iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level). Nevertheless, they are jointly significant at the 5% level.
(v) The regression in part (iii) seems the most reliable, as it holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if the proportion of blacks increases by 0.10, psoda is estimated to increase by 1%, other factors held fixed.
Elaboration of the Given Text
The provided text discusses various aspects of regression analysis and the interpretation of regression results. It covers the following key points:
Obtaining First-Order Conditions : The text shows that if the regression coefficients (β's) are obtained by plugging in the scaled dependent and independent variables into the first-order conditions, then these first-order conditions are satisfied. This proves that the OLS estimates are the unique solutions to the first-order conditions, provided there is no perfect collinearity in the independent variables.
Interpreting Regression Coefficients : The text demonstrates how to interpret the regression coefficients, including the turnaround point and the significance of the coefficients. It also discusses the interpretation of coefficients when the variables are rescaled.
Interaction Effects : The text examines the interpretation of interaction effects, such as the effect of education and parental education on log wages. It highlights the importance of including the level effects when estimating interaction effects to avoid biased results.
Omitted Variable Bias : The text provides an example of how omitting a relevant variable (parental education) can lead to biased estimation of the interaction effect.
Hypothesis Testing : The text discusses the use of F-tests to test the joint significance of additional variables in an extended regression model.
voteA = 18.20 + .157 prtystrA − .0067 expendA + …
(standard errors: .050, .0028, .0026, .025), n = 173, R² = .868, R̄² = .865.
Notice how much higher the goodness-of-fit measures are as compared with the equation estimated earlier, and how significant shareA is.
To obtain the partial effect of expendB on v̂oteA, we must compute the partial derivative:
∂v̂oteA/∂expendB = β3 + β4 ∂shareA/∂expendB, where shareA = 100[expendA/(expendA + expendB)]. Evaluated at expendA = 300 and expendB = 0, the partial derivative ∂shareA/∂expendB is −100(300/300²) = −1/3, and therefore ∂v̂oteA/∂expendB ≈ −.164. So v̂oteA falls by .164 percentage points given the first thousand dollars of spending by candidate B, where A's spending is held fixed at 300 (or $300,000). This is a fairly large effect, although it may not be the most typical scenario.
If we hold all variables except priGPA fixed and use the usual approximation ∆(priGPA2) ≈ 2(priGPA)·∆priGPA, then we have:
∆stndfnl/∆priGPA ≈ β2 + 2β4(priGPA) + β6(atndrte).
In equation (6.19), we have β̂2 = -1.63, β̂4 = .296, and β̂6 = .0056. When priGPA = 2.59 and atndrte = 82, we have ∆stndfnl/∆priGPA ≈ -1.63 + 2(.296)(2.59) + .0056(82) ≈ .36.
First, note that (priGPA - 2.59)² = priGPA² - 2(2.59)priGPA + (2.59)², and the interaction can be centered as priGPA(atndrte - 82). So we can write equation (6.18) as:
stndfnl = θ0 + β1 atndrte + θ2 priGPA + β3 ACT + β4 (priGPA - 2.59)² + β5 ACT² + β6 priGPA(atndrte - 82) + u,
where θ2 = β2 + 2β4(2.59) + β6(82) is the partial effect of priGPA evaluated at priGPA = 2.59 and atndrte = 82.
When we run the regression associated with this last model, we obtain θ̂2 (which differs from the value in part (i) only by rounding error) and se(θ̂2) ≈ .363. This implies a very small t statistic for θ̂2.
The estimated equation (where price is in dollars) is:
price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
        (29,475.0)  (0.642)         (13.24)       (9,010.1)
n = 88, R² = .672, R̄² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714. The regression is pricei on (lotsizei - 10,000), (sqrfti - 2,300), and (bdrmsi - 4). The 95% CI for the intercept is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.
We must use equation (6.36) to obtain the standard error of ŷ⁰ and then use equation (6.37) (assuming that price is normally distributed). From the regression, σ̂ ≈ 59,833; combining this with se(ŷ⁰) gives se(ê⁰) ≈ 60,285.8. This gives the 95% CI for price⁰, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8) or, rounded to the nearest dollar, $216,738 to $456,675.
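The arithmetic behind the prediction interval can be reproduced. Here se(ŷ⁰) is backed out from the reported ±14,665 half-width and the 1.99 critical value (an inference, not a number stated directly in the text), and σ̂ = 59,833 is the root MSE from the regression:

```python
import math

# Half-width of the reported 95% CI for the mean prediction is 14,665
# with critical value 1.99, so se(y-hat0) is inferred as 14,665 / 1.99.
se_yhat0 = 14665 / 1.99          # ≈ 7,370 (an inferred value)
sigma_hat = 59833.0              # root MSE from the regression

# The standard error for predicting a single new observation combines the
# estimation error in y-hat0 with the error variance of the new draw:
se_e0 = math.sqrt(se_yhat0 ** 2 + sigma_hat ** 2)   # ≈ 60,285
```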
The estimated equation is:
points = 35.22 + 2.364 exper − .0770 exper² − 1.074 age − 1. coll
         (6.99)  (.405)        (.0235)       (.295)     (.451)
n = 269, R² = .141, R̄² = .128.
The turnaround point is 2.364/[2(.0770)] ≈ 15.4 years of experience. This is a very high level of experience, and we can essentially ignore this prediction.
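The turnaround point of a quadratic β₁x + β₂x² with β₂ < 0 is x* = β₁/(2|β₂|); a quick check of the figure above:

```python
def turning_point(b_lin, b_quad_mag):
    """x* maximizing b_lin*x - b_quad_mag*x**2, with b_quad_mag > 0."""
    return b_lin / (2 * b_quad_mag)

tp_exper = turning_point(2.364, 0.0770)   # ≈ 15.4 years of experience
```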
Many of the most promising players leave college early, or, in some cases, forego college altogether, to play in the NBA. These top players command the highest salaries. It is not more college that hurts salary, but less college is indicative of super-star potential.
When age² is added to the regression, its coefficient is .0536 (se = .0492). Its t statistic is barely above one, so we are justified in dropping it. The coefficient on age in the same regression is −3.984 (se = 2.689). Together, these estimates imply a negative, increasing return to age, with a turning point at roughly 3.984/[2(.0536)] ≈ 37 years.
The OLS results are:
log(wage) = 6.78 + .078 points + .078 exper − .0071 exper² − . age − .040 coll
            (.85)  (.007)        (.050)      (.0028)        (.035)  (.053)
n = 269, R² = .488.
The joint F statistic produced by Stata is about 1.19. Therefore, once scoring and years played are controlled for, there is no evidence for wage differentials depending on age or years played in college.
The estimated equation is:
log(bwght) = 6.89 + .0189 npvis − .00043 npvis²
             (.027) (.0037)       (.00012)
n = 1,764, R² = .0213, R̄² = .0202.
The turning point calculation is npvis* = .0189/[2(.00043)] ≈ 21.97, or about 22. In the sample, 89 women had 22 or more prenatal visits.
While prenatal visits are a good thing for helping to prevent low birth weight, a woman's having many prenatal visits is a possible indicator of a pregnancy with difficulties. So it does make sense that the quadratic has a hump shape.
With mage added in quadratic form, we get: