
Multiple Regression Analysis: Heart Attack Data and Education, Income, and Gender Data - P, Study notes of Political Science

An analysis of multiple regression examples using heart attack data and education, income, and gender data. The heart attack data concern the relationship between heart attack indicators, age, and treatment. The treatment group shows higher raw heart attack indicators, but it is also much older. Once age is controlled for, the treatment is associated with lower indicators, while the indicators rise with age. The education, income, and gender analysis shows that women earn less than men on average. It also shows that the gap is underestimated when education is left out, because women in the sample average slightly more schooling. The regression output is presented for each dataset.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-3kh-1
PSC 101
Multiple Regression examples
Heart attack data:
We have three measures of interest here. Our dependent variable “attack” measures the level of
indicators for heart attacks. The higher the values, the greater the risk of heart attack. Half of
our sample gets an experimental drug treatment, while half does not. If we look at the data, we
see that the people receiving the treatment in fact do worse on the measure of interest
(remember, higher values are worse).
. tab treat, summ(attack)
| Summary of Heart attack indicators
treatment | Mean Std. Dev. Freq.
------------+------------------------------------
O | .44812433 .1599955 250
X | .50499594 .14360768 250
------------+------------------------------------
Total | .47656013 .1545146 500
Maybe the drug makes people worse off. There was no random assignment, however, so we can’t
assume that the treatment and control groups are equal in every way prior to receiving the
treatment. One thing in particular to examine is the age of the two groups. When we do that we
see:
. tab treat, summ(age)
| Summary of age
treatment | Mean Std. Dev. Freq.
------------+------------------------------------
O | 40.375356 11.333267 250
X | 54.832344 11.394994 250
------------+------------------------------------
Total | 47.60385 13.462594 500
Aha! The group getting the treatment is much older on average than the group getting no
treatment. Now we have two candidate independent variables: whether a subject received the
treatment, and the subject’s age. What we would like to do is compare how subjects of similar
ages fare under treatment and control. One way to see that is with a scatter plot. Here, we mark
the points by treatment (X) and no treatment (O).
twoway scatter attack age, msym(point) mlabel(treat)

[Figure: scatter plot of heart attack indicators (y-axis) against age (x-axis), each point labeled X (treatment) or O (no treatment)]

What should you look for if you are trying to see whether there is a pattern in the data? If the
treatment has an effect (positive or negative), then across most of the age range on the x-axis,
either the Os should tend to have higher values than the Xs (if the drug is beneficial) or the
Os should tend to have lower values than the Xs (if the drug is detrimental). The former seems
to be the case in the scatter plot above. We can check whether the insight generated by the
scatter plot holds up under multiple regression.
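The reasoning above can be illustrated on simulated data before turning to the Stata output. The Python sketch below is hypothetical and does not use the course data set. The sample size, the age distribution, the rule that makes older subjects likelier to be treated, and the true effects (+0.01 per year of age, -0.085 for treatment) are all assumptions, chosen to loosely mimic this example. The sketch compares the naive treated-minus-untreated gap with the treatment coefficient from a least-squares fit that also includes age. It uses a small hand-written solver, so it needs only the standard library.

```python
import math
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical data: NOT the course data set.
rng = random.Random(101)
n = 500
age = [rng.gauss(47.6, 13.5) for _ in range(n)]
# No random assignment: the older the subject, the likelier the treatment.
treat = [1.0 if rng.random() < 1 / (1 + math.exp(-(a - 47.6) / 5)) else 0.0
         for a in age]
# Assumed true model: risk rises with age, and the treatment LOWERS it by 0.085.
attack = [0.05 + 0.01 * a - 0.085 * t + rng.gauss(0, 0.1)
          for a, t in zip(age, treat)]

# Naive comparison of group means (what a summary table by treatment shows).
naive_gap = (mean([y for y, t in zip(attack, treat) if t == 1.0])
             - mean([y for y, t in zip(attack, treat) if t == 0.0]))

# Multiple regression of attack on a constant, age, and treat.
b0, b_age, b_treat = ols([[1.0, a, t] for a, t in zip(age, treat)], attack)

print(f"naive treated-minus-control gap: {naive_gap:+.3f}")
print(f"age coefficient: {b_age:+.4f}  treatment coefficient: {b_treat:+.3f}")
```

Under these assumptions, the naive gap should come out positive, because the treated are older and age raises the indicator. The treatment coefficient should land near -0.085, which mirrors the pattern in the notes.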

. reg attack age treat

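      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  2,   497) =  313.38
       Model |  6.64454921     2   3.3222746           Prob > F      =  0.0000
    Residual |  5.26895757   497  .010601524           R-squared     =  0.5577
-------------+------------------------------           Adj R-squared =  0.5560
       Total |  11.9135068   499  .023874763           Root MSE      =  .10296

------------------------------------------------------------------------------
      attack |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0098503    .000406    24.26   0.000     .0090526     .010648
   treatment |  -.0855338   .0109208    -7.83   0.000    -.1069904    -.064077
       _cons |   .0504158   .0176387     2.86   0.004     .0157602     .085071
------------------------------------------------------------------------------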

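The insight does appear to hold. Controlling for the effect of the treatment, the heart attack
indicators increase with age: the slope coefficient on age is positive and equal to a little less
than .01. The effect of the treatment, controlling for age, is the vertical gap between the
regression lines for treated and untreated subjects. It is captured by the coefficient on the
treatment variable (B, estimated to be -.0855), which drops out when the treatment variable
equals zero. Because that coefficient is negative, treated subjects score about .086 lower on the
indicators than untreated subjects of the same age. This is the opposite of what the raw
comparison of means suggested.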
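The arithmetic behind this interpretation can be checked directly from the numbers printed earlier in these notes. The Python sketch below plugs the reported coefficients into the fitted equation and confirms that each group's mean is reproduced at that group's mean age. It then splits the raw gap of about .057 between treated and untreated subjects into an age part and a treatment part. Python is used only for illustration; the notes themselves use Stata.

```python
# Coefficients copied from the Stata output of `reg attack age treat`.
b_cons, b_age, b_treat = 0.0504158, 0.0098503, -0.0855338

# Group means copied from the `tab treat, summ(age)` and `summ(attack)` tables.
mean_age = {"O": 40.375356, "X": 54.832344}
mean_attack = {"O": 0.44812433, "X": 0.50499594}


def predicted_attack(age, treated):
    """Fitted value: two parallel lines in age, shifted by b_treat when treated = 1."""
    return b_cons + b_age * age + b_treat * treated


# With a constant and a 0/1 dummy in the model, OLS residuals average zero within
# each group, so the fitted value at a group's mean age equals that group's mean
# (here up to rounding in the printed output).
for group, treated in (("O", 0), ("X", 1)):
    print(group, round(predicted_attack(mean_age[group], treated), 4),
          round(mean_attack[group], 4))

# Split the raw treated-minus-control gap into an age part and a treatment part.
raw_gap = mean_attack["X"] - mean_attack["O"]
age_part = b_age * (mean_age["X"] - mean_age["O"])
print(f"raw gap {raw_gap:.4f} = age part {age_part:.4f} + treatment part {b_treat:.4f}")
```

The last line prints a raw gap of 0.0569, an age part of 0.1424, and a treatment part of -0.0855. The treated group's extra 14.5 years of age more than offset the treatment's benefit, which is why the raw comparison made the drug look harmful.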

Education, income, and gender data:

Now, a much briefer treatment of a similar problem. Say we have salary data for men and women,
as below, with the female variable equal to 1 for females and 0 for males. The first summary and
the regression show that women earn about $1,500 less than men on average, and that the
difference in mean salary is statistically significant at conventional levels.

. tab female, summ(inc)

| Summary of income
female | Mean Std. Dev. Freq.
------------+------------------------------------
Total | 31493.342 4236.8316 500

. reg inc fem

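      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =   16.75
       Model |   291408242     1   291408242           Prob > F      =  0.0000
    Residual |  8.6660e+09   498  17401630.2           R-squared     =  0.0325
-------------+------------------------------           Adj R-squared =  0.0306
       Total |  8.9574e+09   499  17950741.6           Root MSE      =  4171.5

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -1528.071   373.4115    -4.09   0.000    -2261.727     -794.41
       _cons |   32287.94   269.2709   119.91   0.000     31758.89     32817.0
------------------------------------------------------------------------------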
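A regression on a single 0/1 variable is just a comparison of two means. The Python sketch below makes this explicit using only numbers printed in this example. The constant is the men's mean income, and the constant plus the female coefficient is the women's mean income. Because the overall mean is a weighted average of the two, it also implies the share of women in the sample. That share, and the head count derived from it, is an inference from rounded output rather than a figure reported in the notes.

```python
# Numbers copied from the output of `reg inc fem` and `tab female, summ(inc)`.
b_cons, b_female = 32287.94, -1528.071
overall_mean, n = 31493.342, 500

# With a single 0/1 regressor, the constant is the mean for the 0 group (men)
# and constant + slope is the mean for the 1 group (women).
male_mean = b_cons
female_mean = b_cons + b_female

# The overall mean is a weighted average of the two group means, so the
# share of women can be backed out from it.
female_share = (male_mean - overall_mean) / (male_mean - female_mean)
print(f"men: {male_mean:.2f}  women: {female_mean:.2f}")
print(f"share female: {female_share:.3f} (about {round(female_share * n)} of {n})")
```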

But in our sample, females average slightly more years of schooling:

. tab female, summ(educ)

| Summary of Years of schooling
female | Mean Std. Dev. Freq.
------------+------------------------------------
Total | 12.507824 1.5443323 500

This means that if education level affects income level, our original regression underestimates
the effect of gender on salaries. Women in the sample have slightly more schooling, and schooling
raises income. Without an education variable, the gender variable has to “do the work” of both
variables, so part of women’s education advantage masks their pay disadvantage. The regression
below, which adds education, suggests that we underestimate the salary gap by about $176. Where
does that figure come from? It is the difference between the two coefficients on the female
variable (-1703.96 and -1528.07).
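The size of this adjustment follows from the standard omitted-variable identity for OLS fitted on the same sample. The short-regression coefficient on female equals the long-regression coefficient plus the education coefficient times the slope from regressing education on female. That slope is simply women's mean schooling minus men's. The Python sketch below applies the identity to the printed coefficients. The implied schooling gap is a derived quantity, not one reported in the notes.

```python
# Coefficients on female copied from the two income regressions.
b_female_short = -1528.071   # reg inc fem
b_female_long = -1703.96     # reg inc edu fem
b_educ = 873.3289            # income per year of schooling, reg inc edu fem

# Omitted-variable identity for OLS in the same sample:
#   short = long + b_educ * delta,
# where delta is the slope from regressing education on female,
# i.e. women's mean schooling minus men's.
shift = b_female_short - b_female_long
delta = shift / b_educ
print(f"gap understated by ${shift:.0f}")
print(f"implied schooling gap: {delta:.2f} years in favor of women")
```

An implied gap of roughly 0.2 years matches the statement that women average slightly more schooling. It also shows why the adjustment is modest: at about $873 per year of schooling, 0.2 years accounts for about $176.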

. reg inc edu fem

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  2,   497) =   38.26
       Model |  1.1952e+09     2   597618757           Prob > F      =  0.0000
    Residual |  7.7622e+09   497  15618073.5           R-squared     =  0.1334
-------------+------------------------------           Adj R-squared =  0.1299
       Total |  8.9574e+09   499  17950741.6           Root MSE      =    3952

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       educa |   873.3289   114.8017     7.61   0.000     647.7724    1098.885
      female |   -1703.96   354.5129    -4.81   0.000    -2400.489    -1007.43
       _cons |   21455.96   1446.567    14.83   0.000     18613.82     24298.1
------------------------------------------------------------------------------