














Lecture notes from a University of Wisconsin-Madison Economics course on how to use dummy variables to test the null hypothesis that average hourly earnings for men and women are equal. The notes also discuss how to condition on other variables, such as labor market experience, and how to avoid the dummy variable trap when dealing with more than two categories.
Christopher Taber
Department of Economics University of Wisconsin-Madison
April 5, 2010
Let's go back to something we thought about very early in this course: the difference in average wages between men and women.
Suppose you want to test whether men make more money than women
That is, you have the following null hypothesis:
$$H_0: E(W \mid \text{Male}) = E(W \mid \text{Female})$$
where $W$ is hourly earnings.
How do you do this?
Solution: turn gender into a number by defining a dummy variable.
Define
$$m_i = \begin{cases} 1 & \text{person } i \text{ is male} \\ 0 & \text{person } i \text{ is female} \end{cases}$$
Now let’s see if regression analysis can be useful.
We will think of this in a “descriptive” way
Let $E(W_i \mid m_i) = \beta_0 + \beta_1 m_i$.
Why is this useful?
Now notice that
$$\begin{aligned} E(W_i \mid \text{Male}) &= E(W_i \mid m_i = 1) = \beta_0 + \beta_1 \\ E(W_i \mid \text{Female}) &= E(W_i \mid m_i = 0) = \beta_0 \end{aligned}$$
Solving these two equations gives
$$\beta_0 = E(W_i \mid \text{Female}), \qquad \beta_1 = E(W_i \mid \text{Male}) - E(W_i \mid \text{Female})$$
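A quick numerical check of this result, as a minimal sketch: the wage numbers below are made up for illustration, and the regression is fit with NumPy's least-squares solver rather than Stata. The OLS intercept should equal the female sample mean and the slope should equal the male-minus-female difference in sample means.

```python
import numpy as np

# Hypothetical hourly wages and a male dummy (1 = male, 0 = female).
wages = np.array([12.0, 15.0, 18.0, 20.0, 10.0, 11.0, 14.0, 9.0])
male = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Design matrix: a column of ones for beta_0 and the dummy for beta_1.
X = np.column_stack([np.ones_like(wages), male])
beta, *_ = np.linalg.lstsq(X, wages, rcond=None)
b0, b1 = beta

mean_female = wages[male == 0].mean()
mean_male = wages[male == 1].mean()

print(f"beta_0 = {b0:.3f}, female mean = {mean_female:.3f}")
print(f"beta_1 = {b1:.3f}, male - female = {mean_male - mean_female:.3f}")
```

With these numbers the two lines print matching pairs (11.000 and 5.250), since a regression on a constant and a single dummy simply reproduces the two group means.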
But that isn’t all.
We might be worried that women have less labor market experience than men.
An interesting null hypothesis might be
$$H_0: E(W \mid \text{Male}, \text{Experience}) = E(W \mid \text{Female}, \text{Experience})$$
That is, comparing men and women with the same level of experience, do they earn the same amount of money?
This is easy to do: we just write the model as
$$E(W_i \mid m_i, \text{Exp}_i) = \beta_0 + \beta_1 m_i + \beta_2 \text{Exp}_i + \beta_3 \text{Exp}_i^2$$
Then testing whether $\beta_1 = 0$ tests exactly what we want.
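To make the test concrete, here is a hedged sketch that fits the experience-augmented model by OLS and computes the usual t-statistic for $\beta_1$. The data are simulated, so the "true" coefficients, sample size, and seed are arbitrary assumptions, and the standard errors are the classical homoskedastic ones (a heteroskedasticity-robust version would change `se` but not `beta`). The helper `ols` is a hypothetical name introduced here.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and classical (homoskedastic) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # Var(beta_hat) under homoskedasticity
    return beta, np.sqrt(np.diag(cov))

# Simulated data (all parameter values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 2000
male = rng.integers(0, 2, size=n)
experience = rng.uniform(0, 30, size=n)
wage = (8.0 + 2.0 * male + 0.6 * experience - 0.01 * experience**2
        + rng.normal(0, 3, size=n))

# Columns: constant, m_i, Exp_i, Exp_i^2
X = np.column_stack([np.ones(n), male, experience, experience**2])
beta, se = ols(X, wage)
t_stat = beta[1] / se[1]   # t-statistic for H0: beta_1 = 0
print(f"beta_1 = {beta[1]:.3f}, se = {se[1]:.3f}, t = {t_stat:.2f}")
```

Here $\beta_1$ is the male-female gap among workers with the same experience, so a large $|t|$ rejects equal conditional mean earnings.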
Why stop there?
We can condition on whatever we want
$$E(W_i \mid m_i, X_{i2}, \ldots, X_{iK}) = \beta_0 + \beta_1 m_i + \beta_2 X_{i2} + \cdots + \beta_K X_{iK}$$
Let's look at some examples.
One could think of just running separate regressions for men and for women
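$$\begin{aligned} E(W \mid \text{Male}, \text{Experience}) &= \beta_0^m + \beta_1^m \text{Exp}_i \\ E(W \mid \text{Female}, \text{Experience}) &= \beta_0^f + \beta_1^f \text{Exp}_i \end{aligned}$$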
Let's see what this looks like.
But what if I want to test whether these things are the same?
That is, I might want to run a joint test of whether men and women face the same earnings profile.
The key here is an interaction.
Think about the model
$$E(W_i \mid \text{Gender}, \text{Experience}) = \beta_0 + \beta_1 m_i + \beta_2 \text{Exp}_i + \beta_3 (m_i \times \text{Exp}_i)$$
Then
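$$\begin{aligned} E(W \mid \text{Male}, \text{Experience}) &= \beta_0^m + \beta_1^m \text{Exp}_i = E(W_i \mid m_i = 1, \text{Exp}_i) = \beta_0 + \beta_1 + \beta_2 \text{Exp}_i + \beta_3 \text{Exp}_i \\ E(W \mid \text{Female}, \text{Experience}) &= \beta_0^f + \beta_1^f \text{Exp}_i = E(W_i \mid m_i = 0, \text{Exp}_i) = \beta_0 + \beta_2 \text{Exp}_i \end{aligned}$$
Matching intercepts and slopes gives $\beta_0^f = \beta_0$, $\beta_1^f = \beta_2$, $\beta_0^m = \beta_0 + \beta_1$, and $\beta_1^m = \beta_2 + \beta_3$, so men and women face the same earnings profile exactly when $\beta_1 = \beta_3 = 0$.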
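A minimal sketch on simulated data (parameter values and seed are assumptions) that checks this numerically: it fits separate regressions for men and women, then the pooled interaction regression, and confirms that the pooled intercepts are $\beta_0$ and $\beta_0 + \beta_1$ and the pooled slopes are $\beta_2$ and $\beta_2 + \beta_3$. It then computes a classical F-statistic for the joint null $\beta_1 = \beta_3 = 0$ by comparing restricted and unrestricted sums of squared residuals; this homoskedastic F-test is one standard way to run the joint test, not necessarily the exact procedure used in the course.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid

rng = np.random.default_rng(1)
n = 1000
male = rng.integers(0, 2, size=n)
experience = rng.uniform(0, 30, size=n)
# Illustrative "truth": men get a higher intercept and a steeper profile.
wage = (9.0 + 1.5 * male + 0.4 * experience + 0.2 * male * experience
        + rng.normal(0, 3, size=n))

ones = np.ones(n)
X_line = np.column_stack([ones, experience])

# Separate regressions: intercept and slope for each gender.
(b0_f, b1_f), _ = ols(X_line[male == 0], wage[male == 0])
(b0_m, b1_m), _ = ols(X_line[male == 1], wage[male == 1])

# Pooled interaction model: constant, m_i, Exp_i, m_i * Exp_i.
X_u = np.column_stack([ones, male, experience, male * experience])
beta, ssr_u = ols(X_u, wage)

# The pooled coefficients reproduce the separate regressions.
assert np.allclose([beta[0], beta[2]], [b0_f, b1_f])
assert np.allclose([beta[0] + beta[1], beta[2] + beta[3]], [b0_m, b1_m])

# Restricted model under H0: beta_1 = beta_3 = 0 (same profile for both genders).
_, ssr_r = ols(X_line, wage)

q, k = 2, X_u.shape[1]   # number of restrictions, parameters in unrestricted model
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(f"F = {F:.1f}")
```

The design choice here is that the restricted model simply drops $m_i$ and $m_i \times \text{Exp}_i$, so the F-statistic measures how much fit is lost by forcing one earnings profile on both groups.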
Think about the following exercise
What would happen if we constructed the new dummy variable
$$f_i = \begin{cases} 1 & \text{person } i \text{ is female} \\ 0 & \text{person } i \text{ is male} \end{cases}$$
What if we then tried to run a regression based on
$E(W_i \mid \text{Gender}) = \beta_0 + \beta_1 m_i + \beta_2 f_i$?
It turns out that this will not work.
We have perfect multicollinearity because
$$m_i = 1 - f_i$$
for every $i$, so the constant term is exactly $m_i + f_i$.
This is called the Dummy Variable Trap
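A minimal NumPy sketch of the trap, using a made-up sample of six people: with a constant, $m_i$, and $f_i = 1 - m_i$, the design matrix has three columns but only rank 2, so $X'X$ is singular and OLS has no unique solution.

```python
import numpy as np

# Hypothetical sample of six people: m = 1 for men, f = 1 for women.
m = np.array([1, 0, 1, 1, 0, 0])
f = 1 - m                       # the female dummy is fully determined by m

X = np.column_stack([np.ones(6), m, f])
rank = np.linalg.matrix_rank(X)
print(f"columns = {X.shape[1]}, rank = {rank}")   # rank 2 < 3 columns

# X'X is singular, so its determinant is essentially zero
# (up to floating-point error) and the normal equations have no unique solution.
print(np.linalg.det(X.T @ X))
```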
This makes sense if you think about it.
There are really only two pieces of information in the population
$$\begin{aligned} E(W_i \mid \text{Male}) &= \beta_0 + \beta_1 \\ E(W_i \mid \text{Female}) &= \beta_0 + \beta_2 \end{aligned}$$
We have 3 parameters and 2 equations
Clearly the model is not identified, so it makes sense that you would have problems.
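To see the identification failure directly: for any constant $c$, the parameter values $(\beta_0 + c,\ \beta_1 - c,\ \beta_2 - c)$ produce exactly the same two population means,

$$\begin{aligned} (\beta_0 + c) + (\beta_1 - c) &= \beta_0 + \beta_1 = E(W_i \mid \text{Male}) \\ (\beta_0 + c) + (\beta_2 - c) &= \beta_0 + \beta_2 = E(W_i \mid \text{Female}) \end{aligned}$$

so the data cannot distinguish among infinitely many parameter vectors. Dropping any one of the three terms (the constant, $m_i$, or $f_i$) pins down the remaining two.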
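Stata is smart about this kind of thing, though: if you include both dummies along with a constant, it drops one of the collinear variables and reports it as omitted.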
Let $B_i$, $A_i$, $H_i$, $N_i$ be dummy variables for Black, Asian, Hispanic, and Native American respectively.
That is, for example,
$$B_i = \begin{cases} 1 & \text{person } i \text{ is African American} \\ 0 & \text{otherwise} \end{cases}$$
Then we can think of the regression
$$E(W_i \mid \text{Race}) = \beta_0 + \beta_1 B_i + \beta_2 A_i + \beta_3 H_i + \beta_4 N_i$$
Note that we have 5 basic population equations (one for each of the 5 groups) and 5 parameters, so we seem to have avoided the dummy variable trap.
How do we interpret the parameters?
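$$\begin{aligned} E(W_i \mid \text{African American}) &= \beta_0 + \beta_1 \\ E(W_i \mid \text{Asian}) &= \beta_0 + \beta_2 \\ E(W_i \mid \text{Hispanic}) &= \beta_0 + \beta_3 \\ E(W_i \mid \text{Native American}) &= \beta_0 + \beta_4 \\ E(W_i \mid \text{All Others}) &= \beta_0 \end{aligned}$$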
Solving these out, one can show that
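$$\begin{aligned} \beta_0 &= E(W_i \mid \text{All Others}) \\ \beta_1 &= E(W_i \mid \text{African American}) - E(W_i \mid \text{All Others}) \\ \beta_2 &= E(W_i \mid \text{Asian}) - E(W_i \mid \text{All Others}) \\ \beta_3 &= E(W_i \mid \text{Hispanic}) - E(W_i \mid \text{All Others}) \\ \beta_4 &= E(W_i \mid \text{Native American}) - E(W_i \mid \text{All Others}) \end{aligned}$$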
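A small sketch confirming this mapping on made-up data (two observations per group, wages chosen for round numbers; the group labels are illustrative strings, not from the source data). With "other" as the left-out group, the intercept is that group's mean and each dummy coefficient is a group mean minus the "other" mean.

```python
import numpy as np

# Hypothetical group labels and hourly wages; "other" is the left-out group.
groups = np.array(["other", "other", "black", "black", "asian",
                   "asian", "hispanic", "hispanic", "native", "native"])
wages = np.array([20.0, 22.0, 15.0, 17.0, 24.0, 26.0, 14.0, 16.0, 13.0, 15.0])

dummies = ["black", "asian", "hispanic", "native"]   # B_i, A_i, H_i, N_i
X = np.column_stack([np.ones(len(wages))] +
                    [(groups == g).astype(float) for g in dummies])
beta, *_ = np.linalg.lstsq(X, wages, rcond=None)

base_mean = wages[groups == "other"].mean()
print(f"beta_0 = {beta[0]:.2f} (mean of left-out group = {base_mean:.2f})")
for g, b in zip(dummies, beta[1:]):
    gap = wages[groups == g].mean() - base_mean
    print(f"{g:>8}: beta = {b:.2f}, mean gap vs. other = {gap:.2f}")
```

Here the group means are 21, 16, 25, 15, and 14, so the coefficients come out as 21, -5, 4, -6, and -7.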
Thus the left-out group matters a lot for interpreting the parameters: each of $\beta_1, \ldots, \beta_4$ measures a gap relative to that group.
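A hedged illustration of this point, using made-up wages: switching the left-out group from "all others" to Black changes every coefficient, yet the implied group means (the fitted values) are identical. The helper name `fit_with_base` and the group labels are hypothetical.

```python
import numpy as np

groups = np.array(["other", "other", "black", "black", "asian",
                   "asian", "hispanic", "hispanic", "native", "native"])
wages = np.array([20.0, 22.0, 15.0, 17.0, 24.0, 26.0, 14.0, 16.0, 13.0, 15.0])
categories = ["other", "black", "asian", "hispanic", "native"]

def fit_with_base(base):
    """Regress wages on a constant plus dummies for every group except `base`."""
    included = [g for g in categories if g != base]
    X = np.column_stack([np.ones(len(wages))] +
                        [(groups == g).astype(float) for g in included])
    beta, *_ = np.linalg.lstsq(X, wages, rcond=None)
    return dict(zip(["const"] + included, beta)), X @ beta

coef_other, fitted_other = fit_with_base("other")
coef_black, fitted_black = fit_with_base("black")

print({k: round(float(v), 2) for k, v in coef_other.items()})
print({k: round(float(v), 2) for k, v in coef_black.items()})
print("same fitted values:", np.allclose(fitted_other, fitted_black))
```

Both specifications describe the same five group means; only the reference point for the coefficients moves.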
Now, what about more than one categorical variable at a time?
For example what about race and gender?
Let's just focus on the African American gap by putting all other groups together.
What would happen if we just thought about the model as:
$$E(W_i \mid \text{Race}, \text{Gender}) = \beta_0 + \beta_1 m_i + \beta_2 B_i$$
Note that in this model
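$$\begin{aligned} E(W_i \mid \text{White Male}) &= \beta_0 + \beta_1 \\ E(W_i \mid \text{White Female}) &= \beta_0 \\ E(W_i \mid \text{Black Male}) &= \beta_0 + \beta_1 + \beta_2 \\ E(W_i \mid \text{Black Female}) &= \beta_0 + \beta_2 \end{aligned}$$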
Now we actually have 4 equations and only 3 parameters, so in general we cannot solve them exactly.
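A small sketch of why the additive model cannot, in general, match all four cells. The cell means below are made-up numbers with a different gender gap for White and Black workers; to keep the example tiny, it assumes one observation per cell. The model forces the male-female gap to equal $\beta_1$ for both races, so its fitted values miss the cell means.

```python
import numpy as np

# Hypothetical cell means: (male, black, mean hourly wage).
cells = np.array([
    [1, 0, 22.0],   # White male
    [0, 0, 17.0],   # White female
    [1, 1, 16.0],   # Black male
    [0, 1, 15.0],   # Black female
])
m, b, w = cells[:, 0], cells[:, 1], cells[:, 2]

# Additive model: E(W | Race, Gender) = beta_0 + beta_1 m + beta_2 B
X = np.column_stack([np.ones(4), m, b])
beta, *_ = np.linalg.lstsq(X, w, rcond=None)
fitted = X @ beta

print("beta:", np.round(beta, 2))
print("fitted:", np.round(fitted, 2), "actual:", w)
# The model's gender gap is beta_1 for both races; the data have gaps of 5 and 1.
print("white gap:", w[0] - w[1], "black gap:", w[2] - w[3],
      "model gap:", round(float(beta[1]), 2))
```

With this balanced layout, OLS sets $\beta_1$ to the average of the two gender gaps (3) and $\beta_2$ to the average race gap (-4), giving fitted cell means of 21, 18, 17, and 14 instead of 22, 17, 16, and 15.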