Dummy Explanatory Variables
Christopher Taber
Department of Economics
University of Wisconsin-Madison
April 5, 2010

Categorical Variables

Let's go back to something we thought about very early on in this course: the difference in average wages between men and women.

Suppose you want to test whether men make more money than women.

That is, you have the following null hypothesis:

$$H_0 : E(W \mid \text{Male}) = E(W \mid \text{Female})$$

where W is hourly earnings.

How do you do this?

Solution: turn gender into a number by using a dummy variable.

Define

$$m_i = \begin{cases} 1 & \text{if person } i \text{ is male} \\ 0 & \text{if person } i \text{ is female.} \end{cases}$$

Now let’s see if regression analysis can be useful.

We will think of this in a “descriptive” way

Let $E(W_i \mid m_i) = \beta_0 + \beta_1 m_i$.

Why is this useful?

Now notice that

$$\begin{aligned}
E(W_i \mid \text{Male}) &= E(W_i \mid m_i = 1) = \beta_0 + \beta_1 \\
E(W_i \mid \text{Female}) &= E(W_i \mid m_i = 0) = \beta_0
\end{aligned}$$

Solving, this means that

$$\begin{aligned}
\beta_0 &= E(W_i \mid \text{Female}) \\
\beta_1 &= E(W_i \mid \text{Male}) - E(W_i \mid \text{Female})
\end{aligned}$$
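
To see this result in a sample, here is a minimal sketch that regresses wages on the male dummy and compares the estimates with the two group averages. The wage numbers are made up for illustration, and `simple_ols` is a hypothetical helper written for this note using the textbook one-regressor least-squares formulas; it is not taken from the lecture.

```python
# Hypothetical hourly wages (made-up numbers, for illustration only).
wages = [22.0, 18.5, 25.0, 30.0, 16.0, 19.5, 21.0, 17.5]
male = [1, 0, 1, 1, 0, 0, 1, 0]  # m_i: 1 if male, 0 if female


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = simple_ols(male, wages)

# Sample averages by group, computed directly.
male_wages = [w for w, m in zip(wages, male) if m == 1]
female_wages = [w for w, m in zip(wages, male) if m == 0]
mean_male = sum(male_wages) / len(male_wages)
mean_female = sum(female_wages) / len(female_wages)

print(f"b0 = {b0:.3f}   female mean   = {mean_female:.3f}")
print(f"b1 = {b1:.3f}   male - female = {mean_male - mean_female:.3f}")
```

The intercept matches the female sample mean and the slope matches the male-female difference, so testing $\beta_1 = 0$ in this regression is a test of the null hypothesis of equal average wages.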

Adding Conditioning Variables

But that isn’t all.

We might be worried that women have less labor market experience than men.

An interesting null hypothesis might be

$$H_0 : E(W \mid \text{Male}, \text{Experience}) = E(W \mid \text{Female}, \text{Experience})$$

That is, comparing men and women with the same level of experience, do they earn the same amount of money?

This is easy to do: we just write the model as

$$E(W_i \mid m_i, \text{Exp}_i) = \beta_0 + \beta_1 m_i + \beta_2 \text{Exp}_i + \beta_3 \text{Exp}_i^2.$$

Then testing whether β 1 = 0 tests exactly what we want.
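
As a rough illustration of that test, the sketch below fits the model with experience and experience squared and forms the usual t statistic for $\beta_1$. Everything here is an assumption for illustration: the data are made up, `solve` is a small hypothetical helper (Gauss-Jordan elimination) so that the example needs no external packages, and the standard error uses the classical homoskedastic formula. In practice one would use a regression package instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical data (made-up numbers): wage, male dummy, years of experience.
wages = [15.0, 17.5, 19.0, 21.5, 22.0, 23.5, 13.0, 15.5, 16.0, 18.5, 19.0, 19.5]
male = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
exper = [1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12]

# Regressors: constant, m_i, Exp_i, Exp_i^2.
X = [[1.0, m, e, e * e] for m, e in zip(male, exper)]
k = 4

# Normal equations (X'X) beta = X'y.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * w for row, w in zip(X, wages)) for i in range(k)]
beta = solve(xtx, xty)

# Residual variance with the usual degrees-of-freedom correction.
resid = [w - sum(b * x for b, x in zip(beta, row)) for row, w in zip(X, wages)]
sigma2 = sum(r * r for r in resid) / (len(wages) - k)

# Var(beta_1) = sigma2 * [(X'X)^{-1}]_{11}; obtain that column by solving X'X v = e_1.
inv_col = solve(xtx, [0.0, 1.0, 0.0, 0.0])
se_b1 = (sigma2 * inv_col[1]) ** 0.5
t_b1 = beta[1] / se_b1

print(f"beta_1 = {beta[1]:.3f}, se = {se_b1:.3f}, t = {t_b1:.2f}")
```

The t statistic would then be compared with a critical value from the t distribution with $n - 4$ degrees of freedom.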

Why stop there?

We can condition on whatever we want:

$$E(W_i \mid m_i, X_{i2}, \dots, X_{iK}) = \beta_0 + \beta_1 m_i + \beta_2 X_{i2} + \dots + \beta_K X_{iK}.$$

Let's look at some examples.

One could think of just running separate regressions for men and for women

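$$\begin{aligned}
E(W \mid \text{Male}, \text{Experience}) &= \beta_0^m + \beta_1^m \text{Exp}_i \\
E(W \mid \text{Female}, \text{Experience}) &= \beta_0^f + \beta_1^f \text{Exp}_i
\end{aligned}$$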

Let's see what this looks like.

But what if I want to test whether these things are the same?

That is, I might want to run a joint test of whether men and women face the same earnings profile.

The key here is an interaction.

Think about the model

$$E(W_i \mid \text{Gender}, \text{Experience}) = \beta_0 + \beta_1 m_i + \beta_2 \text{Exp}_i + \beta_3 (m_i \times \text{Exp}_i)$$

Then

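$$\begin{aligned}
E(W \mid \text{Male}, \text{Experience}) &= \beta_0^m + \beta_1^m \text{Exp}_i = E(W_i \mid m_i = 1, \text{Exp}_i) = \beta_0 + \beta_1 + \beta_2 \text{Exp}_i + \beta_3 \text{Exp}_i \\
E(W \mid \text{Female}, \text{Experience}) &= \beta_0^f + \beta_1^f \text{Exp}_i = E(W_i \mid m_i = 0, \text{Exp}_i) = \beta_0 + \beta_2 \text{Exp}_i
\end{aligned}$$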
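
Matching coefficients in the two lines above gives $\beta_0^f = \beta_0$, $\beta_1^f = \beta_2$, $\beta_0^m = \beta_0 + \beta_1$ and $\beta_1^m = \beta_2 + \beta_3$, so "same earnings profile" corresponds to the joint restriction $\beta_1 = \beta_3 = 0$. The sketch below checks the mapping numerically: it runs the two separate regressions, builds the implied pooled coefficients, and verifies that they satisfy the least-squares normal equations of the interaction model. The data are made up and `simple_ols` is a hypothetical helper written for this note.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data (made-up numbers): wage, male dummy, years of experience.
wages = [15.0, 17.5, 19.0, 21.5, 22.0, 23.5, 13.0, 15.5, 16.0, 18.5, 19.0, 19.5]
male = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
exper = [1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12]

# Separate regressions of wage on experience for men and for women.
men = [(e, w) for w, m, e in zip(wages, male, exper) if m == 1]
women = [(e, w) for w, m, e in zip(wages, male, exper) if m == 0]
b0_m, b1_m = simple_ols([e for e, _ in men], [w for _, w in men])
b0_f, b1_f = simple_ols([e for e, _ in women], [w for _, w in women])

# Coefficients of the pooled interaction model implied by the mapping above.
beta = [b0_f, b0_m - b0_f, b1_f, b1_m - b1_f]

# Check: residuals of the pooled model are orthogonal to every regressor,
# which is exactly the condition that defines the least-squares solution.
X = [[1.0, m, e, m * e] for m, e in zip(male, exper)]
resid = [w - sum(b * x for b, x in zip(beta, row)) for row, w in zip(X, wages)]
normal_eqs = [sum(r * row[j] for r, row in zip(resid, X)) for j in range(4)]

print("beta =", [round(b, 4) for b in beta])
print("X'resid =", [round(v, 8) for v in normal_eqs])
```

The advantage of the pooled form is that both profiles are estimated in a single regression, so a restriction across groups can be tested directly.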

The Dummy Variable Trap

Think about the following exercise

What would happen if we constructed the new dummy variable

$$f_i = \begin{cases} 1 & \text{if person } i \text{ is female} \\ 0 & \text{if person } i \text{ is male} \end{cases}$$

What if we then tried to run a regression based on

$$E(W_i \mid \text{Gender}) = \beta_0 + \beta_1 m_i + \beta_2 f_i \; ?$$

It turns out that this will not work.

We have perfect multicollinearity because

$$m_i = 1 - f_i$$

This is called the Dummy Variable Trap

This makes sense if you think about it.

There are really only two pieces of information in the population

$$\begin{aligned}
E(W_i \mid \text{Male}) &= \beta_0 + \beta_1 \\
E(W_i \mid \text{Female}) &= \beta_0 + \beta_2
\end{aligned}$$

We have 3 parameters and 2 equations

Clearly the model is not identified, so it makes sense that you would have problems.
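
A small numerical sketch of the trap, using made-up coefficient values: two different parameter vectors produce exactly the same predicted wage for every person, and the matrix $X'X$ for the regressors $(1, m_i, f_i)$ has determinant zero, so the normal equations have no unique solution. The helper names `fitted` and `det3` are hypothetical, written only for this illustration.

```python
male = [1, 0, 1, 1, 0, 0]
female = [1 - m for m in male]  # f_i = 1 - m_i


def fitted(beta, m, f):
    """Predicted wage b0 + b1 * m + b2 * f."""
    b0, b1, b2 = beta
    return b0 + b1 * m + b2 * f


# Two different parameter vectors (made-up numbers). Shifting the intercept
# up by c and both dummy coefficients down by c changes nothing observable.
c = 5.0
beta_a = (10.0, 8.0, 2.0)
beta_b = (10.0 + c, 8.0 - c, 2.0 - c)

fit_a = [fitted(beta_a, m, f) for m, f in zip(male, female)]
fit_b = [fitted(beta_b, m, f) for m, f in zip(male, female)]
print(fit_a == fit_b)  # identical predictions from different parameters


def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


# X'X for the regressors (1, m_i, f_i) is singular.
X = [[1, m, f] for m, f in zip(male, female)]
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
print(det3(xtx))
```

Dropping either dummy, or dropping the intercept, removes the redundancy.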

Stata is smart about this kind of thing, though: it detects the collinearity and drops one of the dummy variables automatically.

Let $B_i$, $A_i$, $H_i$, $N_i$ be dummy variables for African American, Asian, Hispanic, and Native American, respectively.

That is, for example,

$$B_i = \begin{cases} 1 & \text{if person } i \text{ is African American} \\ 0 & \text{otherwise} \end{cases}$$

Then we can think of the regression

$$E(W_i \mid \text{Race}) = \beta_0 + \beta_1 B_i + \beta_2 A_i + \beta_3 H_i + \beta_4 N_i$$

Note that we have 5 basic population equations (for the 5 races) and 5 parameters, so we seem to have solved the dummy variable trap problem.

How do we interpret the parameters?

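$$\begin{aligned}
E(W_i \mid \text{African American}) &= \beta_0 + \beta_1 \\
E(W_i \mid \text{Asian}) &= \beta_0 + \beta_2 \\
E(W_i \mid \text{Hispanic}) &= \beta_0 + \beta_3 \\
E(W_i \mid \text{Native American}) &= \beta_0 + \beta_4 \\
E(W_i \mid \text{All Others}) &= \beta_0
\end{aligned}$$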

Thus solving out one can show that

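$$\begin{aligned}
\beta_0 &= E(W_i \mid \text{All Others}) \\
\beta_1 &= E(W_i \mid \text{African American}) - E(W_i \mid \text{All Others}) \\
\beta_2 &= E(W_i \mid \text{Asian}) - E(W_i \mid \text{All Others}) \\
\beta_3 &= E(W_i \mid \text{Hispanic}) - E(W_i \mid \text{All Others}) \\
\beta_4 &= E(W_i \mid \text{Native American}) - E(W_i \mid \text{All Others})
\end{aligned}$$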

Thus the left-out group matters a lot for the interpretation of the parameters.
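
To make the role of the left-out group concrete, the sketch below uses the fact that a regression on a constant plus a dummy for every category except one reproduces the group averages exactly: the intercept is the omitted group's mean and each coefficient is that group's mean minus the omitted group's mean. The wages, the group labels, and the helper names `group_means` and `dummy_coefficients` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical data (made-up numbers): two people per group.
groups = ["black", "black", "asian", "asian", "hispanic", "hispanic",
          "native", "native", "other", "other"]
wages = [16.0, 18.0, 24.0, 26.0, 15.0, 17.0, 14.0, 16.0, 20.0, 22.0]


def group_means(groups, wages):
    """Average wage within each group."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, w in zip(groups, wages):
        totals[g] += w
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def dummy_coefficients(means, omitted):
    """Intercept and dummy coefficients when `omitted` is the left-out group."""
    intercept = means[omitted]
    coefs = {g: mu - intercept for g, mu in means.items() if g != omitted}
    return intercept, coefs


means = group_means(groups, wages)

# Same data, two different choices of the left-out group.
b0_other, coefs_other = dummy_coefficients(means, omitted="other")
b0_asian, coefs_asian = dummy_coefficients(means, omitted="asian")

print("omit 'other':", b0_other, coefs_other)
print("omit 'asian':", b0_asian, coefs_asian)
```

The implied group averages (intercept plus coefficient) are the same under both choices; only the comparison group, and therefore the meaning of each coefficient, changes.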

Multiple Categorical Variables

Now what about more than one categorical variable at a time?

For example, what about race and gender?

Let's just focus on the African American gap by putting all other groups together.

What would happen if we just thought about the model as:

$$E(W_i \mid \text{Race}, \text{Gender}) = \beta_0 + \beta_1 m_i + \beta_2 B_i$$

Note that in this model

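$$\begin{aligned}
E(W_i \mid \text{White Male}) &= \beta_0 + \beta_1 \\
E(W_i \mid \text{White Female}) &= \beta_0 \\
E(W_i \mid \text{Black Male}) &= \beta_0 + \beta_1 + \beta_2 \\
E(W_i \mid \text{Black Female}) &= \beta_0 + \beta_2
\end{aligned}$$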

Now we actually have 4 equations and 3 parameters, so we can't solve out exactly.
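
One way to see the problem is to pin down the three parameters from three of the four equations and check what the model then implies for the fourth. The sketch below does this with made-up cell averages; the numbers are purely illustrative.

```python
# Hypothetical population averages for the four race-by-gender cells (made-up numbers).
cell_means = {
    ("white", "male"): 22.0,
    ("white", "female"): 18.0,
    ("black", "male"): 17.0,
    ("black", "female"): 15.0,
}

# Use three of the four equations to solve for the three parameters.
beta0 = cell_means[("white", "female")]           # E(W | White Female) = b0
beta1 = cell_means[("white", "male")] - beta0     # E(W | White Male)   = b0 + b1
beta2 = cell_means[("black", "female")] - beta0   # E(W | Black Female) = b0 + b2

# The fourth equation is then forced: E(W | Black Male) = b0 + b1 + b2.
implied_black_male = beta0 + beta1 + beta2
gap = cell_means[("black", "male")] - implied_black_male

print(f"implied by the model: {implied_black_male}, actual: {cell_means[('black', 'male')]}")
print(f"discrepancy: {gap}")
```

The additive model fits all four cells exactly only when the male-female difference is the same for both races, since it restricts that difference to equal $\beta_1$ in each group.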