

These notes introduce dummy variables in regression analysis: how to interpret them and how to use them as regressors or as the dependent variable. They also cover the Chow test, which tests the null hypothesis of no difference between groups, and discuss the difference between numerical and ordinal variables and how to transform such variables into dummy variables for regression analysis.
Typology: Study notes
Chapter 7: Dummy Variables

A dummy variable can take only the values 1 and 0. It is categorical, which means the numbers 1 and 0 are labels with no numerical meaning (we cannot say that 1 is greater than 0). In this chapter we use a dummy as a regressor. Chapter 17 (covered in eco411) shows how to use a dummy as the dependent variable.

First, let's use the wage data and consider a simple regression

$$\mathit{wage} = \beta_0 + \beta_1 D + u \qquad (1)$$

where $D = 0$ for male and $D = 1$ for female. For a dummy variable, you have to be clear about which group $D = 0$ refers to (called the base group). All later comparisons are made relative to the base group. You can report the frequency of $D$ using the Stata command tab D.

The key to understanding the dummy-variable model is to discuss:

when $D = 0$, $\mathit{wage}$ = _____________________. If we take the expectation we get ___________________
when $D = 1$, $\mathit{wage}$ = _____________________. If we take the expectation we get ___________________

So $\beta_0$ can be interpreted as ________________________; and $\beta_1$ can be interpreted as ____________________

This result suggests that we can conduct the two-sample t test (the comparison-of-means test; Stata command: ttest wage, by(D)) using the simple regression (1), which involves a dummy.

Now consider a multiple regression

$$\mathit{wage} = \beta_0 + \beta_1 D + \beta_2 x + \beta_3 (D \cdot x) + u \qquad (2)$$

For example, $x$ can be exper, and $D \cdot x$ is the interaction term (the product of $x$ and the dummy). Let's discuss again:

when $D = 0$, _________________________________________________________
when $D = 1$, _________________________________________________________

$\beta_0$ can be interpreted as _________________________________________________________;
$\beta_1$ can be interpreted as _________________________________________________________;
$\beta_2$ can be interpreted as _________________________________________________________;
$\beta_3$ can be interpreted as _________________________________________________________;

How can we show $\beta_1$ and $\beta_3$ in a graph?
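One way to check your answers to the questions above is a small numerical experiment. The sketch below is only an illustration under stated assumptions: it uses Python with NumPy rather than Stata, the data are synthetic (not the course's wage data), and the helper name `ols` is made up for this example. It compares the coefficients of regression (1) with the two group means, and the coefficients of regression (2) with separate regressions of wage on exper for each group.

```python
import numpy as np

# Synthetic data for illustration only; this is NOT the course's wage data set.
rng = np.random.default_rng(0)
n = 200
D = np.arange(n) % 2                      # 0 = male (base group), 1 = female
exper = rng.uniform(0, 30, size=n)        # hypothetical experience in years
wage = 10 - 2 * D + 0.3 * exper - 0.1 * D * exper + rng.normal(0, 1, size=n)


def ols(y, *regressors):
    """Hypothetical helper: OLS coefficients, with the intercept listed first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Regression (1): wage on the dummy only.
b0, b1 = ols(wage, D)
mean_male = wage[D == 0].mean()
mean_female = wage[D == 1].mean()
print("intercept vs. base-group mean:", b0, mean_male)
print("dummy coefficient vs. difference in means:", b1, mean_female - mean_male)

# Regression (2): dummy, x, and their interaction.
c0, c1, c2, c3 = ols(wage, D, exper, D * exper)

# Separate regressions of wage on exper within each group.
m0, m1 = ols(wage[D == 0], exper[D == 0])   # male intercept and slope
f0, f1 = ols(wage[D == 1], exper[D == 1])   # female intercept and slope
print("male line:  ", (c0, c2), "vs.", (m0, m1))
print("female line:", (c0 + c1, c2 + c3), "vs.", (f0, f1))
```

In each printed pair the two numbers should agree up to rounding, which is the numerical counterpart of taking expectations with $D = 0$ and $D = 1$. In a graph, this corresponds to two fitted lines, one per group, whose intercepts and slopes can be compared.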
In this context there is a very important F test, called the Chow test, which is concerned with a particular null hypothesis

$$H_0: \beta_1 = 0,\ \beta_3 = 0 \qquad (3)$$

The meaning of this null hypothesis is _____________________________________
The restricted regression is ______________________________________________
The Chow test is ______________________________________________________

What should we do if the null hypothesis is not rejected? What should we do if the null hypothesis is rejected?

When the $x$ variable is itself a dummy, the regression becomes very interesting. In this case, $\beta_3$ in the regression below is called the difference-in-differences estimator (the coefficient on the interaction term of two dummies).

$$Y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 (D_1 \cdot D_2) + u \qquad (4)$$

Exercise: How do we interpret $\beta_3$? To fix ideas, let $Y$ be wage, $D_1$ be the female dummy, and $D_2$ be the married dummy. Let's discuss:
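As a numerical companion to this discussion, here is a minimal sketch, again in Python with NumPy and on synthetic data, so every number and the helper name `fit` are assumptions made for illustration. It computes the usual F statistic for the null hypothesis (3) from the restricted and unrestricted sums of squared residuals, and then checks that $\beta_3$ in regression (4) matches a difference of differences of the four cell means. It does not compute a p-value; in Stata you would typically obtain the test with the test command after regress.

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
n = 400
D1 = np.arange(n) % 2                 # hypothetical female dummy
D2 = (np.arange(n) // 2) % 2          # hypothetical married dummy
x = rng.uniform(0, 30, size=n)        # hypothetical exper


def fit(y, *regressors):
    """Hypothetical helper: OLS coefficients (intercept first) and the SSR."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


# --- Chow test for H0: beta1 = 0, beta3 = 0 in regression (2) ---
wage = 10 - 2 * D1 + 0.3 * x - 0.1 * D1 * x + rng.normal(0, 1, size=n)
_, ssr_ur = fit(wage, D1, x, D1 * x)  # unrestricted: regression (2)
_, ssr_r = fit(wage, x)               # restricted: dummy and interaction dropped
q = 2                                 # number of restrictions
df = n - 4                            # n minus number of unrestricted parameters
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
print("Chow F statistic:", F)

# The unrestricted SSR equals the sum of SSRs from separate group regressions.
_, ssr_m = fit(wage[D1 == 0], x[D1 == 0])
_, ssr_f = fit(wage[D1 == 1], x[D1 == 1])
print("SSR check:", ssr_ur, ssr_m + ssr_f)

# --- Difference-in-differences in regression (4) ---
y = 12 - 1.0 * D1 + 2.0 * D2 - 1.5 * D1 * D2 + rng.normal(0, 1, size=n)
(b0, b1, b2, b3), _ = fit(y, D1, D2, D1 * D2)


def cell_mean(d1, d2):
    """Sample mean of y in the cell D1 = d1, D2 = d2."""
    return y[(D1 == d1) & (D2 == d2)].mean()


# (married minus single, for females) minus (married minus single, for males)
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print("beta3 vs. difference in differences:", b3, did)
```

A large F statistic relative to the critical value of the F distribution with (q, df) degrees of freedom is evidence against pooling the two groups into one regression.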
Some variables look numerical, but they are not. Two examples:

$$x_1 = \begin{cases} 1 & \text{if taking bus} \\ 2 & \text{if taking subway} \\ 3 & \text{if driving} \end{cases}$$

$$x_2 = \begin{cases} 1 & \text{if failing expectation} \\ 2 & \text{if meeting expectation} \\ 3 & \text{if exceeding expectation} \end{cases}$$

The values of $x_1$ have no numerical meaning and no ordering. The values of $x_2$ have no numerical meaning but do have an ordering; $x_2$ is called an ordinal variable. We do not believe the effect on $Y$ when $x_1$ changes from 1 to 2 is the same as when $x_1$ changes from 2 to 3. So you cannot use $x_1$ and $x_2$ directly as regressors, since they are not numerical.

Instead, we need to transform $x_1$ and $x_2$ into a set of dummy variables, and use those dummy variables as regressors. For instance, for $x_1$ we may define two dummy variables

$$D_1 = \begin{cases} 1 & \text{if taking bus (or } x_1 = 1) \\ 0 & \text{otherwise} \end{cases}$$

$$D_2 = \begin{cases} 1 & \text{if taking subway (or } x_1 = 2) \\ 0 & \text{otherwise} \end{cases}$$

We do not need to define a dummy for driving (the last group). The intercept term represents it (the base group). We fall into the dummy variable trap if we define three dummies for the three groups and use all of them along with the intercept term. The dummy variable trap is caused by perfect multicollinearity. Stata will automatically drop one of the dummies for you.
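The dummy variable trap can be seen directly in the regressor matrix. The sketch below is an illustration in Python with NumPy, not Stata, and the values of x1 and y are made up. It builds the dummies from $x_1$, compares the column rank of the design matrix with and without the third dummy, and then fits the regression with driving as the base group.

```python
import numpy as np

# Hypothetical commuting codes: 1 = bus, 2 = subway, 3 = driving.
x1 = np.array([1, 2, 3, 1, 2, 3, 3, 2, 1, 3])
# Hypothetical outcome values, for illustration only.
y = np.array([5.0, 7.0, 9.0, 6.0, 8.0, 10.0, 11.0, 7.5, 5.5, 9.5])

D_bus = (x1 == 1).astype(float)
D_subway = (x1 == 2).astype(float)
D_drive = (x1 == 3).astype(float)
ones = np.ones(len(x1))

# Trap: intercept plus all three dummies. The three dummies sum to the
# intercept column, so the four columns are perfectly collinear.
X_trap = np.column_stack([ones, D_bus, D_subway, D_drive])
rank_trap = np.linalg.matrix_rank(X_trap)
print("columns:", X_trap.shape[1], "rank:", rank_trap)

# Fix: drop the dummy for the base group (driving).
X_ok = np.column_stack([ones, D_bus, D_subway])
rank_ok = np.linalg.matrix_rank(X_ok)
print("columns:", X_ok.shape[1], "rank:", rank_ok)

# With driving as the base group, the intercept is the driving-group mean and
# each dummy coefficient is a difference in means relative to driving.
coef, *_ = np.linalg.lstsq(X_ok, y, rcond=None)
mean_drive = y[x1 == 3].mean()
mean_bus = y[x1 == 1].mean()
mean_subway = y[x1 == 2].mean()
print("intercept vs. driving mean:", coef[0], mean_drive)
print("bus coefficient vs. bus minus driving:", coef[1], mean_bus - mean_drive)
print("subway coefficient vs. subway minus driving:", coef[2], mean_subway - mean_drive)
```

The trap matrix has four columns but rank three, so its coefficients are not uniquely determined. Dropping any one dummy restores full column rank, and the choice of which dummy to drop only changes the base group.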