Homework 5 (100 points)

The following two problems will require a lot of calculations in STATA, which will generate many pages of output. Here is how you should organize it. The first pages should contain your answers to all the questions, along with any key algebraic equations or explanations you need to use along the way. After that, include a printout of the output from the regressions you executed in support of your answers. Highlight any numbers in this output that you used in the first section. (You are encouraged to save paper here: you may print this section with a small font, double-sided, and/or with 2-up format.) Last, include a copy of the DO file that contains the commands you asked STATA to execute. Be sure you organize these in a way that will be clear to the reader.

Problem 1 (50 points total)

In the dataset Smoker there is information on 1196 males from the United States. Data from this sample includes the variables:

smoke: 1 for smokers and 0 for nonsmokers
age: age in years
educ: number of years of schooling
income: family income
pcigs: price of cigarettes in the individual's state

Part 1

a. Generate a dummy variable hi_ed that is 1 if a person has 16 or more years of education.

b. (5 Points) Estimate a linear regression (which in this context is called a linear probability model, LPM) for the binary variable smoke on the independent variable hi_ed. Report the beta coefficient on the dummy variable and its p-value. In words, express what the beta coefficient means in this case.

-.225, p = .000. Moving from the hi_ed = 0 group to hi_ed = 1, that is, moving from low ed to high ed, lowers the probability of smoking by .225 (22.5 percentage points).

c. Create a frequency table for the smoke and hi_ed variables. The command in STATA is: tabulate smoke hi_ed

d. (5 Points) Calculate the probability that a person smokes if high education. Calculate the probability of smoking for low education.

There are 103 high ed people and 18 of them smoke; that is a probability of .1748. There are 1093 low ed people and 437 of them smoke; that is a probability of .3998.

e. (5 Points) What is the relationship between the results for parts b and d?

The difference in the probabilities for part d is .225. This is precisely the magnitude of the beta coefficient from part b. Also, the value of the constant term in the regression is .3998, which is precisely the probability that a low ed person smokes.

f. (5 Points) Calculate the odds that a low education person smokes. Calculate the odds a high education person smokes. Calculate the odds ratio.

Odds low ed smokes: .3998 / (1 - .3998) = .6662
Odds hi ed smokes: .1748 / (1 - .1748) = .2118
Odds ratio: An odds ratio always puts in the numerator the value of the odds when the variable increases by 1 unit. So put the hi ed odds in the numerator, since the hi_ed variable has increased from zero to 1: .2118 / .6662 = .3179

g. (5 Points) Estimate the logistic regression (logistic command) for smoke and hi_ed. Confirm that the reported value equals the odds ratio from part f. In words, express what the odds ratio coefficient means in this case.

The logistic output confirms the odds ratio calculation. The odds of smoking for a hi ed person are .3179 times the odds of smoking for a low ed person.

Part 2

a. Estimate the logistic regression for smoke on pcigs when hi_ed = 0. Then calculate the predicted values for Y. The command to do this is: predict yhat_0. Repeat for when hi_ed = 1 and also create a predicted value variable, yhat_1.

b. (5 Points) Compare the coefficients for the two models. In words, explain what the model is saying about the impact of lpcigs for the two different education groups.

The odds ratio for low ed is less than one, meaning an increase in price reduces the odds of smoking. This is a typical result: higher prices, lower quantity demanded. For hi ed we see the opposite, as the odds ratio is above 1: as cigarettes become more expensive, they smoke more. This is not a statistically significant finding for the hi ed group.

c. Create a graph of the predicted values for the two versions. Use the following syntax:

twoway (line yhat_0 lpcigs, sort) (line yhat_1 lpcigs, sort), legend(label(1 "low ed") label(2 "hi ed"))

d. (5 Points) For a low education person, a 1 unit change of the price of cigarettes will change the odds of smoking by how much?

The odds of smoking after a 1 unit price increase are .9748 times the odds before the price change.

Part 3

a. (5 Points) Run a logit regression (stata command logit) for smoke on all the independent variables (age, educ, pcigs, income). Report the coefficients and p-values. This is Model 1. Remember, for logit (rather than logistic) the coefficients have not been exponentiated.

pcigs: .022 (p-value .074)
educ: -.091 (p-value 0.00)
age: .021 (p-value 0.00)
income: 4.72 × 10^-6 (p-value .51)

b. (5 Points) Create an interaction variable of educ and income. Run another logit regression adding this interaction to Model 1. Report the coefficients and p-values. This is Model 2.

pcigs: .021 (p-value .083)
educ: .0391 (p-value .353)
age: .018 (p-value 0.00)
income: .0095 (p-value 0.00)
ed_inc: -7.45 × 10^-6 (p-value 0.00)

c. (5 Points) What differences strike you about Model 1 and Model 2? In particular, note how the significance levels of the variables educ and income have changed now that the interaction is included. In a clearly articulated paragraph or two, give a thoughtful answer as to what you think must be going on. This is not easy. Take your time and think hard about it. Your answer should contain two parts. First, talk about what the coefficients of the Model 2 regression are implying. Second, try to come up with an intuitive economic hypothesis for why we are observing these results.

(I don't expect anyone to be remotely as thorough as this. To get full marks you need the equivalent of the underlined sections. Graders: give 3 points if it is clear that the person has spent the time thinking hard about what is going on.)

In Model 1 the effect of education on smoking was negative and highly significant. The effect of income on smoking was barely positive and not significant. When we add in the education-income interaction in Model 2, the effect of education suddenly switches to positive and not significant, and the effect of income becomes larger and highly significant. Additionally, the interaction term is negative and significant. Here's a summary of these results:

Dep: Smoking Probability    Model 1     Model 2
educ                        negative    zero
inc                         zero        positive
ed_inc                      NA          negative

"zero" means the variable is not statistically significant at a conventional level.

What could be going on to drive such a result? An interaction …
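
The arithmetic behind Part 1 (b) through (g) can be checked directly from the frequency counts quoted in part d. The sketch below is only an illustration in Python, not the STATA workflow the assignment asks for; the variable and function names are made up for this example. It relies on the fact that, with a single dummy regressor, the LPM constant and slope reproduce the group smoking rates, and that the coefficient reported by logit is the natural log of the odds ratio reported by logistic.

```python
import math

# Counts quoted in Part 1 (d); the names below are illustrative only.
smokers_hi, total_hi = 18, 103      # hi_ed == 1
smokers_lo, total_lo = 437, 1093    # hi_ed == 0


def odds(p):
    """Convert a probability p into odds, p / (1 - p)."""
    return p / (1.0 - p)


# Smoking rate within each education group.
p_hi = smokers_hi / total_hi
p_lo = smokers_lo / total_lo

# With one dummy regressor, the LPM constant is the low-ed rate
# and the slope is the difference between the two rates.
lpm_constant = p_lo
lpm_slope = p_hi - p_lo

# Odds for each group, with the hi-ed odds in the numerator of the ratio.
odds_hi = odds(p_hi)
odds_lo = odds(p_lo)
odds_ratio = odds_hi / odds_lo

# logit reports the log of the odds ratio; exponentiating recovers it.
logit_coef = math.log(odds_ratio)

print(f"P(smoke | hi ed)  = {p_hi:.4f}")
print(f"P(smoke | low ed) = {p_lo:.4f}")
print(f"LPM constant      = {lpm_constant:.4f}")
print(f"LPM slope         = {lpm_slope:.4f}")
print(f"odds, hi ed       = {odds_hi:.4f}")
print(f"odds, low ed      = {odds_lo:.4f}")
print(f"odds ratio        = {odds_ratio:.4f}")
print(f"log odds ratio    = {logit_coef:.4f}")
```

Working from the raw counts rather than the rounded probabilities avoids compounding rounding error, which matters when the printed figures are compared with the four-decimal values in the answers.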


PSU ECON 306 - Homework 5
