Dec. 8, 2005    ECON 240A-1    L. Phillips
Final

Answer all five questions. They are weighted equally.

1. (30) We examined the lottery data file xr-18 in labs 6 and 7. Exploratory data analysis is an important tool for illustrating the “zeros problem” and thereby avoiding the econometric problem of bias in estimating a multivariate regression of percent of household income spent on the lottery on income, education, age, and number of children, the incorrect methodology suggested in problem 18.13 of the text. The stem and leaf diagram of percent of household income spent on the lottery is shown in Figure 1-1, and the histogram is illustrated in Figure 1-2.

Figure 1-1: Stem & Leaf Display: Percent of Household Income Spent On Lottery

Stems   Leaves
0 ->    0000000000000000000000011122333333455555555566666677777777777777778888888888888899999
1 ->    000000001111123

a. How many zeros are there? 23.

b. Is the distribution unimodal? No. There is a mode at zero and a second mode at 7, i.e., there are probably two peaks.

The box plot is shown in Figure 1-3.

Figure 1-3: Box Plot of Percent of Household Income Spent On Lottery

Lottery
Smallest = 0
Q1 = 1
Median = 6.5
Q3 = 8
Largest = 13
IQR = 7
Outliers: none

c. Is the median in the middle of the inter-quartile range? No. The median is 6.5, Q1 = 1, and Q3 = 8, so the median is much closer to Q3.

d. One fourth of the observations lie below what value? Below or equal to Q1 = 1. From the stem and leaf diagram, the smallest 25 observations consist of 23 zeros and 2 ones.

e. What methodologies alternative to multivariate regression might you suggest? You could convert the dependent variable to zeros and ones and estimate a probability model (linear, logit, or probit), estimate a Tobit model, or estimate a count model.

2. (30) The data for percent of household income spent on the lottery and years of educational attainment, sorted by the latter, are exhibited in Table 2-1.

Table 2-1
Education (years)   Lottery (percent of household income, one entry per household)
 7                  0, 5, 9
 8                  3, 6, 7, 10, 13
 9                  7, 7, 7, 8, 8, 10, 10, 10, 10, 11, 11, 12
10                  0, 6, 6, 7, 7, 7, 7, 8, 9, 10, 10
11                  0, 0, 3, 3, 7, 8, 8, 8, 8, 8, 9, 10, 11
12                  0, 5, 7, 8, 8, 8, 9, 11, 11
13                  0, 6
14                  2, 4, 5, 5, 7, 7, 7, 7, 8, 8, 9
15                  5, 5, 5, 6, 7
16                  0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 5, 6
17                  0, 0, 0, 0, 1, 1, 3, 3, 5, 7, 8
18                  0, 0
19                  0, 0
20                  0, 0

a. How many people with educational attainment from 7-11 years play the lottery? 40.

b. How many people with educational attainment from 7-11 years do not play the lottery? 4.

c. Fill in the boxes in Table 2-2 with the number of players and non-players for three categories of educational attainment.

Table 2-2: Cross-Classification of Players and Non-Players by Educational Level

Educational Level in Years   7, 8, 9, 10 & 11   12, 13, 14 & 15   16, 17, 18, 19 & 20   Marginal
Players                      40                 25                12                    77
Non-Players                  4                  2                 17                    23
Marginal                     44                 27                29                    100

d. Fill in the expected numbers under the null hypothesis of independence between playing the lottery (or not) and educational level in Table 2-3. Each expected count is the row marginal times the column marginal divided by 100; for example, 77 × 44 / 100 = 33.88.

Table 2-3: Number of Players and Non-Players By Educational Level Assuming Independence

Educational Level in Years   7, 8, 9, 10 & 11   12, 13, 14 & 15   16, 17, 18, 19 & 20   Marginal
Players                      33.88              20.79             22.33                 77
Non-Players                  10.12              6.21              6.67                  23
Marginal                     44                 27                29                    100

e. Fill in the contribution to Chi-Square in the six boxes in Table 2-4. Each contribution is (observed - expected)² / expected; for example, (40 - 33.88)² / 33.88 = 1.11.

Table 2-4: Contribution to Chi-Square

Educational Level in Years   7, 8, 9, 10 & 11   12, 13, 14 & 15   16, 17, 18, 19 & 20
Players                      1.11               0.85              4.78
Non-Players                  3.70               2.85              16.0

f. With the probability of a type I error equal to 5%, what is the critical level of Chi-Square beyond which the sum of the six boxes in Table 2-4 will lead you to reject the null hypothesis of independence? There are (2 - 1) × (3 - 1) = 2 degrees of freedom.
The critical value of chi-square for two degrees of freedom, above which 5% of the distribution lies, is 5.99.

3. (30) The Challenger data used for Takehome Project I showed launch temperatures ranging from 53 degrees Fahrenheit to 81 degrees Fahrenheit. This temperature range can be divided into approximately three equal ranges: 53°-61°, 62°-71°, and 72°-81°. Three dummy variables were created, one for each temperature range: DUMLOW, DUMMED, and DUMHIGH. The number of o-ring failures per launch was regressed against these three dummy variables.

The data for number of failed o-rings per launch, launch temperature, and the three dummy variables are displayed in Table 3-1.

Table 3-1: Number of O-Ring Failures Per Launch and Launch Temperature

ORINGS   TEMP   DUMLOW   DUMMED   DUMHIGH
3        53     1        0        0
1        57     1        0        0
1        58     1        0        0
1        63     0        1        0
0        66     0        1        0
0        67     0        1        0
0        67     0        1        0
0        67     0        1        0
0        68     0        1        0
0        69     0        1        0
1        70     0        1        0
1        70     0        1        0
0        70     0        1        0
0        70     0        1        0
0        72     0        0        1
0        73     0        0        1
2        75     0        0        1
0        75     0        0        1
0        76     0        0        1
0        76     0        0        1
0        78     0        0        1
0        79     0        0        1
0        80     0        0        1
0        81     0        0        1

The regression results from regressing the number of failed o-rings per launch against these three dummy variables for temperature range are displayed in Table 3-2.

Table 3-2: Regression of Number of Failed O-Rings Per Launch Versus Dummy Variables for Temperature Range (Low, Medium, and High)

Dependent Variable: ORINGS
Method: Least Squares
Sample: 1 24
Included observations: 24

Variable   Coefficient   Std. Error   t-Statistic   Prob.
DUMLOW     1.666667      0.366201     4.551239      0.0002
DUMMED     0.272727      0.191242     1.426084      0.1685
DUMHIGH    0.200000      0.200576     0.997126      0.3301

R-squared             0.389266    Mean dependent var      0.416667
Adjusted R-squared    0.331101    S.D. dependent var      0.775532
S.E. of regression    0.634278    Akaike info criterion   2.043810
Sum squared resid     8.448485    Schwarz criterion       2.191067
Log likelihood      -21.52572     F-statistic             6.692432
Durbin-Watson stat    2.006082    Prob(F-statistic)       0.005642

a. What statistical method is being employed using this regression? ANOVA (analysis of variance).

b. Is this regression significant? Explain. Yes. The F-statistic is 6.69 with Prob(F-statistic) = 0.0056, so it is significant at the 5% level.

c. What null hypothesis is being tested? That the three coefficients, i.e., the three group means, are equal.

d. What is the interpretation of the regression coefficients on each of the dummy variables? Each is the mean number of o-ring failures per launch for the corresponding temperature range.

e. Why isn’t a constant term included with this regression? Because the three dummy variables add up to a column of ones, i.e., a constant; including an intercept as well would make the regressors perfectly collinear.
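As a check on the answers to parts a, d, and e, the following is a minimal Python sketch (standard library only) that rebuilds the three dummy variables from the Table 3-1 temperatures. It solves the least-squares normal equations without a constant and recomputes the coefficients, sum of squared residuals, R-squared, and the ANOVA F-statistic reported in Table 3-2. The helper names `dummies` and `solve` are hypothetical, chosen for this illustration. The cut-offs 53°-61°, 62°-71°, and 72°-81° are the ranges stated in the question.

```python
# Sketch: recompute Table 3-2 from the Table 3-1 data (standard library only).
# dummies() and solve() are hypothetical helper names used for illustration.

# (ORINGS, TEMP) pairs transcribed from Table 3-1
DATA = [
    (3, 53), (1, 57), (1, 58),
    (1, 63), (0, 66), (0, 67), (0, 67), (0, 67), (0, 68), (0, 69),
    (1, 70), (1, 70), (0, 70), (0, 70),
    (0, 72), (0, 73), (2, 75), (0, 75), (0, 76), (0, 76),
    (0, 78), (0, 79), (0, 80), (0, 81),
]


def dummies(temp):
    """Return (DUMLOW, DUMMED, DUMHIGH) for the 53-61, 62-71, 72-81 degree ranges."""
    return (int(temp <= 61), int(62 <= temp <= 71), int(temp >= 72))


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


y = [float(orings) for orings, _ in DATA]
X = [dummies(temp) for _, temp in DATA]
n, k = len(y), 3

# Normal equations X'X b = X'y, with no constant column
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
coef = solve(xtx, xty)

fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ybar = sum(y) / n
tss = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ssr / tss
# ANOVA F: between-group df = k - 1, within-group df = n - k
f_stat = ((tss - ssr) / (k - 1)) / (ssr / (n - k))

print([round(c, 6) for c in coef])  # -> [1.666667, 0.272727, 0.2]
print(round(ssr, 6), round(r2, 6), round(f_stat, 6))  # -> 8.448485 0.389266 6.692432
```

Each launch falls in exactly one temperature range, so every row of X sums to one. X'X is therefore diagonal, with the group sizes 3, 11, and 10 on the diagonal, and each coefficient reduces to the group sum divided by the group size, i.e., the group mean. Appending a column of ones would duplicate the sum of the three dummy columns and make X'X singular.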