Dec. 11, 2008    ECON 240A    L. Phillips
Final
Answer all five questions

1. (30) Using Titanic3, the data set for Project 1, 809 people perished and 500 people survived for a total of 1309. Figure 1-1 is a plot of the histogram for fare, in pounds sterling, for the people who perished, and Figure 1-2 is a plot of the histogram for fare for the people who survived.

Figure 1-1: Histogram of Fare, in Pounds Sterling, for People Who Perished
Series: FARE
Sample: 1 809
Observations   808
Mean           23.35383
Median         10.50000
Maximum        263.0000
Minimum        0.000000
Std. Dev.      34.14510
Skewness       4.248942
Kurtosis       24.56307
Jarque-Bera    18085.07
Probability    0.000000

Figure 1-2: Histogram of Fare, in Pounds Sterling, of People Who Survived
Series: FARE
Sample: 810 1309
Observations   500
Mean           49.36118
Median         26.00000
Maximum        512.3292
Minimum        0.000000
Std. Dev.      68.64880
Skewness       3.518007
Kurtosis       19.73316
Jarque-Bera    6864.669
Probability    0.000000

a. Looking at two measures of central tendency, does fare appear to distinguish the two groups, those who survived and those who perished? The mean and median fare for those who survived are both more than twice the amount of the mean and median fare for those who perished.

b. The statistics accompanying these two figures provide two separate measures of dispersion. Name these two measures and explain which you think is the most reliable and why that is the case. Standard deviation and range. The standard deviation is likely more reliable since the range, i.e. the max - min, can be influenced by outliers, especially the max in this case.

c. A measure of relative dispersion, different from the two quantitative measures referred to in part b, is the sample coefficient of variation. Calculate and report this measure for those who perished __1.46__ and for those who survived __1.39__.
Does there appear to be much of a difference in relative dispersion between survivor fare and perished fare? Yes or no __no__.

d. For which group is the fare distribution most skewed, perished or survived? __perished__. What is used as a measure of skewness, the first central moment? Explain.

The distributions of fare for both those who survived and those who did not are quite non-normal. There were 127 women who perished. Their distribution for fare in pounds sterling is displayed in the box plot, Figure 1-3:

Figure 1-3: Box Plot of Fare in £ for 127 Women who Perished on the Titanic

e. Comment on a salient feature of the distribution displayed in Figure 1-3. __outliers__

2. (30) After looking at the data in question 1, a statistician has an inspired idea. Why not regress fare for both those who survived and those who perished against a constant and a dummy or indicator variable that is one if the individual perished and zero otherwise, and see what happens? The results are shown in Table 2-1.

Table 2-1: Regression of Fare Against an Indicator Variable of Perished Or Not

Dependent Variable: FARE
Method: Least Squares
Sample: 1 1309
Included observations: 1308
Excluded observations: 1

Variable    Coefficient   Std. Error   t-Statistic   Prob.
PERISHED    -26.00735     2.856957     -9.103167     0.0000
C           49.36118      2.245461     21.98265      0.0000

R-squared            0.059666    Mean dependent var      33.29548
Adjusted R-squared   0.058946    S.D. dependent var      51.75867
S.E. of regression   50.21003    Akaike info criterion   10.67183
Sum squared resid    3292487.    Schwarz criterion       10.67975
Log likelihood       -6977.380   F-statistic             82.86765
Durbin-Watson stat   1.836247    Prob(F-statistic)       0.000000

The statistician notes the regression appears to be highly significant but is somewhat surprised to see the coefficient on the indicator variable is negative since the dependent variable fare is always non-negative.
In any case, the statistician is not quite sure what this regression means and comes to you with a list of questions:

a. What is the meaning of this regression and the fact that the F-statistic is significant, i.e. what is the null hypothesis that this regression is testing? The null hypothesis is that the coefficient on PERISHED is zero, i.e. that there is no difference in mean fare between the two groups. The F-statistic is significant, so reject the null of no difference: the mean fare for those who survived is significantly different from the mean fare for those who perished.

b. How should the statistician interpret the constant term? The constant is the mean fare for those who survived, see Figure 1-2. Fare = c + b*perished + e, so E(Fare | perished = 0) = c + 0 + 0 = c, where the left-hand side is the mean fare for survivors.

c. Explain to the statistician why the coefficient on the indicator variable is negative and what this coefficient measures. E(Fare | perished = 1) = c + b + 0, so b = mean fare for perished - mean fare for survivors, and so b < 0, see Fig. 1-1 and Fig. 1-2.

d. If the statistician questioned your explanation, how else could you demonstrate the validity of your answers? You could regress fare against two indicator variables with no constant, fare = b*survivor + d*perished + e, show that b is the mean fare for survivors and d the mean fare for those who perished, and run a Wald test on b = d to show that the F-stat is the same as for the regression. Alternatively, you could test the difference between the means of two populations.

e. You need not do the calculations but show the formula you would use from Ch. 13 supposing you chose that path to answer part d.

t = [(x̄1 - x̄2) - (μ1 - μ2)] / (s1²/n1 + s2²/n2)^(1/2)

3. (30) The Rapid Test, also known as ELISA, is used to determine whether someone has HIV, the virus that causes AIDS. Information from the Centers for Disease Control, CDC, can be found at www.cdc.gov/hiv/resources/qa/oraqck.htm. The conditional probability of getting a positive test result for someone who does not have the virus, i.e. a false positive, is 0.027.
The conditional probability of getting a negative test result for someone who does have the virus, i.e. a false negative, is 0.080. On the basis of several characteristics, a doctor has assigned a patient as being low risk, i.e. as having a probability of having the virus of 0.005. Then the doctor and patient receive the Rapid Test result for this patient as being positive.

a. What the patient and his doctor both want to know is the probability that this patient actually has the HIV virus. (Report to the third decimal place.) P(HIV | +) = P(HIV ∩ +)/P(+)

b. Before answering part a,
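As a rough check on question 3, part a, the Bayes' rule expression P(HIV | +) = P(HIV ∩ +)/P(+) can be evaluated from the three probabilities stated in the question. The sketch below is not the answer key's own working. The function name posterior_given_positive and its parameter names are hypothetical, chosen only for illustration.

```python
def posterior_given_positive(prior, false_positive, false_negative):
    """Return P(HIV | +) by Bayes' rule.

    prior          -- P(HIV), the doctor's prior for the patient
    false_positive -- P(+ | no HIV)
    false_negative -- P(- | HIV)
    (Names are illustrative assumptions, not from the exam.)
    """
    # P(+ | HIV) is the complement of the false-negative rate.
    true_positive = 1.0 - false_negative
    # Joint probability P(HIV and +).
    joint = true_positive * prior
    # Total probability of a positive result over both states.
    p_positive = joint + false_positive * (1.0 - prior)
    return joint / p_positive


# Values stated in question 3: prior 0.005, false positive 0.027,
# false negative 0.080.
p = posterior_given_positive(0.005, 0.027, 0.080)
print(f"P(HIV | +) = {p:.3f}")  # prints "P(HIV | +) = 0.146"
```

With these inputs the numerator is 0.92 * 0.005 = 0.0046 and the denominator is 0.0046 + 0.027 * 0.995 = 0.031465, so even after a positive result the probability stays well below one half because the prior is so low.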