Nov. 2, 2004    ECON 240A    Midterm    L. Phillips

1. (15) The box plot for running times from a random sample of Boston Marathon runners is shown below. Table 1.1 lists the data, sorted in descending order.

[Box plot of running times, labeled: lower whisker 140.1, first quartile 145.11, median 164.17, third quartile 175.18, upper whisker 219.96]

a. Label the median on the plot with its numerical value.
b. Label the first and third quartiles on the plot with their numerical values.
c. The numerical value for the second quartile is also the numerical value for the median.
d. Label the ends of the upper whisker and the lower whisker with their numerical values.
e. How many outliers are there? Zero. What does it take to be an outlier? To lie beyond the third quartile plus 1.5 × interquartile range (IQR), or below the first quartile minus 1.5 × IQR.

Table 1.1: Random Sample of Running Times, Boston Marathon

219.96   201.79   192.84   191.42   185.41
183.97   178.38   177.64   176.98   176.6
176.29   175.58   175.21   175.18   174.09
173.9    173.24   173.18   172.2    171.46
171.33   170.43   169.49   165.63   165.08
165.06   164.87   164.17   160.88   158.75
158.53   158.39   156.9    153.34   152.07
152.03   150.51   147.78   146.67   146.59
145.91   145.11   144.69   144.41   143.96
143.76   143.44   142.33   141.97   141.83
141.18   141.06   140.95   140.57   140.1

2. (15) A random sample of heart attack victims can be classified as high income (30%), medium income (49%), and low income (21%), respectively. These heart attack victims were classified as either survivors or deceased. Of those deceased, 7% were high income, 9% were medium income, and 12% were low income.

a. What is the joint probability of a sample member being low income and a heart attack survivor? p(S ∩ Low) = 0.21 − 0.12 = 0.09
b. What is the conditional probability of a member being a survivor, given that they are low income? p(S | Low) = p(S ∩ Low)/p(Low) = 0.09/0.21 = 0.43
c. Is this conditional probability (part b) higher or lower than the probability of a member being a survivor? Lower than 0.72.
d. What is the conditional probability of a member being a survivor, given that the member is high income? p(S | High) = p(S ∩ High)/p(High) = 0.23/0.30 = 0.77
e. Does survival of a heart attack appear to be independent of income? No.

Income group    High    Medium    Low     Total
Survivor        0.23    0.40      0.09    0.72
Deceased        0.07    0.09      0.12    0.28
Total           0.30    0.49      0.21    1.00

3. (15) The owner of a low-tech parking lot suspects her employee may be embezzling or skimming. Based on the dollar receipts the employee provided, the average time parked would be 3.5 hours. For the same period as the receipts turned in, the owner had the lot under surveillance, and the following information on parking times was obtained. The histogram of parking times is shown as Fig. 3.1. The summary statistics for parking times are included as Table 3.1.

a. What is the recommended range for the number of bins for a histogram for a data set this size? 10-11
b. At a 1% level of significance, do you think the employee is embezzling or not? Null: μ = 3.5; alternative: μ > 3.5. t(627 dof) = (x̄ − μ)/s_x̄ = (3.61 − 3.50)/(0.40/√629) = 6.89. Since 6.89 exceeds the critical value in part c, the null is rejected.
c. What is the critical value in the distribution determining the probability of the type I error? 2.33
d. What distribution did you use in your answer to parts b and c? Student's t. Why? Population variance not known.
e. If the histogram of parking times is not normal, does it affect your answer? No, the distribution of the sample mean is still normal (note n = 628).

[Figure 3.1: Histogram of Parking Times in Hours. Horizontal axis: Bin, 2.25 to 4.75 in steps of 0.25. Vertical axis: Frequency, 0 to 200.]

Table 3.1: Summary Statistics, Parking Times in Hours

Mean                  3.61
Standard Error        0.015934
Median                3.6
Mode                  3.7
Standard Deviation    0.4
Sample Variance       0.16
Kurtosis              0.329372
Skewness              0.07103
Range                 2.7
Minimum               2
Maximum               4.7
Sum                   2271.7
Count                 629

4. (15) Describe in words why you could make errors in:

a. estimating the population mean from a sample of random numbers generated from the normal distribution with mean zero and variance one.
Even though the population mean is known to be zero in this simulation of drawing a random sample, the sample mean is a random variable with its own distribution, so it is possible by chance to obtain a 95% confidence interval that does not include zero. Recall Lab 3.

b. estimating the proportion of voters that will vote for Senator Boxer today, based on a Field Poll taken three weeks ago.
Errors can arise by chance, as in part a above. In this case we are making an inference about an unknown parameter, and we can make errors without knowing it. In addition, voter sentiment could have changed since the last poll.

c. estimating the true average monthly rate of return on the UC Stock Index Fund from five years of monthly data, used to calculate the sample mean of the monthly rate of return of this index.
The same arguments as in part a, and all but the last sentence in part b. Plus, the unknown parameter we are trying to estimate may not be constant but instead time varying. So we have to keep in mind that our assumptions may be wrong.

5. (15) Employment in California, in millions of persons, is plotted against Real California Personal Income, in millions of 2000 dollars, as illustrated in Figure 5.1. There appear to be diminishing returns, so to speak, i.e. the slope of the data points appears to decrease as the ratio of employment to real income decreases. Note that a linear relationship does not fit the beginning or ending data points well.

[Figure 5.1: CA Employment vs. Real CA Personal Income (2000 $), 1971-2003. Fitted line: y = 1E-05x + 4.1866, R² = 0.9657. Horizontal axis: Real CA Personal Income (millions), 0 to 1,200,000. Vertical axis ticks: 0 to 18. Endpoints labeled 1971 and 2003.]

The ratio of employees per real dollar (2000 $) of personal income was calculated and plotted against time, where time equals zero in 1971 and time equals 32 in 2003. This ratio of employee per 2000 dollar equals 21 per million dollars in …
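As a numerical check on Question 3, parts b and c, the test statistic can be recomputed from the Table 3.1 values. This is a minimal sketch, not part of the original answer key: the function name `one_sample_t` is hypothetical, the count of 629 is taken from Table 3.1, and the critical value comes from the standard normal as a large-sample stand-in for Student's t.

```python
import math
from statistics import NormalDist


def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """Return the t statistic for the null hypothesis mu = null_mean."""
    standard_error = sample_sd / math.sqrt(n)  # estimated s.d. of the sample mean
    return (sample_mean - null_mean) / standard_error


# Mean, standard deviation, and count from Table 3.1; 3.5 hours is the
# average parking time implied by the receipts (the null hypothesis).
t_stat = one_sample_t(sample_mean=3.61, null_mean=3.50, sample_sd=0.40, n=629)

# One-tailed 1% critical value. Assumption of this sketch: with several
# hundred degrees of freedom, the normal quantile approximates Student's t.
critical = NormalDist().inv_cdf(0.99)

reject_null = t_stat > critical
print(f"t = {t_stat:.2f}, critical = {critical:.2f}, reject null: {reject_null}")
```

The statistic comes out at about 6.9, in line with the 6.89 in part b, and well above the critical value of roughly 2.33 in part c, so the null of a 3.5-hour mean is rejected at the 1% level.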