Stat 220 – Summer 2009, Quiz 3. Maximum score: 100 points. KEY

1. (25 pts) (from the textbook “Applied Statistics”, Devore and Farnum 2004) In some geographical region, engineers measured the duration of lightning flashes. In a random sample of 110 flashes, the sample mean was 0.81 seconds and the sample SD was 0.34 seconds.

a. Can you calculate a 90% confidence interval for the length of a single lightning flash in that region? If yes, do the calculation to 2 digits after the decimal point. If no, why not, or what additional information would you need?

Short answer: NO. The shape of the distribution is not given. For example, if we were told it is normal, then the average and SD would have been enough to make statements about single flashes. But we were told nothing; in fact, there are indications that the distribution is right-skewed (never mind the details).

More generally, an interval for a single, not-yet-seen observation (if we have already seen it, there is no uncertainty) is known as a prediction interval. This is like a weather-forecasting problem. To calculate such an interval we need to know the distribution’s shape, and we must take into account both our uncertainty about the distribution (e.g., the precision of the sample average and SD) and the chance variability of a single observation. We won’t learn to do that in this course.

b. Can you calculate a 90% confidence interval for the average length of lightning flashes in that region? If yes, do the calculation to 2 digits after the decimal point. If no, why not, or what additional information would you need?

Here we can do it. By virtue of the CLT and the fairly large sample size of 110, we can safely assume that the sample average behaves like a normal random variable. That is, it is removed from the true (population) average by some unknown, but normal-looking, chance error. The SE gives us the SD of this chance error:

SE = SampleSD / √SampleSize = 0.34 / √110 ≈ 0.032

Calculator note: in general, we do not want to round in the middle of calculations. I wrote 0.032 above for simplicity, but what you really need to do is store the exact value (something like 0.032418…) in your calculator’s memory and keep it for the next calculation. Otherwise, your rounding error may grow from step to step until your final answer is quite a bit off.

Now, for a 90% CI we need a multiplier of 1.65 (more precisely 1.645, but never mind):

CI: SampleAverage ± 1.65 × SE = 0.81 ± 1.65 × 0.032 = (0.76, 0.86)

If the measurements were good and the sample was really random, then we are 90% confident that the true average lightning-flash length in the region where the measurements were taken is between 0.76 and 0.86 seconds.

In other words: the method we used to calculate the interval (0.76, 0.86) guarantees a 90% probability of catching the true average lightning-flash length for the region. But we cannot tell whether we actually caught it or not.
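To double-check part (b), here is a minimal Python sketch (not part of the original key; the variable names are mine). It follows the calculator note above: the unrounded SE is kept in a variable, and rounding happens only when the result is printed.

```python
import math

# Sample statistics given in the problem
n = 110        # sample size
mean = 0.81    # sample average (seconds)
sd = 0.34      # sample SD (seconds)

# Standard error of the sample average: SE = SD / sqrt(n).
# Keep the full-precision value; round only when reporting.
se = sd / math.sqrt(n)    # about 0.032418...

# 90% confidence multiplier (the key uses 1.65; more precisely 1.645)
z = 1.645

lower = mean - z * se
upper = mean + z * se
print(f"SE = {se:.6f}")                       # SE = 0.032418
print(f"90% CI: ({lower:.2f}, {upper:.2f})")  # 90% CI: (0.76, 0.86)
```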
2. (15 pts) The 2004 “Lancet” study’s death estimate was based on 21 violent deaths recorded in the survey. The 2006 study’s much higher violent-death estimate was based on 300 recorded violent deaths. Roughly speaking, about how many nationwide estimated deaths did each single recorded death in the 2004 study represent? How about the 2006 study? What does this tell us about the need to quality-control these studies’ data? (Hint: this question requires you to provide two additional numbers not given here, but by now you should be familiar with them.)

The two ‘missing’ numbers are approximately 100,000 total violent deaths estimated in 2004 (during 18 post-invasion months) and approximately 600,000 total violent deaths estimated in 2006 (during 40 post-invasion months). Dividing each estimate by the number of recorded deaths used to produce it, we get ratios of a little less than 5,000 for the 2004 study and around 2,000 for the 2006 study.

This means that every single recorded death in the survey had a pretty strong impact on the nationwide estimates. Therefore, the estimates were quite sensitive to data-entry and other errors at the survey level. The inspection by Guha-Sapir and Degomme (2007) found 35 deaths entered in error in the 2006 study. If their comments are accepted and no correction to the errors can be found, then the point estimate of 600,000 should automatically be lowered by about 70,000 deaths or so (not exactly, because the averaging process is a complicated regression, but the reduction is probably close to that).
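The ratios and the implied reduction are simple arithmetic. Here is a short Python sketch (not part of the original key; it just reproduces the approximate figures quoted above):

```python
# Approximate estimated totals and recorded violent deaths from each study
estimated_2004, recorded_2004 = 100_000, 21
estimated_2006, recorded_2006 = 600_000, 300

# Nationwide estimated deaths represented by each recorded death
ratio_2004 = estimated_2004 / recorded_2004   # ~4,762: a little less than 5,000
ratio_2006 = estimated_2006 / recorded_2006   # 2,000

# Rough effect of the 35 erroneous entries found by Guha-Sapir and Degomme
reduction = 35 * ratio_2006                   # ~70,000
print(round(ratio_2004), round(ratio_2006), round(reduction))
```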

