22S:105 Statistical Methods and Computing

Confidence Intervals

Lecture 12
Mar. 4, 2011

Kate Cowles
374 SH, 335-0727
kcowles@stat.uiowa.edu

Confidence in estimation

Example: Studying the quantitative skills of young Americans of working age

We might use the quantitative scores from the National Assessment of Educational Progress (NAEP) Young Adult Literacy Assessment Survey.
- possible scores from 0 to 500
- in a recent year, 840 men aged 21 to 25 years were in the NAEP sample
- can be considered a simple random sample from the population of 9.5 million young men in this age range
- mean quantitative score $\bar{x} = 272$

What can we conclude about the population mean score $\mu$ of all 9.5 million young men?

Point estimation

If we had to guess a single number for the population mean, our best educated guess is $\bar{x}$, the sample mean.
- $\bar{x}$ is our point estimate of $\mu$

How great is the uncertainty in this estimate?

Interval estimation: the prelude

Recall essentials about the sampling distribution of $\bar{x}$:
- The mean $\bar{x}$ of 840 scores has a distribution that is close to normal, by the Central Limit Theorem.
- The mean of this normal sampling distribution is the same as the unknown mean $\mu$ of the entire population.
- The standard deviation of $\bar{x}$ for a simple random sample of 840 men is $\sigma/\sqrt{840}$, where $\sigma$ is the standard deviation of individual NAEP scores among all young men.

If we knew $\sigma$

Imagine that we know that the true population standard deviation of quantitative scores among all young men is $\sigma = 60$.

Then the standard deviation of $\bar{x}$ is $\sigma/\sqrt{n} = 60/\sqrt{840} \approx 2.1$.

Imagine also that we could choose many samples of size 840 and find the mean NAEP quantitative score from each one.

Statistical confidence

The 68-95-99.7 rule says that in 95% of all samples, the mean score $\bar{x}$ for the sample will be within two standard deviations of the population mean score $\mu$. Here the relevant standard deviation is that of $\bar{x}$, so two standard deviations is $2 \times 2.1 = 4.2$ points.

So $\bar{x}$ will be within 4.2 points of $\mu$ in 95% of samples of 840 NAEP scores.

But if $\bar{x}$ is within 4.2 points of the unknown $\mu$, then $\mu$ also has to be within 4.2 points of the observed $\bar{x}$.

This will happen in 95% of all samples.
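The thought experiment of drawing many samples of size 840 can be tried out numerically. The sketch below is an illustration only and is not part of the original lecture. It assumes that individual scores are normally distributed (the lecture does not say how scores are distributed) and uses a made-up true mean of 270, since in practice $\mu$ is unknown. The function names `standard_error` and `coverage` are hypothetical.

```python
import math
import random

SIGMA = 60.0   # population standard deviation, taken as known in the lecture
N = 840        # sample size in the NAEP example


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def coverage(true_mean: float, margin: float, n_samples: int = 1000,
             seed: int = 0) -> float:
    """Fraction of simulated samples whose mean is within `margin` of true_mean.

    Assumption for illustration only: individual scores are drawn from a
    normal distribution with standard deviation SIGMA.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample_mean = sum(rng.gauss(true_mean, SIGMA) for _ in range(N)) / N
        if abs(sample_mean - true_mean) <= margin:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    se = standard_error(SIGMA, N)
    margin = 2 * round(se, 1)   # the lecture rounds the standard error to 2.1
    print(f"standard error = {se:.2f}, margin = {margin:.1f}")
    # 270.0 is a made-up value for the true mean; in practice mu is unknown.
    print(f"share of samples within the margin: {coverage(270.0, margin):.3f}")
```

The reported share should come out at roughly 95%, in line with the 68-95-99.7 rule. The exact figure depends on the random seed and on the number of simulated samples.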
In 95% of all possible samples of size 840 from this population, the unknown $\mu$ lies between $\bar{x} - 4.2$ and $\bar{x} + 4.2$.

To picture this, recall the many samples of size 840 imagined earlier: if we collect all these different $\bar{x}$'s and display their distribution, we get the normal distribution with
- mean equal to the unknown $\mu$
- standard deviation 2.1

95% confidence

Our sample of 840 young men gave $\bar{x} = 272$.

We say that we are 95% confident that the unknown mean NAEP quantitative score for all young men lies between

$\bar{x} - 4.2 = 272 - 4.2 = 267.8$

and

$\bar{x} + 4.2 = 272 + 4.2 = 276.2$

Every sample would give slightly different values for this interval.

Why are we so confident that $\mu$ lies in the interval we happened to get?

There are only two things that could have happened with our particular sample:
- We got a sample such that the true $\mu$ does lie in our resulting interval. That is, $\mu$ really is between 267.8 and 276.2.
- We were unlucky, and our simple random sample was one of the 5% of all possible samples where $\bar{x}$ is not within 4.2 points of the true $\mu$.

We cannot know for sure which thing happened with our particular sample.

Saying "We are 95% confident that the unknown $\mu$ lies in the interval (267.8, 276.2)" means "We got these numbers by a method that gives correct results 95% of the time."

What a 95% confidence interval does not mean

Saying "We are 95% confident that the unknown $\mu$ lies in the interval (267.8, 276.2)" does not mean:
- $\mu$ is a random variable that has a value within the interval 95% of the time
- 95% of the population values lie in the interval

What if we wanted to be more confident that our interval contained $\mu$?

We would use a confidence level other than 95%.

Example: we will compute a 99% confidence interval for the mean of NAEP quantitative scores in young men.

We need the values for a standard normal distribution that cut off the top 0.005 and the bottom 0.005 of values.

Table A gives several possibilities due to rounding. The most accurate choice is 2.58 for the upper cutoff.

So a 99% confidence interval for $\mu$ would be

$(\bar{x} - 2.58(2.1),\ \bar{x} + 2.58(2.1))$

If we didn't need to be all that confident, how would we compute an
80% confidence interval for $\mu$?

Two-sided confidence intervals for a population mean

Draw a simple random sample of size $n$ from a population having
- unknown mean $\mu$
- known standard deviation $\sigma$

A level $C$ confidence interval for $\mu$ is

$\bar{x} \pm z^* \, \sigma/\sqrt{n}$

where $z^*$ is the value that cuts off the upper $\frac{1}{2}(1 - C)$ of the area of a standard normal distribution. $z^*$ is called the critical value.

Critical values for the most commonly used confidence levels

Confidence level   Tail area   $z^*$
90%                0.05        1.645
95%                0.025       1.960
99%                0.005       2.576

What affects the width of a confidence interval?

The width of a confidence interval gets smaller if
- the confidence coefficient gets smaller (equivalently, if the level of confidence gets smaller)
- $\sigma$ gets smaller
- $n$ gets larger

One-sided confidence intervals

What if we only need to be confident that $\mu$ is below some upper bound or above some lower bound, but we don't care how far it might be in the opposite direction?

Example: We are concerned that $\mu$ for the NAEP scores might be very low, so we want to find a lower bound.

That is, we want to find a value $m$ such that we are 95% confident that $\mu > m$.

Begin by drawing the picture.

Now we will use Table A to find the value that cuts off the lower 5% of the area under a standard normal curve. This is $-1.645$.

Therefore, we are 95% confident that

$\mu > \bar{x} - 1.645 \, \sigma/\sqrt{n}$

In other words, our one-sided confidence interval for $\mu$ is

$\mu > \bar{x} - 1.645 \, \sigma/\sqrt{n} = 272 - 1.645(2.1) = 268.55$
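The two-sided formula $\bar{x} \pm z^* \sigma/\sqrt{n}$ and the one-sided lower bound can be checked with a short script. This is a minimal sketch, not the lecture's own method: it looks up critical values with Python's `statistics.NormalDist` instead of Table A, and it uses the unrounded standard error $60/\sqrt{840} \approx 2.07$ rather than 2.1, so its results differ slightly from the hand calculations. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(level: float) -> float:
    """z* cutting off the upper (1 - level) / 2 of the standard normal area."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def two_sided_ci(xbar: float, sigma: float, n: int, level: float):
    """Level-C interval xbar +/- z* * sigma / sqrt(n), with sigma known."""
    margin = critical_value(level) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def lower_bound(xbar: float, sigma: float, n: int, level: float) -> float:
    """One-sided lower confidence bound: xbar - z * sigma / sqrt(n).

    Here z cuts off the upper (1 - level) of the area, e.g. 1.645 for 95%.
    """
    z = NormalDist().inv_cdf(level)
    return xbar - z * sigma / sqrt(n)


if __name__ == "__main__":
    # NAEP example values from the lecture: xbar = 272, sigma = 60, n = 840.
    for level in (0.80, 0.95, 0.99):
        low, high = two_sided_ci(272, 60, 840, level)
        print(f"{level:.0%}: z* = {critical_value(level):.3f}, "
              f"interval = ({low:.2f}, {high:.2f})")
    print(f"95% lower bound: {lower_bound(272, 60, 840, 0.95):.2f}")
```

Running the loop also shows the width effect described above: the 80% interval is the narrowest and the 99% interval the widest, because only $z^*$ changes between them.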