Bret Larget
Department of Statistics
University of Wisconsin-Madison
October 7, 2004

Confidence Intervals
Statistics 371, Fall 2004

Statistical Estimation

Statistical inference is inference about unknown aspects of a population based on treating the observed data as the realization of a random process. We focus in this course on inference in the setting of random samples from populations.

Statistical estimation is a form of statistical inference in which we use the data to estimate a feature of the population and to assess the precision of the estimate.

Chapter 6 introduces these ideas in the setting of estimating a population mean.

Typical Problem

The following data set contains the weights (mg) of thymus glands from five chick embryos after 14 days of incubation. The data were collected as part of a study on development of the thymus gland.

    > thymus
    [1] 29.6 21.5 28.0 34.6 44.9

If we model this data as having been sampled at random from a population of chick embryos with similar conditions, what can we say about the population mean weight?

Standard Error of the Mean

We know that the SD of the sampling distribution of the sample mean $\bar{y}$ can be computed by this formula:

$$\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$$

But if we only observe sample data $y_1, \ldots, y_n$, we do not know the value of the population SD $\sigma$, so we cannot use the formula directly. However, we can compute the sample standard deviation $s$, which is an estimate of the population standard deviation.

The expression

$$\mathrm{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}}$$

is called the standard error of the sample mean and is an estimate of the standard deviation of the sampling distribution of the sample mean. (You can understand why statisticians gave this concept a shorter name.)

Example (cont.)

Here is some R code to compute the mean, standard deviation, and standard error for the example data.

    > m = mean(thymus)
    > m
    [1] 31.72
    > s = sd(thymus)
    > s
    [1] 8.72909
    > n = length(thymus)
    > n
    [1] 5
    > se = s/sqrt(n)
    > se
    [1] 3.903767

The sample standard deviation is an estimate of how far individual values
differ from the population mean. The standard error is an estimate of how far sample means from samples of size $n$ differ from the population mean.

Confidence intervals

The basic idea of a confidence interval for $\mu$ is as follows. We know that the sample mean $\bar{y}$ is likely to be close (within a few multiples of $\sigma/\sqrt{n}$) to the population mean $\mu$. Thus, the unknown population mean $\mu$ is likely to be close to the observed sample mean $\bar{y}$. We can express a confidence interval by centering an interval around the observed sample mean $\bar{y}$: those are the possible values of $\mu$ that would be most likely to produce a sample mean $\bar{y}$.

Derivation of a Confidence Interval

From the sampling distribution of $\bar{Y}$, we have the following statement

$$0.9 = \Pr\left\{ \mu - z\frac{\sigma}{\sqrt{n}} \le \bar{Y} \le \mu + z\frac{\sigma}{\sqrt{n}} \right\}$$

if we let $z = 1.645$, because the area between $-1.645$ and $1.645$ under a standard normal curve is 0.9. Different choices of $z$ work for different confidence levels.

The first inequality is equivalent to

$$\mu \le \bar{Y} + z\frac{\sigma}{\sqrt{n}}$$

and the second is equivalent to

$$\bar{Y} - z\frac{\sigma}{\sqrt{n}} \le \mu$$

which are put together to give

$$\Pr\left\{ \bar{Y} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{Y} + z\frac{\sigma}{\sqrt{n}} \right\} = 0.9$$

Derivation of a Confidence Interval

This recipe for a confidence interval is then

$$\bar{Y} \pm z\frac{\sigma}{\sqrt{n}}$$

This depends on knowing $\sigma$. If we don't know $\sigma$, as is usually the case, we could use $s$ as an alternative. However, the probability statement is then no longer true. We need to use a different multiplier to account for the extra uncertainty. This multiplier comes from the t distribution.

Sampling Distributions

$$Z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} \qquad T = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$

If the population is normal, the statistic $Z$ has a standard normal distribution. If the population is not normal but $n$ is sufficiently large, the statistic $Z$ has approximately a standard normal distribution (by the Central Limit Theorem).

The distribution of the statistic $T$ is more variable than that of $Z$ because there is extra randomness in the denominator. The extra randomness becomes small as the sample size $n$ increases.

Student's t Distribution

If $Y_1, \ldots, Y_n$ are a
random sample from any normal distribution and if $\bar{Y}$ and $S$ are the sample mean and standard deviation, respectively, then the statistic

$$T = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$$

is said to have a t distribution with $n - 1$ degrees of freedom.

All t distributions are symmetric, bell-shaped distributions centered at 0, but their shapes are not quite the same as normal curves, and they are spread out more than the standard normal curve. The spread is largest for small sample sizes. As the sample size and degrees of freedom increase, the t distributions become closer to the standard normal distribution.

The table in the back cover of your textbook provides a few key quantiles for several different t distributions.

The t Distributions in R

The functions pt and qt find areas and quantiles of t distributions in R. The area to the right of 2.27 under a t distribution with 4 degrees of freedom is

    > 1 - pt(2.27, 4)
    [1] 0.04286382

To find the 95th percentile of the t distribution with four degrees of freedom, you could do the following.

    > qt(0.95, df = 4)
    [1] 2.131847

This R code checks the values of the 0.05 upper tail probability for the first several rows of the table.

    > round(qt(0.95, df = 1:10), 3)
     [1] 6.314 2.920 2.353 2.132 2.015 1.943 1.895 1.860 1.833 1.812

You can use R to find values not tabulated.

    > qt(0.95, 77)
    [1] 1.664885

Mechanics of a confidence interval

A confidence interval for $\mu$ takes on the form

$$\bar{Y} \pm t \times \frac{s}{\sqrt{n}}$$

where $t$ is selected so that the area between $-t$ and $t$ under a t distribution curve with $n - 1$ degrees of freedom is the desired confidence level.

In the example, there are df $= n - 1 = 4$ degrees of freedom. A 90% confidence interval uses the multiplier $t = 2.132$. A 95% confidence interval would use $t = 2.776$ instead.

We are 90% confident that the mean thymus weight in the population is in the interval $31.72 \pm 8.32$, or $(23.40, 40.04)$.

We are 95% confident that the mean thymus weight in the population is in the interval $31.72 \pm 10.84$, or $(20.88, 42.56)$.

Mechanics of a confidence interval

Notice
that these multipliers, 2.132 and 2.776, are each greater than the corresponding z multipliers, 1.645 and 1.96. Had the sample size been 50 instead of 5, the t multipliers 1.677 and 2.01 would still be larger than the corresponding z multipliers, but by a much smaller amount.
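The interval arithmetic and the multiplier comparison above can be sketched in R as follows. This is only an illustration under the slides' assumptions (a random sample from a roughly normal population); the helper name `t_interval` is hypothetical and does not come from the course materials.

```r
# Thymus gland weights (mg) from the example data
thymus <- c(29.6, 21.5, 28.0, 34.6, 44.9)

# Hypothetical helper (not from the slides): t-based confidence interval
# for a population mean, assuming a random sample from a normal population
t_interval <- function(y, level = 0.90) {
  n <- length(y)
  se <- sd(y) / sqrt(n)                          # standard error of the mean
  t_mult <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided t multiplier
  c(lower = mean(y) - t_mult * se,
    upper = mean(y) + t_mult * se)
}

ci90 <- t_interval(thymus, 0.90)
ci95 <- t_interval(thymus, 0.95)
print(round(ci90, 2))
print(round(ci95, 2))

# t multipliers for n = 5 and n = 50, next to the z multipliers
t_n5  <- qt(c(0.95, 0.975), df = 4)
t_n50 <- qt(c(0.95, 0.975), df = 49)
z     <- qnorm(c(0.95, 0.975))
print(round(rbind(t_n5, t_n50, z), 3))
```

The two printed intervals should agree with the ones stated in the example up to rounding. Base R's `t.test(thymus, conf.level = 0.90)$conf.int` computes the same kind of interval and can serve as a cross-check.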