UW-Madison STAT 371 - Statistical Estimation

Confidence Intervals
Bret Larget
Department of Statistics
University of Wisconsin - Madison
October 7, 2004
Statistics 371, Fall 2004

Statistical Estimation

• Statistical inference is inference about unknown aspects of a population based on treating the observed data as the realization of a random process.
• We focus in this course on inference in the setting of random samples from populations.
• Statistical estimation is a form of statistical inference in which we use the data to estimate a feature of the population and to assess the precision of the estimate.
• Chapter 6 introduces these ideas in the setting of estimating a population mean µ.

Typical Problem

The following data set contains the weights (mg) of thymus glands from five chick embryos after 14 days of incubation. The data were collected as part of a study on development of the thymus gland.

> thymus
[1] 29.6 21.5 28.0 34.6 44.9

If we model these data as having been sampled at random from a population of chick embryos raised under similar conditions, what can we say about the population mean weight?

Standard Error of the Mean

• We know that the SD of the sampling distribution of the sample mean ȳ can be computed by this formula:

      σ_Ȳ = σ / √n

• But if we only observe sample data y1, ..., yn, we do not know the value of the population SD σ, so we cannot use the formula directly.
• However, we can compute the sample standard deviation s, which is an estimate of the population standard deviation σ.
• The expression

      SE_Ȳ = s / √n

  is called the standard error of the sample mean and is an estimate of the standard deviation of the sampling distribution of the sample mean. (You can understand why statisticians gave this concept a shorter name.)

Example (cont.)

• Here is some R code to compute the mean, standard deviation, and standard error for the example data.

> m = mean(thymus)
> m
[1] 31.72
> s = sd(thymus)
> s
[1] 8.72909
> n = length(thymus)
> n
[1] 5
> se = s/sqrt(n)
> se
[1] 3.903767

• The sample standard deviation is an estimate of how far individual values differ from the population mean.
• The standard error is an estimate of how far sample means from samples of size n differ from the population mean.

Confidence intervals

The basic idea of a confidence interval for µ is as follows.

• We know that the sample mean ȳ is likely to be close (within a few multiples of σ/√n) to the population mean µ.
• Thus, the unknown population mean µ is likely to be close to the observed sample mean ȳ.
• We can express a confidence interval by centering an interval around the observed sample mean ȳ; the values in that interval are the values of µ that would be most likely to produce a sample mean like the observed ȳ.

Derivation of a Confidence Interval

From the sampling distribution of Ȳ, we have the following statement:

      Pr( µ − z·σ/√n ≤ Ȳ ≤ µ + z·σ/√n ) = 0.9

if we let z = 1.645, because the area between −1.645 and 1.645 under a standard normal curve is 0.9. Different choices of z work for different confidence levels.

The first inequality is equivalent to

      µ ≤ Ȳ + z·σ/√n

and the second is equivalent to

      Ȳ − z·σ/√n ≤ µ

which are put together to give

      Pr( Ȳ − z·σ/√n ≤ µ ≤ Ȳ + z·σ/√n ) = 0.9
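The probability statement above can also be checked numerically. Below is a minimal simulation sketch in R, assuming a normal population with known mean and SD; the particular values of µ, σ, n, and the number of replicates are arbitrary choices for illustration, not quantities from the thymus example.

set.seed(371)                        # for reproducibility
mu <- 30; sigma <- 9; n <- 5         # illustrative values, not estimates from the data
z <- qnorm(0.95)                     # about 1.645; the area between -z and z is 0.90
ybar <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
covered <- (ybar - z * sigma/sqrt(n) <= mu) & (mu <= ybar + z * sigma/sqrt(n))
mean(covered)                        # proportion of intervals capturing mu; close to 0.90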
Derivation of a Confidence Interval

This recipe for a confidence interval is then

      Ȳ ± z·σ/√n

• This depends on knowing σ.
• If we don't know σ, as is usually the case, we could use s as an alternative.
• However, the probability statement is then no longer true.
• We need to use a different multiplier to account for the extra uncertainty.
• This multiplier comes from the t distribution.

Sampling Distributions

      Z = (ȳ − µ) / (σ/√n)        T = (ȳ − µ) / (s/√n)

• If the population is normal, the statistic Z has a standard normal distribution.
• If the population is not normal but n is sufficiently large, the statistic Z has approximately a standard normal distribution (by the Central Limit Theorem).
• The distribution of the statistic T is more variable than that of Z because there is extra randomness in the denominator.
• The extra randomness becomes small as the sample size n increases.

Student's t Distribution

• If Y1, ..., Yn are a random sample from any normal distribution and if Ȳ and S are the sample mean and standard deviation, respectively, then the statistic

      T = (Ȳ − µ) / (S/√n)

  is said to have a t distribution with n − 1 degrees of freedom.
• All t distributions are symmetric, bell-shaped distributions centered at 0, but their shapes are not quite the same as normal curves, and they are spread out more than the standard normal curve.
• The spread is largest for small sample sizes. As the sample size (and degrees of freedom) increases, the t distributions become closer to the standard normal distribution.
• The table in the back cover of your textbook provides a few key quantiles for several different t distributions.

The t Distributions in R

• The functions pt and qt find areas and quantiles of t distributions in R.
• The area to the right of 2.27 under a t distribution with 4 degrees of freedom is

> 1 - pt(2.27, 4)
[1] 0.04286382

• To find the 95th percentile of the t distribution with four degrees of freedom, you could do the following.

> qt(0.95, df = 4)
[1] 2.131847

• This R code checks the 0.05 upper-tail critical values for the first several rows of the table.

> round(qt(0.95, df = 1:10), 3)
[1] 6.314 2.920 2.353 2.132 2.015 1.943 1.895 1.860 1.833 1.812

• You can use R to find values not tabulated.

> qt(0.95, 77)
[1] 1.664885

Mechanics of a confidence interval

A confidence interval for µ takes the form

      Ȳ ± t · s/√n

where t is selected so that the area between −t and t under a t distribution curve with n − 1 degrees of freedom is the desired confidence level.

In the example, there are df = n − 1 = 4 degrees of freedom. A 90% confidence interval uses the multiplier t = 2.132. A 95% confidence interval would use t = 2.776 instead.

We are 90% confident that the mean thymus weight in the population is in the interval 31.72 ± 8.32, or (23.40, 40.04).

We are 95% confident that the mean thymus weight in the population is in the interval 31.72 ± 10.84, or (20.88, 42.56).

Mechanics of a confidence interval

Notice that these multipliers 2.132 and 2.776 are each greater than the corresponding z multipliers 1.645 and 1.96.
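This comparison is easy to check in R. The minimal sketch below (the degrees of freedom are arbitrary choices for illustration) shows the 97.5th percentile of the t distribution shrinking toward the normal multiplier 1.96 as the degrees of freedom grow.

df <- c(4, 10, 30, 100, 1000)
round(qt(0.975, df), 3)    # t multipliers for 95% intervals: 2.776 2.228 2.042 1.984 1.962
round(qnorm(0.975), 3)     # corresponding z multiplier: 1.96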


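Putting the pieces together, the intervals above can be reproduced directly. The sketch below follows the formula ȳ ± t·s/√n for the thymus data; the last line uses the built-in t.test() function only as a cross-check of the 95% interval.

thymus <- c(29.6, 21.5, 28.0, 34.6, 44.9)
m <- mean(thymus); s <- sd(thymus); n <- length(thymus)
se <- s / sqrt(n)                  # 3.903767, as computed earlier
t90 <- qt(0.95, df = n - 1)        # 2.132, multiplier for a 90% interval
t95 <- qt(0.975, df = n - 1)       # 2.776, multiplier for a 95% interval
m + c(-1, 1) * t90 * se            # about (23.40, 40.04)
m + c(-1, 1) * t95 * se            # about (20.88, 42.56)
t.test(thymus)$conf.int            # built-in 95% interval, matching the line above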
