Lecture 12
Brian Caffo
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Johns Hopkins University
August 23, 2007

Table of contents
1 Table of contents
2 Outline
3 The jackknife
4 The bootstrap principle
5 The bootstrap

Outline
1 The jackknife
2 Introduce the bootstrap principle
3 Outline the bootstrap algorithm
4 Example bootstrap calculations
5 Discussion

The jackknife
• The jackknife is a tool for estimating standard errors and the bias of estimators
• As its name suggests, the jackknife is a small, handy tool; in contrast, the bootstrap is the moral equivalent of a giant workshop full of tools
• Both the jackknife and the bootstrap involve resampling data; that is, repeatedly creating new data sets from the original data
• The jackknife deletes each observation in turn and calculates an estimate based on the remaining $n - 1$ of them
• It uses this collection of estimates to do things like estimate the bias and the standard error
• Note that jackknife estimates of the bias and standard error are not needed for statistics like sample means, which we know are unbiased estimates of population means and whose standard errors we already know
• We'll consider the jackknife for univariate data
• Let $X_1, \ldots, X_n$ be a collection of data used to estimate a parameter $\theta$
• Let $\hat\theta$ be the estimate based on the full data set
• Let $\hat\theta_i$ be the estimate of $\theta$ obtained by deleting observation $i$
• Let $\bar\theta = \frac{1}{n} \sum_{i=1}^n \hat\theta_i$
• Then, the jackknife estimate of the bias is
  $$(n - 1)\left(\bar\theta - \hat\theta\right)$$
  (how far the average delete-one estimate is from the actual estimate)
• The jackknife estimate of the standard error is
  $$\left[\frac{n - 1}{n} \sum_{i=1}^n \left(\hat\theta_i - \bar\theta\right)^2\right]^{1/2}$$
  (the deviation of the delete-one estimates from the average delete-one estimate)

Example
• Consider the data set of 630 measurements of gray matter volume for workers from a lead manufacturing plant
• The median gray matter volume is around 589 cubic centimeters
• We want to estimate the bias and standard error of the median

The gist of the code

    n <- length(gmVol)
    theta <- median(gmVol)   # estimate from the full data set
    # delete-one estimates: the median with observation i removed
    jk <- sapply(1 : n, function(i) median(gmVol[-i]))
    thetaBar <- mean(jk)
    biasEst <- (n - 1) * (thetaBar - theta)
    seEst <- sqrt((n - 1) * mean((jk - thetaBar)^2))

Or, using the bootstrap package

    library(bootstrap)
    out <- jackknife(gmVol, median)
    out$jack.se
    out$jack.bias

• Both methods (of course) yield an estimated bias of 0 and an SE of 9.94
• Odd little fact: the jackknife estimate of the bias for the median is always 0 when the number of observations is even
• It has been shown that the jackknife is a linear approximation to the bootstrap
• Generally, do not use the jackknife for sample quantiles like the median, as it has been shown to have some poor properties

Pseudo observations
• Another interesting way to think about the jackknife uses pseudo observations
• Let
  $$\text{Pseudo Obs} = n\hat\theta - (n - 1)\hat\theta_i$$
• Think of these as "whatever observation $i$ contributes to the estimate of $\theta$"
• Note that when $\hat\theta$ is the sample mean, the pseudo observations are the data themselves
• Then the sample standard error of these observations is the previous jackknife estimated standard error
• The mean of these observations is a bias-corrected estimate of $\theta$

Example: Tom's notes

The bootstrap
• The bootstrap is a tremendously useful tool for constructing confidence intervals and calculating standard errors for difficult statistics
• For example, how would one derive a confidence interval for the median?
• The bootstrap procedure follows from the so-called bootstrap principle

The bootstrap principle
• Suppose that I have a statistic that estimates some population parameter, but I don't know its sampling distribution
• The bootstrap principle suggests using the distribution defined by the data to approximate its sampling distribution

The bootstrap in practice
• In practice, the bootstrap principle is always carried out using simulation
• We will cover only a few aspects of bootstrap resampling
• The general procedure starts by simulating complete data sets from the observed data, sampling with replacement
• This is approximately drawing from the sampling distribution of that statistic, at least as far as the data is able to approximate the true population distribution
• Calculate the statistic for each simulated data set
• Use the simulated statistics to either define a confidence interval or take the standard deviation to calculate a standard error

Example
• Consider again the data set of 630 measurements of gray matter volume for workers from a lead manufacturing plant
• The median gray matter volume is around 589 cubic centimeters
• We want a confidence interval for the median of these measurements
• Bootstrap procedure for calculating a confidence interval for the median from a data set of $n$ observations
  i. Sample n
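
The general procedure described under "The bootstrap in practice" (simulate data sets with replacement, compute the statistic on each, then summarize the simulated statistics) might be sketched in R roughly as follows. This is only an illustration under stated assumptions: the `gmVol` values here are simulated stand-ins because the real measurements are not reproduced in these notes, the number of resamples `B` is an arbitrary choice, and the percentile interval is one common way to turn the simulated medians into a confidence interval, not necessarily the one used in the lecture.

```r
# Hypothetical stand-in data: the real gray matter volumes are not available here
set.seed(1)
gmVol <- rnorm(630, mean = 589, sd = 60)

n <- length(gmVol)
B <- 1000  # number of simulated data sets (an assumed, illustrative value)

# For each of the B resamples: draw n observations from the observed data
# with replacement, then compute the statistic (the median) on that resample
bootMedians <- replicate(B, median(sample(gmVol, n, replace = TRUE)))

# Standard error: the standard deviation of the simulated statistics
seBoot <- sd(bootMedians)

# Percentile interval (an assumption): the 2.5th and 97.5th percentiles
# of the simulated medians
ciBoot <- quantile(bootMedians, c(0.025, 0.975))
```

With the real data, `gmVol` would simply be the vector of 630 measurements. Because the resampling is random, `seBoot` and `ciBoot` change slightly from run to run unless a seed is fixed.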

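Returning to the jackknife pseudo observations, the two claims made about them can be checked with a short derivation. The first line assumes the estimator is the sample mean, written here as $\bar X$ (notation not used in the slides); the second line holds for any estimator and uses only the definitions of $\bar\theta$ and the jackknife bias estimate.

```latex
\begin{align*}
n\hat\theta - (n-1)\hat\theta_i
  &= \sum_{j=1}^{n} X_j - \sum_{j \neq i} X_j = X_i
  && \text{(when } \hat\theta = \bar X \text{)} \\
\frac{1}{n}\sum_{i=1}^{n}\left[ n\hat\theta - (n-1)\hat\theta_i \right]
  &= n\hat\theta - (n-1)\bar\theta
   = \hat\theta - (n-1)\left(\bar\theta - \hat\theta\right)
  && \text{(any } \hat\theta \text{)}
\end{align*}
```

So for the sample mean the pseudo observations are the data themselves, and in general their average is the full-data estimate minus the jackknife estimate of the bias, which is why it serves as a bias-corrected estimate of $\theta$.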


Bloomberg School BIO 651 - lecture 12
