UI STAT 5400 - Introduction to the Bootstrap


122S:166 Introduction to the Bootstrap
Lecture 8
September 19, 2011
Kate Cowles
374 SH, [email protected]

• Efron, B. (1982) The Jackknife, the Bootstrap, and Other Resampling Plans. Number 38 in CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia: SIAM.
• Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap. New York: Chapman & Hall.
• Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and their Application. New York: Cambridge University Press.
• materials listed under Web Resources

Review concepts
• suppose we have one sample of n data values: $y_1, \ldots, y_n$
• sample values are considered outcomes of i.i.d. random variables $Y_1, \ldots, Y_n$
• probability density function (pdf) or probability mass function (pmf) $f$
• cumulative distribution function (cdf) $F$
• the sample will be used to make inference
  – about a population characteristic $\theta$
  – using a statistic $T$ whose value in the sample is $t$
• questions of interest regarding $T$
  – bias?
  – standard error?
  – quantiles?
  – how to compute confidence limits for $\theta$?
  – likely values under a null hypothesis of interest?

Two classes of statistical methods
• parametric
  – a particular mathematical model for the behavior of the random variables $Y_j$
  – the pdf or pmf $f$ is completely determined by the values of unknown parameters $\psi$
  – the quantity of interest in the statistical analysis, $\theta$, is a component or function of $\psi$
• nonparametric
  – uses only the fact that the $Y_j$'s are i.i.d.
  – no mathematical model for their distribution
  – (it may be useful to do a nonparametric analysis even if a reasonable parametric model exists)
    ∗ to assess the sensitivity of conclusions to the assumptions of the parametric model

The empirical distribution
• puts probability mass $1/n$ at each sample value $y_j$
• empirical distribution function (edf), or $\hat{F}$
  – nonparametric MLE of $F$
  – sample proportion $\hat{F}(y) = \#\{y_j \le y\}/n$
    ∗ where $\#$ denotes the number of items in a set
• the edf plays the role of the fitted model when no mathematical form is assumed for $F$

Example of edf
    > library(QRMlib)
    > help(edf)
    > data <- sort(rnorm(100))
    > plot(data, edf(data), type = "s")
    > qs <- seq(-2.5, 2.5, by = 0.005)
    > lines(qs, pnorm(qs), lty = 2)

Example for the nonparametric bootstrap: City population data
• for each of n = 49 U.S. cities, two data values
  – $u_j$ = population in 1920 (in 1000s)
  – $x_j$ = population in 1930 (in 1000s)
• the population of interest is all U.S. cities
• the 49 cities are assumed to be a simple random sample from this population
• define $(U, X)$ as the pair of population values for a randomly selected city
• then if we knew $\theta = E(X)/E(U)$ and the total 1920 population for the U.S., we could estimate the total 1930 population of the U.S.
• we want to estimate $\theta$ without assuming any parametric model for $X$ and $U$
• the sample-based statistic is $T = \bar{X}/\bar{U}$
• observations 1 to 10 of this dataset are included with the boot package for R

    > library(boot)
    > data(city)
    > city
         u   x
    1  138 143
    2   93 104
    3   61  69
    4  179 260
    5   48  75
    6   37  63
    7   29  50
    8   23  48
    9   30 111
    10   2  50

The nonparametric bootstrap
• goal: to get an idea of the sampling distribution of the statistic $T$ under repeated sampling from the population of interest
• basic idea: our sample data give us all the information we have about the whole population
• steps:
  1. calculate the statistic of interest (call it $\hat\theta$) from the dataset as a whole
  2. fit the edf $\hat{F}$
  3. draw a "bootstrap sample" from $\hat{F}$ and calculate the statistic of interest on the bootstrap sample
     – i.e., draw a sample of size n from the original dataset with replacement
     – $Y^*_1, Y^*_2, \ldots, Y^*_n \sim \hat{F}$
     – $\hat\theta^* = \hat\theta(Y^*_1, Y^*_2, \ldots, Y^*_n)$
  4. repeat step 3 independently a large number B of times, obtaining bootstrap replications $\hat\theta^*_1, \hat\theta^*_2, \ldots, \hat\theta^*_B$
  5. use the bootstrap replications to:
     – estimate the standard error of $\hat\theta$
     – estimate the bias
     – obtain a confidence interval

Using the R sample function to draw bootstrap samples

    sample                  package:base                  R Documentation

    Random Samples and Permutations

    Description:
         'sample' takes a sample of the specified size from the elements
         of 'x' using either with or without replacement.

    Usage:
         sample(x, size, replace = FALSE, prob = NULL)

    Arguments:
           x: Either a (numeric, complex, character or logical) vector of
              more than one element from which to choose, or a positive
              integer.
        size: non-negative integer giving the number of items to choose.
     replace: Should sampling be with replacement?
        prob: A vector of probability weights for obtaining the elements
              of the vector being sampled.

    > x <- 1:25
    > x
    [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
    > sample(x, 25)
    [1]  2 20  3  9  6  8 15 10 23  1 19 25 12 21 14  4 13 24 17  5 11 18  7 22 16
    > sample(x, 25, replace = TRUE)
    [1]  4  6 16 11 21 17  6 12  5  8 15 19 23 16 15 20 18 19 21  5 25  7  8 20  3
    > mindex <- sample(1:10, replace = TRUE)
    > mindex
    [1]  4  9  1  9 10  9  6  5  3  3
    > city[mindex, ]
          u   x
    4   179 260
    9    30 111
    1   138 143
    9.1  30 111
    10    2  50
    9.2  30 111
    6    37  63
    5    48  75
    3    61  69
    3.1  61  69

Bias correction using the bootstrap
• notation
  – $\theta$: true and unknown population quantity value
  – $\hat\theta$: estimate of $\theta$ based on the sample data
  – $\hat\theta^*_b$: estimate of $\theta$ from the b-th bootstrap sample

Bias correction continued
• so, in a sense, the $\hat\theta^*$'s are to $\hat\theta$ as $\hat\theta$ is to $\theta$
• bootstrap estimate of bias
  – note: $\text{bias} = E_F(\hat\theta - \theta)$
  – $\widehat{\text{bias}}_{\text{boot}} = \frac{1}{B}\sum_{b=1}^{B} \hat\theta^*_b - \hat\theta = \hat\theta^*_{\cdot} - \hat\theta$
• so the bias-corrected point estimate is
  $\tilde\theta = \hat\theta - (\hat\theta^*_{\cdot} - \hat\theta) = 2\hat\theta - \hat\theta^*_{\cdot}$

R code for the City Data
    > library(boot)
    > help(boot, package = "boot")
    ------------------------------------------------------------------------
    boot                    package:boot                    R Documentation

    Bootstrap Resampling

    Description:
         Generate 'R' bootstrap replicates of a statistic applied to data.
         Both parametric and nonparametric resampling are possible.
         For the nonparametric bootstrap, possible resampling methods are
         the ordinary bootstrap, the balanced bootstrap, antithetic
         resampling, and permutation. For nonparametric multi-sample
         problems stratified resampling is used. This is specified by
         including a vector of strata in the call to boot. Importance
         resampling weights may be specified.

    Usage:
         boot(data, statistic, R, sim="ordinary", stype="i",
              strata=rep(1,n), L=NULL, m=0, weights=NULL,
              ran.gen=function(d, p) d, mle=NULL, ...)

    Arguments:
        data: The data as a vector, matrix or data frame. If it is a
              matrix or data frame then each row is considered as one
              multivariate observation.
   statistic: A function which when applied to data returns a vector
              containing the
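The five bootstrap steps and the bias-correction formulas for the city data can be sketched in R roughly as follows. This is a minimal illustration, not the lecture's own code: it assumes the boot package is installed (it is a recommended package that ships with standard R distributions) and uses its 10-row city data; the helper name ratio_stat, the seed, and B = 2000 are arbitrary choices made for this sketch. It computes the bootstrap standard error, the bias estimate $\hat\theta^*_{\cdot} - \hat\theta$, and the bias-corrected estimate $2\hat\theta - \hat\theta^*_{\cdot}$ by hand with sample(), then repeats the resampling with boot().

```r
library(boot)   # provides the 'city' data and boot()
data(city)      # 10 cities: u = 1920 population, x = 1930 population (1000s)

# Hypothetical helper: the ratio statistic T = xbar / ubar, computed on the
# rows of d selected by the index vector i (the form boot() expects
# when stype = "i", its default).
ratio_stat <- function(d, i) {
  d <- d[i, ]
  mean(d$x) / mean(d$u)
}

n <- nrow(city)
theta_hat <- ratio_stat(city, seq_len(n))   # step 1: statistic on the full data

set.seed(1)   # arbitrary seed, for reproducibility only
B <- 2000     # number of bootstrap replications (assumed value)

# Steps 2-4: resampling rows with replacement is the same as drawing n pairs
# from the edf, which puts mass 1/n on each observed pair (u_j, x_j).
theta_star <- replicate(B, {
  idx <- sample(seq_len(n), n, replace = TRUE)
  ratio_stat(city, idx)
})

# Step 5: use the replications.
se_boot   <- sd(theta_star)                    # bootstrap standard error
bias_boot <- mean(theta_star) - theta_hat      # thetahat*. - thetahat
theta_bc  <- 2 * theta_hat - mean(theta_star)  # bias-corrected estimate

print(round(c(theta_hat = theta_hat, se = se_boot,
              bias = bias_boot, corrected = theta_bc), 4))

# The same ordinary nonparametric resampling done by boot():
# t0 holds the statistic on the original data, t the B replications.
city_boot <- boot(city, ratio_stat, R = B)
print(city_boot)
```

The hand-written loop and boot() draw different random samples, so their bias and standard-error estimates should agree only approximately; the original-data value $\hat\theta = 97.3/64 \approx 1.520$ is the same in both.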

