Berkeley STAT 135 - Bootstrap Example and Sample Code - D1973779

School name University of California, Berkeley

Course Stat 135- Concepts of Statistics

Pages 5

This preview shows page 1-2 out of 5 pages.

U.C. Berkeley - Stat 135: Concepts of Statistics

Bootstrap Example and Sample Code

1 Bootstrap Example

This section will demonstrate how the bootstrap can be used to generate confidence intervals.

Suppose we have a sample of data from an exponential distribution with parameter $\lambda$, i.e. our data come from a distribution with density

$$f(x \mid \lambda) = \lambda e^{-\lambda x}.$$

Recall that the MLE is given by $\hat\lambda = 1/\bar{X}$ and that $\hat\lambda$ has asymptotic variance equal to $\lambda^2/n$. Since we know that the MLE is asymptotically normally distributed, we can form a $(1-\alpha)100\%$ confidence interval for $\lambda$ by using our estimate of the asymptotic variance based on our MLE and taking

$$\left[\hat\lambda - z(1-\alpha/2)\,\hat\lambda/\sqrt{n},\ \ \hat\lambda + z(1-\alpha/2)\,\hat\lambda/\sqrt{n}\right].$$

Here is an example with R code generating the sample, calculating the MLE, and computing a 95% confidence interval for $\lambda$:

    my.Test.Data<-rexp(100,3)
    mean(my.Test.Data)
    #[1] 0.30125: an example of what happened
    1/mean(my.Test.Data)
    #[1] 3.32
    empirical.lambda<-1/mean(my.Test.Data)
    #Normal theory quantiles:
    CIlow<-empirical.lambda-qnorm(0.975)*empirical.lambda/sqrt(100)
    CIhigh<-empirical.lambda+qnorm(0.975)*empirical.lambda/sqrt(100)
    #[2.67, 3.97]

We see that we generated a sample of size 100 from the exponential distribution with parameter $\lambda = 3$. The mean of our data set is 0.301, and therefore our maximum likelihood estimate for $\lambda$ is 3.32. Since we know the MLE is asymptotically normal, we can construct a confidence interval using our estimate of the asymptotic variance of our MLE and the quantiles of the standard normal distribution. In this case, we get [2.67, 3.97] for our confidence interval.

But suppose we were not working with the MLE, or suppose we did not want to use the asymptotic distribution to form our confidence intervals.
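Before leaving the normal-theory interval, its behavior can be checked by simulation. The following is a minimal sketch, not part of the original handout: it repeats the experiment above many times with a known λ = 3 and records how often the interval contains the true value. The seed, the number of repetitions `n.reps`, and all variable names are illustrative assumptions.

```r
# Sketch: Monte Carlo check of how often the normal-theory interval
# covers the true rate. All names and settings here are illustrative.
set.seed(135)
true.lambda <- 3      # rate used to generate the data
n <- 100              # sample size, as in the example above
n.reps <- 2000        # number of simulated data sets (an assumption)
z <- qnorm(0.975)     # standard normal quantile for a 95% interval

covered <- logical(n.reps)
for (r in 1:n.reps) {
  x <- rexp(n, true.lambda)
  lambda.hat <- 1/mean(x)                 # MLE of the rate
  half.width <- z*lambda.hat/sqrt(n)      # z * estimated standard error
  covered[r] <- (lambda.hat - half.width <= true.lambda) &&
                (true.lambda <= lambda.hat + half.width)
}

# Fraction of simulated intervals that contain the true lambda
coverage <- mean(covered)
coverage
```

Whatever coverage this prints rests on the asymptotic normal approximation for the MLE. If we are not working with the MLE, or do not want to rely on that approximation, another way of approximating the distribution of the estimator is needed.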
Then we could use the bootstrap to estimate the distribution of $\hat\lambda$ and create bootstrap confidence intervals for $\lambda$.

First, we form our set of bootstrap estimates of our parameter by generating $B$ random samples of size $n = 100$ from the exponential distribution with parameter $\hat\lambda$ and using each of these samples to get a new estimate of our model parameter, $\hat\lambda^{(b)}$.

In general, we don't necessarily know anything about the distribution of the new parameter estimates $\hat\lambda^{(b)}$. But, although we may not have a perfect idea of the shape of this distribution, we can calculate quantiles $q^*(\alpha)$ from it, where $q^*(\alpha)$ is the value of $\hat\lambda^{(b)}$ such that a fraction $\alpha$ of the bootstrap parameter estimates are less than or equal to it. [For example, if we had 1000 bootstrap samples, the quantile $q^*(0.05)$ would be the 50th smallest observation.] This means we can write

$$P\left(q^*(\alpha/2) \le \hat\lambda^{(b)} \le q^*(1-\alpha/2)\right) = 1-\alpha.$$

Now suppose we want to look at the distribution of $\hat\lambda^{(b)} - \hat\lambda$. Subtracting $\hat\lambda$ from each term of the expression above, we can see

$$P\left(q^*(\alpha/2) - \hat\lambda \le \hat\lambda^{(b)} - \hat\lambda \le q^*(1-\alpha/2) - \hat\lambda\right) = 1-\alpha.$$

In addition, we can argue that we can estimate the distribution of $\hat\lambda - \lambda$ by the distribution of $\hat\lambda^{(b)} - \hat\lambda$. This makes sense if you think of the analogy that $\hat\lambda$ arose from sampling from a distribution with parameter $\lambda$, while $\hat\lambda^{(b)}$ arose in exactly the same way from sampling from a distribution with parameter $\hat\lambda$. This means we can say

$$P\left(q^*(\alpha/2) - \hat\lambda \le \hat\lambda - \lambda \le q^*(1-\alpha/2) - \hat\lambda\right) \approx P\left(q^*(\alpha/2) - \hat\lambda \le \hat\lambda^{(b)} - \hat\lambda \le q^*(1-\alpha/2) - \hat\lambda\right) = 1-\alpha.$$

Now we can find our confidence interval for $\lambda$ by rearranging terms (subtract $\hat\lambda$ from each part, then multiply by $-1$, which reverses the inequalities) to get

$$P\left(2\hat\lambda - q^*(1-\alpha/2) \le \lambda \le 2\hat\lambda - q^*(\alpha/2)\right) \approx 1-\alpha.$$

This means our bootstrap confidence interval for $\lambda$ is $[2\hat\lambda - q^*(1-\alpha/2),\ 2\hat\lambda - q^*(\alpha/2)]$.

Now some code demonstrating how to find a 95% confidence interval for $\lambda$.
Look at each line of code carefully to see what it is doing.

    #first, initialize a vector that will receive the values of the
    #estimate from each sample
    boot.sampling.dist<-numeric(2000)
    #Now create 2000 bootstrap samples and compute the value of the stat for each of them
    for (i in 1:2000){
      boot.sampling.dist[i]<-1/mean(rexp(100,empirical.lambda))
    }
    #look at the sampling distribution of the stat, according to parametric bootstrap:
    dev.new() #open a new graphics window
    hist(boot.sampling.dist,main="Estimate of sampling distribution of lambda",breaks=50)
    #find the quantiles of this distribution
    my.quantiles<-quantile(boot.sampling.dist,c(.025,0.975))
    #calculate the bootstrap confidence interval boundaries
    CIbootlow<-2*empirical.lambda-my.quantiles[2]
    CIboothigh<-2*empirical.lambda-my.quantiles[1]

One more thing that the collection of bootstrap estimated parameters can be used for is to calculate an estimate of the standard error of $\hat\lambda$. The standard error of $\hat\lambda$ can be estimated by the sample standard deviation of the bootstrap estimates:

    boot.estimate.se<-sqrt(var(boot.sampling.dist))

2 Some Helpful Commands

Working with Distributions

There is a set of four functions that are defined for almost any distribution that you will encounter on your homework. I will show you examples for the normal distribution, but analogous functions exist for other distributions as well, such as the exponential (used above), the binomial, etc. These four functions are rnorm(), pnorm(), qnorm() and dnorm(). Each of them takes a series of arguments.

• rnorm() is used to generate a set of random values sampled from a normal distribution of your choice. You pass it three arguments, in this order: n, the number of observations you want it to generate; $\mu$, the mean of the distribution you want it to sample from; $\sigma$, the standard deviation (not the variance $\sigma^2$) of the distribution you want it to sample from. [Note that the number of arguments may vary if you are working with a different distribution that has a different number of parameters.
For example, we saw above that rexp() only takes two arguments.]

• pnorm() is used to give the cumulative distribution function of the distribution you are working with. You pass it three arguments: q, the quantile below which you want the cumulative probability; and the parameters of the distribution you are working with (for the normal, the mean and the standard deviation). For example, pnorm(0, 0, 1) = 0.5, since half the area of the standard normal distribution is below 0.

• qnorm() is used to calculate a quantile, if you know the cumulative probability. You pass it three arguments: p, the cumulative probability; and the parameters of the distribution you are working with. For example, qnorm(0.5, 3, 1) = 3, since 3 is the 0.5 quantile (the median) of the normal distribution with mean 3 and standard deviation 1.

• dnorm() is used to calculate the value of the density function at a given point. You pass it three arguments: x, the point at which you
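The four function families described in this section can be tried side by side for the normal distribution and for the exponential distribution used in Section 1. The following is a minimal sketch, not taken from the handout; the seed, the sample size, and the variable names are illustrative assumptions. It also shows that the third argument of the normal functions is a standard deviation, so a normal distribution with variance 4 is requested with sd = 2.

```r
# Sketch: the r/p/q/d function families for the normal and exponential
# distributions. The seed and the values below are illustrative choices.
set.seed(135)

# rnorm: the third argument is the standard deviation, so a normal
# distribution with variance 4 is specified with sd = 2.
x <- rnorm(10000, mean = 3, sd = 2)
sample.var <- var(x)             # sample variance of the draws

# pnorm and qnorm undo each other: qnorm(pnorm(q)) returns q.
p <- pnorm(1.5, mean = 0, sd = 1)
q <- qnorm(p, mean = 0, sd = 1)

# The exponential analogues take a single parameter, the rate lambda.
lambda <- 3
median.exp <- qexp(0.5, rate = lambda)            # median is log(2)/lambda
cdf.at.median <- pexp(median.exp, rate = lambda)  # CDF at the median
density.at.zero <- dexp(0, rate = lambda)         # density at 0 is lambda
```

Naming the arguments (mean =, sd =, rate =) rather than relying on their position makes it harder to pass a variance where a standard deviation is expected.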
