UI STAT 5400 - Computing in Statistics

22S:166 Computing in Statistics
Simulation studies in statistics
Lecture 12, October 2, 2006
Based on a lecture by Marie Davidian for ST 810A - Spring 2005, Preparation for Statistical Research, North Carolina State University
http://www4.stat.ncsu.edu/~davidian/st810a/

Basics

• Simulation studies are commonly done to evaluate the performance of a frequentist statistical procedure, or to compare the performance of two or more different procedures for the same problem.
• They enable us to see what happens "when many, many samples of the same size are drawn from the same population."
• Properties of estimators that are often evaluated by simulation:
  – bias
  – mean squared error
  – coverage of confidence intervals
• Properties of hypothesis tests can also be evaluated by simulation studies:
  – size
  – power
• Simulation studies are experiments, and the things you know about experimental design and sample size calculation apply.

Terminology

• Simulation: a numerical technique for conducting experiments on the computer.
• Monte Carlo simulation: a computer experiment involving random sampling from probability distributions.
  – This is what statisticians usually mean by "simulations."

Rationale

• Properties of statistical methods must be established before the methods can safely be used in practice.
• But exact analytical derivations of properties are rarely possible.
• Large-sample approximations to properties are often possible.
  – Evaluation of the relevance of the approximation to the (finite) sample sizes likely to be encountered in practice is needed.
• Analytical results may require assumptions such as normality.
  – What happens when these assumptions are violated? Analytical results, even large-sample ones, may not be possible.

Questions to be addressed regarding an estimator or testing procedure

• Is an estimator biased in finite samples?
• What is its sampling variance?
• How does it compare to competing estimators on the basis of bias, precision, etc.?
• Does a procedure for constructing a confidence interval for a parameter achieve the claimed nominal level of coverage?
• Does a hypothesis testing procedure attain the claimed level or size?
• If so, what power is possible against different alternatives to the null hypothesis? Do different test procedures deliver different power?

Role of Monte Carlo simulation

• Goal: evaluate the sampling distribution of an estimator under a particular set of conditions (sample size, error distribution, etc.).
• Analytic derivation of the exact sampling distribution is not feasible.
• Solution: approximate the sampling distribution through simulation.
  – Generate S independent data sets under the conditions of interest.
  – Compute the numerical value of the estimator/test statistic T(data) for each data set, yielding T_1, ..., T_S.
• If S is large enough, summary statistics across T_1, ..., T_S should be good approximations to the true sampling properties of the estimator/test statistic under the conditions of interest.

Simulation for properties of estimators

Simple example: compare three estimators for the mean µ of a distribution, based on i.i.d. draws Y_1, ..., Y_n:

• sample mean T^(1)
• sample 20% trimmed mean T^(2)
• sample median T^(3)

Remarks:

• If the distribution of the data is symmetric, all three estimators indeed estimate the mean.
• If the distribution is skewed, they do not.

Simulation procedure

For a particular choice of µ, n, and true underlying distribution:

• Generate independent draws Y_1, ..., Y_n from the distribution.
• Compute T^(1), T^(2), T^(3).
• Repeat S times, yielding T^(1)_1, ..., T^(1)_S; T^(2)_1, ..., T^(2)_S; T^(3)_1, ..., T^(3)_S.
• Compute, for k = 1, 2, 3:

  mean-hat = S^{-1} Σ_{s=1}^S T^(k)_s = T̄^(k)
  bias-hat = T̄^(k) − µ
  SD-hat  = sqrt( (S − 1)^{-1} Σ_{s=1}^S (T^(k)_s − T̄^(k))^2 )
  MSE-hat = S^{-1} Σ_{s=1}^S (T^(k)_s − µ)^2 ≈ SD-hat^2 + bias-hat^2

Relative efficiency

For any estimators for which E(T^(1)) = E(T^(2)) = µ,

  RE = var(T^(1)) / var(T^(2))

is the relative efficiency of estimator 2 to estimator 1.

• When the estimators are not unbiased, it is standard to compute

  RE = MSE(T^(1)) / MSE(T^(2))

• In either case, RE < 1 means estimator 1 is preferred (estimator 2 is inefficient relative to estimator 1 in this sense).

R code for example

> set.seed(3)
> S <- 1000
> n <- 15
> trimmean <- function(Y) { mean(Y, trim = 0.2) }
> mu <- 1
> sigma <- sqrt(5/3)

Normal data (generate.normal, simsum, and view are helper functions defined elsewhere in the course materials):

> out <- generate.normal(S, n, mu, sigma)
> outsampmean <- apply(out$dat, 1, mean)
> outtrimmean <- apply(out$dat, 1, trimmean)
> outmedian <- apply(out$dat, 1, median)
> summary.sim <- data.frame(mean = outsampmean, trim = outtrimmean,
+                           median = outmedian)
> results <- simsum(summary.sim, mu)
> view(round(summary.sim, 4), 5)

First 5 rows
    mean   trim median
1 0.7539 0.7132 1.0389
2 0.6439 0.4580 0.3746
3 1.5553 1.6710 1.9395
4 0.5171 0.4827 0.4119
5 1.3603 1.4621 1.3452

> results
                       Sample mean  Trimmed mean    Median
true value                   1.000         1.000     1.000
# sims                    1000.000      1000.000  1000.000
MC mean                      0.985         0.987     0.992
MC bias                     -0.015        -0.013    -0.008
MC relative bias            -0.015        -0.013    -0.008
MC standard deviation        0.331         0.348     0.398
MC MSE                       0.110         0.121     0.158
MC relative efficiency       1.000         0.905     0.694

Performance of estimates of uncertainty

How well do estimated standard errors represent the true sampling variation?

• E.g., for the sample mean, T^(1)(Y_1, ..., Y_n) = Ȳ,

  SE(Ȳ) = s/√n,  s^2 = (n − 1)^{-1} Σ_{j=1}^n (Y_j − Ȳ)^2

• The MC standard deviation approximates the true sampling variation.
• Compare the average of the estimated standard errors to the MC standard deviation.

For the sample mean, the MC standard deviation was 0.331:

> sampmean.ses <- sqrt(apply(out$dat, 1, var)/n)
> ave.sampmeanses <- mean(sampmean.ses)
> round(ave.sampmeanses, 3)
[1] 0.329

Usual 100(1 − α)% confidence interval for µ, based on the sample mean

  [ Ȳ − t_{1−α/2, n−1} s/√n,  Ȳ + t_{1−α/2, n−1} s/√n ]

• Does the interval achieve the nominal level of coverage 1 − α?
• E.g., α = 0.05:

> t05 <- qt(0.975, n-1)
> coverage <- sum((outsampmean - t05*sampmean.ses <= mu) &
+                 (outsampmean + t05*sampmean.ses >= mu))/S
> coverage
[1] 0.949

Simulations for properties of hypothesis tests

Simple example: size and power of the usual t-test for the mean,

  H_0: µ = µ_0  vs.  H_1: µ ≠ µ_0

• To evaluate whether the size/level of the test achieves the advertised α, generate data under µ = µ_0 and calculate the proportion of rejections of H_0.
  – This approximates the true probability of rejecting H_0 when it is true.
  – The proportion should be ≈ α.
• To evaluate power, generate data under some alternative µ ≠ µ_0 and calculate the proportion of rejections of H_0.
  – This approximates the true probability of rejecting H_0 when the alternative is true (the power).
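The size/power recipe above can be sketched outside R as well. Below is a minimal stdlib-only Python version; the function name reject_rate is an illustrative choice, and the critical value t_{0.975,14} ≈ 2.145 is hard-coded because the Python standard library has no t quantile function.

```python
import math
import random
import statistics

def reject_rate(mu_true, mu0=1.0, n=15, sigma=math.sqrt(5/3), S=2000, seed=7):
    """Fraction of S simulated samples in which the two-sided t-test rejects H0: mu = mu0."""
    t_crit = 2.145  # t_{0.975, 14}; valid for the default n = 15 only
    rng = random.Random(seed)
    rejections = 0
    for _ in range(S):
        y = [rng.gauss(mu_true, sigma) for _ in range(n)]
        ybar = statistics.fmean(y)
        se = statistics.stdev(y) / math.sqrt(n)  # estimated SE of the sample mean
        if abs((ybar - mu0) / se) > t_crit:
            rejections += 1
    return rejections / S

size = reject_rate(1.0)   # data generated under H0: proportion should be near alpha = 0.05
power = reject_rate(2.0)  # data generated under one alternative
print(size, power)
```

With S = 2000, the Monte Carlo standard error of the estimated size is about sqrt(0.05 × 0.95 / 2000) ≈ 0.005, so an estimate between roughly 0.04 and 0.06 is consistent with the nominal 5% level.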


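For readers without R at hand, the earlier estimator comparison (sample mean vs. 20% trimmed mean vs. median under normal data) can also be reproduced with a short stdlib-only Python sketch; simulate and trimmed_mean are illustrative names, and the trimming mimics R's mean(Y, trim = 0.2) by dropping floor(0.2 n) observations from each tail.

```python
import random
import statistics

def trimmed_mean(ys, trim=0.2):
    # Drop floor(trim * n) observations from each tail, as R's mean(Y, trim = 0.2) does.
    ys = sorted(ys)
    k = int(len(ys) * trim)
    return statistics.fmean(ys[k:len(ys) - k])

def simulate(estimator, S=1000, n=15, mu=1.0, sigma=(5/3) ** 0.5, seed=3):
    """Monte Carlo bias, standard deviation, and MSE of an estimator of mu under normal data."""
    rng = random.Random(seed)
    ts = [estimator([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(S)]
    tbar = statistics.fmean(ts)
    bias = tbar - mu
    sd = statistics.stdev(ts)  # divisor S - 1, matching SD-hat in the slides
    mse = sum((t - mu) ** 2 for t in ts) / S
    return bias, sd, mse

for name, est in [("mean", statistics.fmean),
                  ("trim", trimmed_mean),
                  ("median", statistics.median)]:
    print(name, simulate(est))
```

Note that MSE-hat = ((S − 1)/S) SD-hat^2 + bias-hat^2 holds exactly with these divisors, which is the slides' approximation MSE ≈ SD^2 + bias^2 for large S.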