UI STAT 5400 - Computing in Statistics

22S:166 Computing in Statistics
Simulation studies in statistics
Lecture 12, October 2, 2006
Based on a lecture by Marie Davidian for ST 810A - Spring 2005, Preparation for Statistical Research, North Carolina State University
http://www4.stat.ncsu.edu/~davidian/st810a/

Basics

• Simulation studies are commonly done to evaluate the performance of a frequentist statistical procedure, or to compare the performance of two or more different procedures for the same problem.
• They enable us to see what happens "when many, many samples of the same size are drawn from the same population."
• Properties of estimators that are often evaluated by simulation:
  – bias
  – mean squared error
  – coverage of confidence intervals
• Properties of hypothesis tests can also be evaluated by simulation studies:
  – size
  – power
• Simulation studies are experiments, and the things you know about experimental design and sample size calculation apply.

Terminology

• Simulation: a numerical technique for conducting experiments on the computer.
• Monte Carlo simulation: a computer experiment involving random sampling from probability distributions.
  – This is what statisticians usually mean by "simulations."

Rationale

• Properties of statistical methods must be established before the methods can safely be used in practice.
• But exact analytical derivations of properties are rarely possible.
• Large-sample approximations to properties are often possible.
  – Evaluation of the relevance of the approximation to the (finite) sample sizes likely to be encountered in practice is needed.
• Analytical results may require assumptions such as normality.
  – What happens when these assumptions are violated? Analytical results, even large-sample ones, may not be possible.

Questions to be addressed regarding an estimator or testing procedure

• Is an estimator biased in finite samples?
• What is its sampling variance?
• How does it compare to competing estimators on the basis of bias, precision, etc.?
• Does a procedure for constructing a confidence interval for a parameter achieve the claimed nominal level of coverage?
• Does a hypothesis testing procedure attain the claimed level or size?
• If so, what power is possible against different alternatives to the null hypothesis? Do different test procedures deliver different power?

Role of Monte Carlo simulation

• Goal: evaluate the sampling distribution of an estimator under a particular set of conditions (sample size, error distribution, etc.).
• Analytic derivation of the exact sampling distribution is not feasible.
• Solution: approximate the sampling distribution through simulation.
  – Generate S independent data sets under the conditions of interest.
  – Compute the numerical value of the estimator/test statistic T(data) for each data set, yielding T_1, ..., T_S.
• If S is large enough, summary statistics across T_1, ..., T_S should be good approximations to the true sampling properties of the estimator/test statistic under the conditions of interest.

Simulation for properties of estimators

Simple example: compare three estimators for the mean µ of a distribution, based on i.i.d. draws Y_1, ..., Y_n:

• sample mean T^(1)
• sample 20% trimmed mean T^(2)
• sample median T^(3)

Remarks:

• If the distribution of the data is symmetric, all three estimators indeed estimate the mean.
• If the distribution is skewed, they do not.

Simulation procedure

For a particular choice of µ, n, and true underlying distribution:

• Generate independent draws Y_1, ..., Y_n from the distribution.
• Compute T^(1), T^(2), T^(3).
• Repeat S times, yielding T^(1)_1, ..., T^(1)_S; T^(2)_1, ..., T^(2)_S; T^(3)_1, ..., T^(3)_S.
• Compute, for k = 1, 2, 3:

  mean-hat = S^{-1} Σ_{s=1}^S T^(k)_s = T̄^(k)
  bias-hat = T̄^(k) − µ
  SD-hat  = sqrt( (S − 1)^{-1} Σ_{s=1}^S (T^(k)_s − T̄^(k))^2 )
  MSE-hat = S^{-1} Σ_{s=1}^S (T^(k)_s − µ)^2 ≈ SD-hat^2 + bias-hat^2

Relative efficiency

For any estimators for which E(T^(1)) = E(T^(2)) = µ,

  RE = var(T^(1)) / var(T^(2))

is the relative efficiency of estimator 2 to estimator 1.

• When the estimators are not unbiased, it is standard to compute

  RE = MSE(T^(1)) / MSE(T^(2))

• In either case, RE < 1 means estimator 1 is preferred (estimator 2 is inefficient relative to estimator 1 in this sense).

R code for example

> set.seed(3)
> S <- 1000
> n <- 15
> trimmean <- function(Y) { mean(Y, trim = 0.2) }
> mu <- 1
> sigma <- sqrt(5/3)

Normal data (generate.normal, simsum, and view are helper functions defined elsewhere in the course materials):

> out <- generate.normal(S, n, mu, sigma)
> outsampmean <- apply(out$dat, 1, mean)
> outtrimmean <- apply(out$dat, 1, trimmean)
> outmedian <- apply(out$dat, 1, median)
> summary.sim <- data.frame(mean = outsampmean, trim = outtrimmean,
+                           median = outmedian)
> results <- simsum(summary.sim, mu)
> view(round(summary.sim, 4), 5)

First 5 rows
    mean   trim median
1 0.7539 0.7132 1.0389
2 0.6439 0.4580 0.3746
3 1.5553 1.6710 1.9395
4 0.5171 0.4827 0.4119
5 1.3603 1.4621 1.3452

> results
                       Sample mean  Trimmed mean    Median
true value                   1.000         1.000     1.000
# sims                    1000.000      1000.000  1000.000
MC mean                      0.985         0.987     0.992
MC bias                     -0.015        -0.013    -0.008
MC relative bias            -0.015        -0.013    -0.008
MC standard deviation        0.331         0.348     0.398
MC MSE                       0.110         0.121     0.158
MC relative efficiency       1.000         0.905     0.694

Performance of estimates of uncertainty

How well do estimated standard errors represent the true sampling variation?

• E.g., for the sample mean, T^(1)(Y_1, ..., Y_n) = Ȳ,

  SE(Ȳ) = s/√n,  s^2 = (n − 1)^{-1} Σ_{j=1}^n (Y_j − Ȳ)^2

• The MC standard deviation approximates the true sampling variation.
• Compare the average of the estimated standard errors to the MC standard deviation.

For the sample mean, the MC standard deviation was 0.331:

> sampmean.ses <- sqrt(apply(out$dat, 1, var)/n)
> ave.sampmeanses <- mean(sampmean.ses)
> round(ave.sampmeanses, 3)
[1] 0.329

Usual 100(1 − α)% confidence interval for µ, based on the sample mean

  [ Ȳ − t_{1−α/2, n−1} s/√n,  Ȳ + t_{1−α/2, n−1} s/√n ]

• Does the interval achieve the nominal level of coverage 1 − α?
• E.g., α = 0.05:

> t05 <- qt(0.975, n-1)
> coverage <- sum((outsampmean - t05*sampmean.ses <= mu) &
+                 (outsampmean + t05*sampmean.ses >= mu))/S
> coverage
[1] 0.949

Simulations for properties of hypothesis tests

Simple example: size and power of the usual t-test for the mean,

  H_0: µ = µ_0  vs.  H_1: µ ≠ µ_0

• To evaluate whether the size/level of the test achieves the advertised α, generate data under µ = µ_0 and calculate the proportion of rejections of H_0.
  – This approximates the true probability of rejecting H_0 when it is true.
  – The proportion should be ≈ α.
• To evaluate power, generate data under some alternative µ ≠ µ_0 and calculate the proportion of rejections of H_0.
  – This approximates the true probability of rejecting H_0 when the alternative is true (the power).
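The size/power recipe above can be sketched outside R as well. Below is a minimal stdlib-only Python version; the function name reject_rate is an illustrative choice, and the critical value t_{0.975,14} ≈ 2.145 is hard-coded because the Python standard library has no t quantile function.

```python
import math
import random
import statistics

def reject_rate(mu_true, mu0=1.0, n=15, sigma=math.sqrt(5/3), S=2000, seed=7):
    """Fraction of S simulated samples in which the two-sided t-test rejects H0: mu = mu0."""
    t_crit = 2.145  # t_{0.975, 14}; valid for the default n = 15 only
    rng = random.Random(seed)
    rejections = 0
    for _ in range(S):
        y = [rng.gauss(mu_true, sigma) for _ in range(n)]
        ybar = statistics.fmean(y)
        se = statistics.stdev(y) / math.sqrt(n)  # estimated SE of the sample mean
        if abs((ybar - mu0) / se) > t_crit:
            rejections += 1
    return rejections / S

size = reject_rate(1.0)   # data generated under H0: proportion should be near alpha = 0.05
power = reject_rate(2.0)  # data generated under one alternative
print(size, power)
```

With S = 2000, the Monte Carlo standard error of the estimated size is about sqrt(0.05 × 0.95 / 2000) ≈ 0.005, so an estimate between roughly 0.04 and 0.06 is consistent with the nominal 5% level.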


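For readers without R at hand, the earlier estimator comparison (sample mean vs. 20% trimmed mean vs. median under normal data) can also be reproduced with a short stdlib-only Python sketch; simulate and trimmed_mean are illustrative names, and the trimming mimics R's mean(Y, trim = 0.2) by dropping floor(0.2 n) observations from each tail.

```python
import random
import statistics

def trimmed_mean(ys, trim=0.2):
    # Drop floor(trim * n) observations from each tail, as R's mean(Y, trim = 0.2) does.
    ys = sorted(ys)
    k = int(len(ys) * trim)
    return statistics.fmean(ys[k:len(ys) - k])

def simulate(estimator, S=1000, n=15, mu=1.0, sigma=(5/3) ** 0.5, seed=3):
    """Monte Carlo bias, standard deviation, and MSE of an estimator of mu under normal data."""
    rng = random.Random(seed)
    ts = [estimator([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(S)]
    tbar = statistics.fmean(ts)
    bias = tbar - mu
    sd = statistics.stdev(ts)  # divisor S - 1, matching SD-hat in the slides
    mse = sum((t - mu) ** 2 for t in ts) / S
    return bias, sd, mse

for name, est in [("mean", statistics.fmean),
                  ("trim", trimmed_mean),
                  ("median", statistics.median)]:
    print(name, simulate(est))
```

Note that MSE-hat = ((S − 1)/S) SD-hat^2 + bias-hat^2 holds exactly with these divisors, which is the slides' approximation MSE ≈ SD^2 + bias^2 for large S.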