Simulation Studies
Spring 2011

Outline
1. Introduction
2. Confidence Intervals
3. Hypothesis Tests

Latin Terms for Some Different Types of Biological Experiments
1. in vivo: "within the living" (e.g., animal testing, clinical trials); experimentation using a whole, living organism.
2. in vitro: "within the glass" (e.g., test tube or petri dish); experimentation using components of an organism that have been isolated and studied in a controlled biological environment.
3. in silico: "in the computer" (fake Latin, playing on "silicon"); experimentation performed on a computer or via computer simulation.
See Wikipedia.

Simulation Experiment (in silico)
1. Almost free compared to very expensive human or animal testing.
2. Complete control: the biologist and modeler set exactly the parameters of the system.
3. A good first approximation for studying a biological system.
4. Limited by computer precision and human knowledge: how realistically can you actually model a biological system?
5. NOT meant to replace in vivo or in vitro experiments. Instead, use computer simulation to complement these experiments.

Let's Focus
1. The above description of in silico experiments is meant as background.
2. The topic of modeling and simulating complex biological systems could be studied in a year-long course sequence.
3. We will focus on a specific type of simulation study commonly used in statistics.

Monte Carlo experiments
1. Rely on repeated random sampling.
2. Named after the Monte Carlo casino. Please do NOT be fooled into thinking you can beat the house; the house always wins.
3. Usually, when a statistician says "simulation study," they are referring to a Monte Carlo experiment. We will use this meaning of simulation throughout.

Simulation Experiments in a typical statistics study
1. Confidence intervals and hypothesis testing have a repeated-sampling interpretation.
2. We do not actually want to collect 1000 different random samples from our target population. (Remember, we hope that a 95% CI would contain the true mean about 950 out of the 1000 times.)
3. Also, we will never know the true mean in a real experimental situation. But you know it in a computer experiment, because you set the mean.
4. You use the computer to generate pseudo-random samples.

Game Plan
We will explore simulation studies for one-sample problems in the context of both
1. Confidence intervals
2. Hypothesis tests

Note (future work)
We are focusing on two procedures that we know: confidence intervals and hypothesis tests. The real power of simulation studies comes in exploring the performance of a statistical procedure (e.g., a confidence interval) in complex settings where the distributional theory is unknown (e.g., we cannot say the procedure is based on normality). This is important to remember for those of you who continue with quantitative research (e.g., anyone doing a research-based master's or PhD; most scientific fields are quantitative).

Simulation Studies
We will explore simulation studies in the context of
1. Confidence intervals
2. Hypothesis tests

Example
1. Let's simulate one data set using R.
2. We draw a pseudo-random sample of size n = 10 from the N(0, 4) population (rnorm), and then use R to compute a confidence interval (t.test).
3. t.test prints a lot of information; in particular, it prints the 95% confidence interval for the mean.

Example Continued

    x <- rnorm(n = 10, mean = 0, sd = 4)
    x
     3.7032395  1.3803970  1.9454320  3.0871619 -2.1036865
    -5.8024249 -0.2351963 -1.5853121  8.4006722  4.7959016
    t.test(x)
    95 percent confidence interval:
     -1.509859  4.227096

Extend the Example to a Simulation Experiment
Repeat the above procedure 1000 times, and check how many times the confidence interval contains the true mean. We know the true mean is 0 because we control everything in a computer experiment; here we are drawing samples from a N(0, 4) population (μ = 0 and σ = 4).

Continuous data: t intervals

    N <- 1000    # number of simulations
    count <- 0   # number of CIs that contain 0
    for (i in 1:N) {
      n <- 10
      x <- rnorm(n, mean = 0, sd = 4)
      x.bar <- mean(x)
      s <- sd(x)
      l <- x.bar - qt(0.975, n - 1) * s / sqrt(n)   # lower limit
      u <- x.bar + qt(0.975, n - 1) * s / sqrt(n)   # upper limit
      if (l < 0 && u > 0) count <- count + 1
    }
    count / N
    [1] 0.952

Continuous data: z intervals
If we use the critical value 1.96 from N(0, 1) instead of the one from the t distribution, the coverage is worse.

    N <- 1000    # number of simulations
    count <- 0   # number of CIs that contain 0
    for (i in 1:N) {
      n <- 10
      x <- rnorm(n, mean = 0, sd = 4)
      x.bar <- mean(x)
      s <- sd(x)
      l <- x.bar - 1.96 * s / sqrt(n)
      u <- x.bar + 1.96 * s / sqrt(n)
      if (l < 0 && u > 0) count <- count + 1
    }
    count / N
    [1] 0.915

Continuous data: Uniform distribution
Instead of the normal distribution, we use U(-10, 10) to generate the data.

    N <- 1000      # number of simulations
    count <- 0     # number of t intervals that contain 0
    count.z <- 0   # number of z intervals (using 1.96) that contain 0
    for (i in 1:N) {
      n <- 10
      x <- runif(n, min = -10, max = 10)
      x.bar <- mean(x)
      s <- sd(x)
      l <- x.bar - qt(0.975, n - 1) * s / sqrt(n)
      u <- x.bar + qt(0.975, n - 1) * s / sqrt(n)
      l.z <- x.bar - 1.96 * s / sqrt(n)
      u.z <- x.bar + 1.96 * s / sqrt(n)
      if (l < 0 && u > 0) count <- count + 1
      if (l.z < 0 && u.z > 0) count.z <- count.z + 1
    }
    count / N
    [1] 0.936
    count.z / N
    [1] 0.903

Uniform distribution: increase n
As the number of observations n increases, the central limit theorem approximation works better.

    N <- 1000      # number of simulations
    count <- 0     # number of t intervals that contain 0
    count.z <- 0   # number of z intervals (using 1.96) that contain 0
    for (i in 1:N) {
      n <- 30
      x <- runif(n, min = -10, max = 10)
      x.bar <- mean(x)
      s <- sd(x)
      l <- x.bar - qt(0.975, n - 1) * s / sqrt(n)
      u <- x.bar + qt(0.975, n - 1) * s / sqrt(n)
      l.z <- x.bar - 1.96 * s / sqrt(n)
      u.z <- x.bar + 1.96 * s / sqrt(n)
      if (l < 0 && u > 0) count <- count + 1
      if (l.z < 0 && u.z > 0) count.z <- count.z + 1
    }
    count / N
    [1] 0.949
    count.z / N

Discrete data: Binomial distribution
Consider the confidence interval for the population proportion p. Assume x ~ B(50, p), where we choose p = 0.01, 0.5, 0.99. We compare the coverage probability of the intervals based on p̂ = x/n and p̃ = (x + 2)/(n + 4).

    p       cover.prob (p̃)   cover.prob (p̂)
    0.01    0.99             0.4
    0.5     0.928            0.928
    0.99    0.983            0.397

R code

    N <- 1000   # number of simulations
    for (p in c(0.01, 0.5, 0.99)) {
      count <- 0       # number of CIs that contain p
      count.hat <- 0
      for (i in 1:N) {
        n <- 50
        x <- rbinom(1, size = n, p)
        p.hat <- x / n
        p.tilde …
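The binomial coverage comparison can be sketched as one self-contained function. This is an illustrative sketch, not the code from the slides (which is cut off here). It assumes both intervals take the usual form estimate ± z·sqrt(estimate·(1 − estimate)/m), with m = n for p̂ = x/n and m = n + 4 for p̃ = (x + 2)/(n + 4). The helper name coverage_binom and the seed are hypothetical choices made for this example.

```r
# Hypothetical helper (not from the slides): Monte Carlo estimate of the
# coverage of two 95% intervals for a binomial proportion p.
coverage_binom <- function(p, n = 50, N = 1000, z = qnorm(0.975)) {
  x <- rbinom(N, size = n, prob = p)   # N simulated binomial counts

  # Interval centred at p.hat = x / n (assumed form: p.hat +/- z * se)
  p.hat <- x / n
  half.hat <- z * sqrt(p.hat * (1 - p.hat) / n)
  cover.hat <- (p.hat - half.hat <= p) & (p <= p.hat + half.hat)

  # Interval centred at p.tilde = (x + 2) / (n + 4), same form with n + 4
  p.tilde <- (x + 2) / (n + 4)
  half.tilde <- z * sqrt(p.tilde * (1 - p.tilde) / (n + 4))
  cover.tilde <- (p.tilde - half.tilde <= p) & (p <= p.tilde + half.tilde)

  # Proportion of simulated intervals that contain the true p
  c(p = p, cover.tilde = mean(cover.tilde), cover.hat = mean(cover.hat))
}

set.seed(2011)   # arbitrary seed, used only to make the run reproducible
results <- t(sapply(c(0.01, 0.5, 0.99), coverage_binom))
print(round(results, 3))
```

Drawing all N counts in one rbinom call replaces the explicit for loop; the estimated coverage is the same quantity as count / N in the loop version. Under these assumptions, the p̂ interval should cover poorly when p is near 0 or 1, because x = 0 (or x = n) gives a zero-width interval.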