22S:30/105 Statistical Methods and Computing

Introduction to Types of Studies

Lecture 7
February 15, 2008

Kate Cowles
374 SH, 335-0727
kcowles@stat.uiowa.edu


Experiments and observational studies

- In an experiment, the investigator studies the effect of varying some factor that he/she controls.
- In an observational study, the investigator merely observes and records information on the subjects but does not manipulate any factors.
- It is very difficult to establish causation between one variable and another, and especially difficult based on observational studies.


Koch's postulates

In 1890 the German microbiologist Robert Koch attempted to develop criteria for establishing whether a particular microorganism causes a particular disease (not considered completely satisfactory today):

- first, the organism is always found with the disease, in accord with the lesions and clinical stage observed
- second, the organism is not found with any other disease
- third, the organism, isolated from one who has the disease and cultured through several generations, reproduces the disease in a susceptible experimental animal

Even where an infectious disease cannot be transmitted to animals, the "regular" and "exclusive" presence of the organism proves a causal relationship.


More formal criteria for judging whether an observed association is causal

- strength of the association
- dose-response relationship
- consistency of the association: is the association observed in one study also observed in other study populations, in studies using different methods, etc.?
- temporally correct association
- specificity of the association: the alleged effect is rarely if ever observed without the alleged cause
- plausibility


Example: Female literacy and infant mortality


Confounding

Two variables (explanatory or lurking) are confounded when their effects on a response variable cannot be separated.

Association does not by itself imply causation!


Populations and samples

A population is the entire set of items about which we might wish to draw conclusions.

- Example: I wish to find out the average income of families of current UI undergrads.
- Example: A political pollster would like to know the Presidential preference of every registered voter in South Carolina.

Some populations we would like to study are hypothetical.

- Example: all pregnant women who are infected with the HIV virus, now and in the future.

A sample is the subset of the population that we can actually study, on which we can measure values of variables.

How a sample is drawn from a population affects how valid it is to apply conclusions based on the sample to the population.

The sample design is the method used to choose the sample from the population.


Bias

The results of a study are biased if they are subject to systematic error, i.e. there is something about the way the study is carried out such that, if we did many studies in this way, on average we'd get the wrong conclusions.

One source of bias is a sample that is not representative of the entire population.

The design of a study is biased if it systematically favors certain outcomes.


Kinds of sample designs

- simple random sample (SRS): a sample of n individuals chosen in such a way that every set of n individuals in the population has an equal chance to be the sample
  - the ideal
  - biased or unbiased?
- voluntary response sample: consists of people who choose themselves by responding to a general appeal
  - biased or unbiased?
- convenience sample: consists of subjects who are easy to get
  - biased or unbiased?
- judgment sample: consists of subjects chosen by an expert to be "representative" of the population
  - biased or unbiased?


How simple random samples are drawn

- Each member of the population is uniquely identified in some way.
  - Example: the population of interest is UI students; each has a unique ID number.
- Intuitive idea: the identifiers are put in a hat and drawn at random.
- Usually this is actually done by a computer.
- It can be done manually using a table of random digits:
  - first assign a unique numeric label to each member of the population
  - use the table of digits to select labels at random


Example

I wish to get an idea as to how well undergrad students in 22S:30 like the textbook. To do this, I want to administer a lengthy interview, and I have time to do only 3. Therefore, I want to draw a simple random sample of size 3 from the population of 24 undergrad students in the class.

Begin by giving each student a unique numeric identifier.

     1 Derek A       9 Jenna        17 Derek N
     2 Kara         10 Peter        18 Tuyet
     3 Courtney     11 Anne         19 Ben
     4 Karen        12 Todd         20 Mitchell
     5 Cory         13 Anthony      21 Nicole
     6 Catherine    14 Katie McE    22 Cristina
     7 Katie H      15 Kimbra       23 Joanna
     8 Ryan         16 Phil         24 Jessica

Use Table B in your book to find the first 3 of these identifiers that appear.


Table of random digits

- Each entry in the table is equally likely to be any of the 10 digits from 0 to 9 inclusive.
- The entries are independent of each other, i.e. knowledge of what digits are in one part of the table gives no information about the digits in any other part.


Using SAS to draw a simple random sample

    options linesize = 79 ;

    data students ;
    input name $9. ;
    datalines ;
    Derek A
    Kara
    Courtney
    Karen
    Cory
    Catherine
    Katie H
    Ryan
    Jenna
    Peter
    Anne
    Todd
    Anthony
    Katie McE
    Kimbra
    Phil
    Derek N
    Tuyet
    Ben
    Mitchell
    Nicole
    Cristina
    Joanna
    Jessica
    ;

    proc print ;
    run ;

Output

    Obs    Name

      1    Derek A
      2    Kara
      3    Courtney
      4    Karen
      5    Cory
      6    Catherine
      7    Katie H
      8    Ryan
      9    Jenna
     10    Peter
     11    Anne
     12    Todd
     13    Anthony
     14    Katie McE
     15    Kimbra
     16    Phil
     17    Derek N
     18    Tuyet
     19    Ben
     20    Mitchell
     21    Nicole
     22    Cristina
     23    Joanna
     24    Jessica


Proc plan

    proc plan seed = 72950 ;
    factors a = 3 of 24 ;
    run ;

    The PLAN Procedure

    Factor    Select    Levels    Order
    a              3        24    Random

    a
    1 24 7

The selected observation numbers are 1, 24, and 7.

Using the same seed will reproduce exactly the same random choice.

    proc plan seed = 72950 ;
    factors a = 3 of 24 ;
    run ;

    The PLAN Procedure

    Factor    Select    Levels    Order
    a              3        24    Random

    a
    1 24 7

Using a different seed will produce a different set of choices.

    proc plan seed = 32542 ;
    factors a = 3 of 24 ;
    run ;

    The PLAN Procedure

    Factor    Select    Levels    Order
    a              3        24    Random

    a
    2 16 4

Drawing from a larger population

    proc plan seed = 241 ;
    factors a = 100 of 1000 ;
    run ;

    The PLAN Procedure

    Factor    Select    Levels    Order
    a            100      1000    Random

    a
    576 630 550 119 901 497 864 792 705 120
    944 767 773 481 359 286 441 692 507 687
    362 517 412 921 265 819 449 584 110 597
    139 432 844  41  28 598 868 644 470 518
    424 479 859 488 269 311 264  24 594 144
    621 861 585 822 326 235   9 240 775  69
    897 863 337  52 674 529 329 271 178 175
    462 651 168 143 820 752 262 923 939 562
    239 423 673 298  50 974 435 233
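
The manual method described earlier (label each student, then read a table of random digits) can also be sketched in code. The example below is in Python rather than the SAS used in this lecture, and it is only an illustration under stated assumptions: the function name srs_from_digits and the line of digits are made up for this sketch, and the digits are not taken from Table B. Because the class has 24 students, the digits are read in two-digit groups; groups outside 01 to 24 and labels that have already been drawn are skipped.

```python
def srs_from_digits(digits, population_size, sample_size):
    """Pick labels by reading fixed-width groups from a string of random digits.

    Hypothetical helper for illustration; not part of the lecture or of SAS.
    """
    # A population of 24 needs two-digit labels (01..24), 1000 needs four, etc.
    width = len(str(population_size))
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip groups that are not valid labels and labels already selected.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")


# Made-up line of digits for illustration only (NOT a line from Table B).
line = "48071602953124170822"
sample = srs_from_digits(line, population_size=24, sample_size=3)
print(sample)  # [7, 16, 2]
```

Reading the made-up line as 48, 07, 16, 02, ... discards 48 (no student has that label) and keeps 07, 16, and 02, so with the numbering above the interviews would go to students 7, 16, and 2. A different line of digits would give a different sample, just as a different seed does in proc plan.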