Statistical Principles of Design
Bret Larget
Department of Statistics, University of Wisconsin-Madison
Statistics 371, Fall 2003
October 17, 2003

Principles of Design

Previous chapters have dealt with methods of analyzing data to make inferences about a single population or to make comparisons between two populations. Future chapters will focus on methodology for the analysis of data that arise in additional settings. The present chapter looks instead at the problem of design: the methods for collecting data. How should data be collected so that analysis of the data leads to valid inferences? What are the pitfalls of poor design choices, and how can these affect inference? What are the main statistical principles of design?

Observation versus Experiment

We make a distinction between an experiment, in which the researchers intervene in the experimental conditions, and an observational study, in which the researchers merely observe an existing situation. The distinction is important in the interpretation of the results of an analysis. Consider an analysis to compare two groups. In an experiment, the researchers assign the groups. In an observational study, the groups are simply observed.

Case Study: Cigarettes and Smoking

In a study, pregnant women were questioned about their smoking habits, diet, and other variables. The babies were followed up for some time. There was strong statistical evidence that the mean birth weight of smokers' babies was lower than the mean birth weight of nonsmokers' babies. Low birth weight is associated with a number of health problems in babies, which makes finding the causes of low birth weight important.

We say that smoking and low birth weight are associated with one another. However, this single study alone is insufficient to support the conclusion that smoking caused the lower birth weight. The smokers and nonsmokers differed on a number of possible explanatory variables, and it is unclear which of these variables may have caused the difference. We say that the possible effects of smoking on birth weight are confounded with many other possible explanations.

Confounding limits our justification for concluding causal relationships. Association is not causation.

Comparison

To attribute a causal relationship between an explanatory variable and a response variable (such as smoking and low birth weight), we would like to compare two groups that differ only in the explanatory variable under study, with all other possible explanatory variables the same in both groups. In experimental settings, the researcher can create groups in which a single explanatory variable is the largest difference between them. In many settings, however, experiments are impossible, too expensive, or unethical. It is not impossible to attribute a causal relationship on the basis of observational studies alone, but it is far more difficult. The researchers essentially need to identify the effects of the other possible explanatory variables and then rule them out or control for them.

Case Study: Cigarettes and Smoking
More on smoking and low birth weight

In one study, a large number of variables were measured. A complex statistical method simultaneously estimated the effects of several explanatory variables. It found that smoking still had an effect on birth weight, even after adjusting for these other variables.

A second study found differences in the placenta between smokers and nonsmokers, and some of the differences were associated with chemicals found in cigarettes. This same study also found that having smokers not smoke for three hours caused a change in blood flow to the placenta.

A third study identified 159 women who smoked during a first pregnancy but not during a second pregnancy. These women were matched with 159 other women who had smoked during both pregnancies and for whom other explanatory variables were similar. This study found that the women who quit smoking had heavier second babies than their matched controls who continued to smoke.

Comments

The first study attempts to compare similar groups through a statistical analysis that adjusts for the effects of other explanatory variables. The aim is to leave a comparison in which the only important difference is the explanatory variable of interest. Interpreting causality from such a study, however, assumes that the statistical model for the joint effects of all the variables is an accurate description of reality. That is often a dubious assumption.

The second study attempted to establish a link between smoking and low birth weight by establishing a link between smoking and the placenta. The link between the placenta and birth weight is perfectly plausible without statistical justification. This study even included an experiment in which the blood flow to the placenta could be compared within the same woman after she had smoked and when she had abstained from smoking.

The third study attempts to compare like groups by constructing groups that are as similar as possible on other explanatory variables that are thought to also have an effect.

Notice that there are several possible ways to address confounding, with the objective of eventually establishing a causal relationship through a series of observational studies. Several different large observational studies are often necessary to present a convincing case for causality.

Example: A Common Cold

Researchers invited college students to volunteer in an experiment to test the effectiveness of a vaccine for preventing the common cold. The volunteers were randomly assigned to two groups: one group received the vaccine, and the other group took a placebo. The study was blinded, meaning the subjects did not know which group they were in. Both groups reported dramatic decreases in the number of colds.

                     Mean number of colds
Group       n      Previous year   Current year
Vaccine     201    5.6             1.7
Placebo     203    5.2             1.6

Randomization

In an experiment, the researcher has the opportunity to assign treatment groups. There are substantial advantages to randomization. If the groups are randomly determined, then any explanatory variables that might …
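The following small Python simulation makes the contrast between an observational comparison and a randomized one concrete. It is a sketch under made-up assumptions and uses no data from the studies described above. A hypothetical confounder drives the response, and the exposure has no causal effect at all. In the "observational" arm, subjects with low confounder values are more likely to be exposed. In the "experimental" arm, a random half of the subjects is assigned to exposure. The function names `mean_difference`, `randomly_assign`, and `simulate` and every numeric setting are illustrative choices, not anything from the lecture.

```python
import random
import statistics


def mean_difference(response, exposed):
    """Mean response of exposed subjects minus mean response of the others."""
    yes = [y for y, e in zip(response, exposed) if e]
    no = [y for y, e in zip(response, exposed) if not e]
    return statistics.mean(yes) - statistics.mean(no)


def randomly_assign(n, rng):
    """Completely randomized design: a random half of the n subjects is exposed."""
    chosen = set(rng.sample(range(n), n // 2))
    return [i in chosen for i in range(n)]


def simulate(n=20_000, seed=371):
    """Return (observational difference, randomized difference).

    Hypothetical model (an assumption for illustration): the response depends
    only on a confounder plus noise, so the true causal effect of exposure is
    zero in both designs.
    """
    rng = random.Random(seed)
    confounder = [rng.gauss(0.0, 1.0) for _ in range(n)]
    response = [0.3 * z + rng.gauss(0.0, 0.4) for z in confounder]

    # Observational study: exposure is self-selected and depends on the
    # confounder (probability 0.7 if the confounder is negative, else 0.3).
    observed = [rng.random() < (0.7 if z < 0 else 0.3) for z in confounder]

    # Experiment: the researcher assigns exposure at random, independently
    # of the confounder.
    assigned = randomly_assign(n, rng)

    return (mean_difference(response, observed),
            mean_difference(response, assigned))


if __name__ == "__main__":
    obs_diff, rand_diff = simulate()
    print(f"observational difference: {obs_diff:+.3f}")
    print(f"randomized difference:    {rand_diff:+.3f}")
```

Under this model, the observational difference has expected value of about -0.19, because 0.3 times the gap in mean confounder between the two groups (about -0.64) is about -0.19. This difference is pure confounding, since exposure does nothing. The randomized difference has expected value 0, because random assignment does not depend on the confounder. The sketch uses complete randomization with fixed group sizes, loosely like the two groups in the common cold example. It does not attempt the regression adjustment or matching used in the first and third smoking studies.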