Statistical Principles of Design

Bret Larget
Department of Statistics
University of Wisconsin-Madison
Statistics 371, Fall 2004
October 27, 2004

Principles of Design

Previous chapters have dealt with methods of analyzing data to make inferences about a single population or to make comparisons between two populations. Future chapters will focus on methodology for the analysis of data that arises in additional settings. The present chapter looks instead at the problem of design: the methods for collecting data.

How should data be collected so that analysis of the data leads to valid inferences?
What are the pitfalls of poor design choices, and how can these affect inference?
What are the main statistical principles of design?

Case Study: Cigarettes and Smoking

In a study, pregnant women were questioned about their smoking habits, diet, and other variables. The babies were followed up for some time. There was strong statistical evidence that the mean birth weight of smokers' babies was lower than the mean birth weight of nonsmokers' babies. Low birth weight is associated with a number of health problems in babies, which makes the problem of finding causes of low birth weight important. We say that smoking and low birth weight are associated with one another.

However, this single study alone is insufficient to support the conclusion that smoking caused the lower birth weight. The smokers and nonsmokers differed on a number of possible explanatory variables, and it is unclear which of these variables may have caused the difference. We say that the possible effects of smoking on birth weight are confounded with many other possible explanations. Confounding has the potential to mislead. Association is not causation.

Observation versus Experiment

We make a distinction between an experiment, in which the researchers intervene in the experimental conditions, and an observational study, in which the researchers merely observe an existing situation. The distinction is important in the interpretation of the results of an analysis. Consider an analysis to compare two groups. In an experiment, the researchers assign the groups. In an observational study, the groups are simply observed.

Comparison

To attribute a causal relationship between an explanatory variable and a response variable, such as smoking and low birth weight, we would like to be able to make a comparison between two groups that differ only in the explanatory variable under study, with all other possible explanatory variables the same between the two groups. While in experimental settings it is possible for the researcher to create groups in which a single explanatory variable is the largest difference between two groups, there are many settings for which experiments are either impossible, too expensive, or unethical. It is not impossible to attribute a causal relationship based on observational studies alone, but it is far more difficult, because the researchers essentially need to identify and rule out, or control for, the effects of the other possible explanatory variables.

More on smoking and low birth weight

In one study, a large number of variables were measured. A complex statistical method that simultaneously estimates the effects of several explanatory variables found that, even after making adjustments for these other variables, smoking still had an effect on birth weight.

A second study found differences in the placenta between smokers and nonsmokers, and that some of the differences were associated with chemicals found in cigarettes. This same study also found that having smokers not smoke for three hours caused a change in blood flow to the placenta.

A third study identified 159 women who smoked during a first pregnancy but not during a second pregnancy. These women were matched with 159 other women who had smoked during both pregnancies and for whom other explanatory variables were similar. This study found that the second babies of the women who quit smoking were heavier than the second babies of their matched controls who continued to smoke.

The First Study

The first study attempts to make comparisons of similar groups by a statistical analysis that attempts to adjust for the effects of other explanatory variables, and so leave a comparison where the only important difference is the explanatory variable of interest. Interpretation of causality from such a study, however, assumes that the statistical model for the joint effects of all the variables is an accurate description of reality, often a dubious assumption.

The Second Study

The Third Study

The third study attempts to make a comparison between like groups by constructing groups that are as similar as possible on the basis of other explanatory variables that are thought to also have an effect.

Notice that there are several possible ways to address confounding, with the objective to eventually establish a causal relationship through a series of observational studies. Several different large observational studies are often necessary to present a convincing case for establishing causality.

Example: A Common Cold

Importance of Control Groups

This experiment shows the importance of a control group. Without a control group, it may have been thought that the vaccine was effective. The real reason for the difference in both groups is most likely the quality of measurement. Previous-year cold counts were based on the students' memories. Current-year colds were measured by how often the subjects went to the health center.

Randomization

Blocking

Comparing blocking and randomization

The advantage of randomization is that it helps to control for effects whether their sources are known or not.
The advantage of blocking is that it enforces balance for effects known or thought to be important.
Blocking is limited to only a few variables: if you keep subdividing based on other variables, it doesn't take long for each subject to be in a unique block.
It is almost always a good idea to randomize within blocks.
Blocking can be a better choice than complete randomization (treating the entire sample as a single block) in terms of yielding more precise estimates or having more powerful tests. This improvement in statistical power depends on the variable being blocked having an
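The advice above to randomize within blocks can be illustrated with a short Python sketch. It is not part of the original notes: the function name `randomize_within_blocks`, the subject IDs, and the age-group blocks are hypothetical choices made only for illustration. Subjects are grouped by block label, each block is shuffled on its own, and treatments are dealt out in turn, so every block ends up balanced (group sizes within a block differ by at most one).

```python
import random
from collections import defaultdict


def randomize_within_blocks(block_labels, treatments=("treatment", "control"), seed=None):
    """Randomly assign treatments separately inside each block.

    block_labels maps a subject ID to its block label (hypothetical data layout).
    Returns a dict mapping each subject ID to one of the treatments.
    """
    rng = random.Random(seed)

    # Group subjects by the blocking variable.
    blocks = defaultdict(list)
    for subject, label in block_labels.items():
        blocks[label].append(subject)

    assignment = {}
    for label in sorted(blocks):
        members = blocks[label]
        rng.shuffle(members)      # random order of subjects within the block
        order = list(treatments)
        rng.shuffle(order)        # so no treatment always gets the extra subject
        for i, subject in enumerate(members):
            assignment[subject] = order[i % len(order)]
    return assignment


if __name__ == "__main__":
    # Hypothetical example: eight subjects blocked by age group.
    labels = {f"s{i}": ("young" if i < 4 else "old") for i in range(8)}
    for subject, arm in sorted(randomize_within_blocks(labels, seed=1).items()):
        print(subject, labels[subject], arm)
```

Passing a single label for every subject treats the whole sample as one block, which corresponds to complete randomization with balanced group sizes.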