Introduction: Research design
17.871

Statistics
  1. a. In early use, that branch of political science dealing with the collection, classification, and discussion of facts (especially of a numerical kind) bearing on the condition of a state or community. In recent use, the department of study that has for its object the collection and arrangement of numerical facts or data, whether relating to human affairs or to natural phenomena. (OED)
  First usage: 1770

Etymology of statistics
  From German Statistik, political science, from New Latin statisticus, of state affairs, from Italian statista, person skilled in statecraft, from stato, state, from Old Italian, from Latin status, position, form of government.
  -American Heritage Dictionary of the English Language

The Biggest Problem in Research: Establishing Causality
  Example: HIV and circumcision
    Observational studies suggest that male circumcision may provide protection against HIV infection

Why is causality such a problem?
  In observational studies, selection into “treatment” and “control” cases is rarely random
    Schooling examples (private vs. public)
    Voting examples (pro-choice versus pro-life)
  Treatment and control cases may thus differ in other ways that affect the outcome of interest
  The two primary drivers of selection are
    Confounding variables
    Reverse causation

How to Establish Causality (i.e., how to rule out alternatives)
  How do we establish causality? By ruling out alternative explanations
    Legal analogy: prosecutor versus defense
  Run a field experiment! (best approach)
    HIV and circumcision: field experiment possible?

Post-test only experiment
  Donald Campbell and Julian Stanley, Experimental and Quasi-Experimental Designs for Research (1963)
  Summary:
    R  X  O
    R     O
  No prior observation
  Classical scientific and agricultural experimentalism

Field Experiment example: HIV and male circumcision
  3,274 uncircumcised men, aged 18–24, volunteered!
  Randomly assigned to a control or an intervention group, with follow-up visits at months 3, 12, and 21
  Did it work? (HIV incidence)
    Control group: 2.1 per 100 person-years
    Treatment group: 0.85 per 100 person-years
  Problems?
    Internally valid! Because the intervention was randomly assigned, there is no bias from nonrandom selection into the treatment group. That is,
      No systematic differences between the treatment and control group on confounding variables (only comparing apples with apples, not apples with oranges)
      No possibility of reverse causation
    Alternative interpretations of the treatment?
    External validity?
    Could the difference have occurred by chance? Unlikely: p < 0.001 on difference

HIV and male circumcision
  When controlling for behavioral factors, including sexual behavior that increased slightly in the intervention group, condom use, and health-seeking behavior, the protection was 61% (95% CI: 34%–77%).
  Male circumcision provides a degree of protection against acquiring HIV infection, equivalent to what a vaccine of high efficacy would have achieved.
  Male circumcision may provide an important way of reducing the spread of HIV infection in sub-Saharan Africa.
  PLoS Medicine Vol. 2, No. 11

How to Establish Causality (i.e., how to rule out alternatives)
  But, running an experiment is often impossible
    Try anyway: e.g., HIV and circumcision
  If you can’t run an experiment: natural experiment
    Exploit something that is exogenous
      Accidental deaths
      Timing of Senate elections
      Imposition of new voting machines
      9/11 terrorist attacks
      Geographical boundaries
    Exploit a discontinuity
      Summa Cum Laude’s effect on income
      Regression discontinuity (RD) design

Regression discontinuity
  Example from Brazil

How to Establish Causality (i.e., how to rule out alternatives)
  If you can’t run an experiment or find a natural experiment/discontinuity
    Control for confounding variables
      Difference-in-differences (DD)
      Matching
      Controlling for variables with parametric models, e.g., regression
    Eliminate reverse causation
      Exploit time with panel data, i.e., measure the outcome before and after some treatment

Difference-in-differences
  Media effects example
    Endorsement changes in the 1997 British election
  Illustrates
    Difference-in-differences, which reduces bias from confounding variables
    Panel data, which can help rule out reverse causation

Figure 1: Newspaper Endorsements and Voting 1992-1997
  [Figure not reproduced. Vertical axis: % Labour vote among voters (25 to 60). Horizontal axis: 1992, 1997. Series: Read paper before 1997 that switched to Labour (n=185); Did not read paper before 1997 that switched to Labour (n=1408). Value labels on the plot: 5.3, 12.6.]

How to Establish Causality (i.e., how to rule out alternatives)
  If you can’t run an experiment or find a natural experiment
    Control for confounding variables
      Difference-in-differences (DD)
      Matching
      Controlling for variables with parametric models, e.g., regression
    Eliminate reverse causation
      Exploit time with panel data, i.e., measure the outcome before and after some treatment
  Much of 17.871 is about this

Summary
  Classical experimentation unlikely, but always preferred
  Always keep a classical experiment in mind when designing observational studies
  Strive for “natural” or quasi-experiments
    Alternating years of standardized testing
    Timing of Senate elections
    Imposition of new voting machines
    9/11 terrorist attacks
  Use regression-discontinuity designs
    Geographical boundaries (e.g., minimum wage study)
  Use difference-in-differences designs
  Gather as much cross-time data as possible (panel studies)
  If you only have cross-sectional data, be humble!

(Angrist and Lavy,
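
Worked example: unadjusted protection in the circumcision field experiment

The two rates reported for the HIV and male circumcision field experiment (2.1 and 0.85 per 100 person-years) can be turned into a rough protective effect. The sketch below assumes both figures are HIV incidence rates and uses a simple rate ratio; the function name protective_effect is illustrative only and does not come from the slides. The result is unadjusted, so it will not exactly match the 61% figure, which controls for behavioral factors.

```python
def protective_effect(control_rate: float, treatment_rate: float) -> float:
    """Return 1 minus the rate ratio (treatment rate / control rate).

    Both rates must be in the same units, e.g. cases per 100 person-years.
    """
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return 1.0 - treatment_rate / control_rate


# Rates as reported on the slide (assumed to be HIV incidence).
control = 2.1      # per 100 person-years
treatment = 0.85   # per 100 person-years

effect = protective_effect(control, treatment)
print(f"rate ratio: {treatment / control:.2f}")
print(f"unadjusted protection: {effect:.0%}")
```

Because assignment was random, this simple comparison of rates is already an unbiased estimate of the effect; no control variables are needed to remove selection bias.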
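
Worked example: a difference-in-differences estimate

The difference-in-differences logic from the newspaper endorsement example can be sketched in a few lines. The data below are made up for illustration and are not the values from Figure 1. The function name diff_in_diff and all variable names are hypothetical. The idea is to subtract the change in the comparison group from the change in the treated group, so that any fixed difference between the groups cancels out.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical panel: the same respondents observed in both years.
# 1 = voted Labour, 0 = did not. These values are invented for illustration.
readers_1992 = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]   # read a paper that later switched
readers_1997 = [1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
others_1992 = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]    # did not read such a paper
others_1997 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]

estimate = diff_in_diff(readers_1992, readers_1997, others_1992, others_1997)
print(f"treated change: {mean(readers_1997) - mean(readers_1992):.2f}")
print(f"control change: {mean(others_1997) - mean(others_1992):.2f}")
print(f"difference-in-differences: {estimate:.2f}")
```

The estimate is only as good as the assumption that both groups would have followed the same trend without the treatment; with cross-sectional data alone, this comparison is not possible.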