Ch 4
Scientific inquiry: A method or process of investigation in which the analyst develops a theory to produce empirically testable or falsifiable hypotheses and then tests these hypotheses with data
Causal inference: Inferring the effects of various inputs on policy outputs and outcomes

Counterfactual Model of Causal Inference
Counterfactual causal model of inference: The classic experiment, in which we are interested in testing for the causal effect of some treatment x on some outcome y
Factual outcome: what actually happened
  o Ex: car accident
Imagined conditions: thought experiment
  o Ex: if I did this, how would it be different?
Counterfactual analysis: what didn't happen
  o Ex: avoided car accident
Repetition effect: Any change in a subject's performance due to repeating the experimental condition a second time
  o Ex: An experiment that involves doing an action; each time it may be easier to do
Ideally we want to run the experiment again with the subject having no memory of the first
Factual outcome: How the subject performed
Counterfactual outcome: How the subject would have performed
Fundamental problem of causal inference: The difficulty that we face in experimental situations where we cannot expose the same individual to both treatment and control at the same moment in time
  o So how are we able to assess causality?
The classic experiment allows us to separate participants
Treatment group: The group we test
Control group: The placebo group
  o Then we compare the average scores of both groups
  o Still has issues

Minimum Conditions for Drawing Valid Inferences
There can be no differences between subjects in the treatment group and the control group that are correlated with the outcome, or else there is a selection effect
  o Ex: Selecting chocolate lovers for an experiment on ice cream flavors
Confounding variables: When we have a set of variables that are correlated with both our treatment and our potential outcome
Spurious relationship: When a relationship between 2 variables appears to exist due to the confounding presence of a third
variable
  o Ex: ice cream consumption and violence
  o When we have a set of variables that are correlated with both our treatment and potential outcome, our experiment cannot reveal a valid causal effect
Minimum conditions for drawing valid causal inferences
  o To estimate a valid causal inference with the classic experiment, we must ensure that all other variables that are correlated with both our treatment and our outcome are randomly distributed across the groups
  o We must assign individuals to treatment and control groups randomly
Observational study: Gathering data on something already occurring, not the design of the experimenters
  o Treatment and control groups are not under the analyst's control
  o Good for questions where an experiment would be unethical, impossible, etc.
  o More difficult to draw valid causal inferences
  o Important to rule out confounding variables because assignment is out of the analyst's control
  o Ex: Observing people affected by pollution

Internal and External Validity
2 basic threats to the validity of a research design: internal and external
Internal validity: The likelihood or level of confidence that a causal inference drawn from an analysis reflects the true underlying causal relationship
Threats may derive from biased samples, measurement error, and violations of the counterfactual causal inference model
External validity: Whether the findings from a study based upon a sample or an experiment can be extended or generalized to a larger population
Threats may derive from
  o Non-random sampling, which creates a non-representative sample
  o Experimental design: experiment factors induce a specific response from participants

Estimating Statistics of Interest
Statistics of interest: mean, standard deviation, Chi-squared, Pearson's correlation coefficient, regression parameters, etc.

Descriptive Statistics
Reveal the location of the middle and the shape of the distribution
Most common: Mean and Standard deviation
  o Use because they not only reveal where the middle of the distribution tends to be but also whether many of the
cases tend to cluster around this middle or whether they're located further from it
Standard deviation: The square root of the average squared distance between each case and the mean
  o For any normal distribution, about 68% of all cases will fall between plus or minus one standard deviation around the mean; about 95% will fall between plus and minus two
The smaller the standard deviation, the tighter the distribution is around the mean
More measurements are closer to the mean

Measures of Association
Correlation
  o Tells us whether and how strongly two variables tend to co-vary
  o Positively or negatively correlated
  o Ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation)
  o Correlation coefficient of 0 means two variables have no linear association
  o Does not prove causation
Regression
  o An analyst can determine the precise relationship between an input variable and an output variable of interest

Sources of Uncertainty
Measurement Error
  o Conceptual measurement error: Critics disagreeing with the measurement
    Try to minimize this error by: Careful attention to prior work and logical argumentation
  o Empirical measurement error: Difficult to measure
    Try to minimize this error by: Careful execution and instrumentation
Random Error
  o Describes the stochastic or random probability that some events are simply not perfectly predictable
Sampling Error
  o Refers to the inaccuracies that may occur when we attempt to use a sample to draw inferences about a larger population

Inferential Statistics
Using statistics to draw inferences about larger populations
Population: Entire set of individuals that an analyst is examining
Sample: A subset of the population
Statistical Inference: The process of using the characteristics of a randomly drawn, representative sample to learn something about the characteristics of a larger population
Characteristics of a population are referred to as parameters, while the corresponding characteristics from a sample are referred to as statistics

Estimating Uncertainty with Probability
Our
best guess is called an estimate
If an event has a probability of one, then we are perfectly confident that the event will occur
At the midpoint, p = .5, an event has a 50/50 chance of occurrence
95% threshold: If we repeated an experiment an infinite number of times, the confidence intervals built around the point estimates from those experiments would contain the true causal effect 95% of the time

Hypothesis Testing
Standard Hypotheses for Descriptive Statistics
  o Three most common
We want to know whether or not the mean of a
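Worked sketch (not from the original notes): the ideas of random assignment, comparing treatment and control averages, and the 95% threshold can be tied together in a small simulation. Everything specific here is an assumption made for illustration: the names run_experiment and coverage, the group size of 200, the baseline scores drawn from a normal distribution with mean 50 and standard deviation 10, the true effect of 2.0, and the normal-approximation multiplier 1.96.

```python
import random
import statistics


def run_experiment(rng, n=200, true_effect=2.0):
    """Simulate one classic experiment with random assignment.

    Returns the difference-in-means estimate and an approximate 95% interval.
    All numbers (n, baseline distribution, true_effect) are illustrative.
    """
    subjects = list(range(n))
    rng.shuffle(subjects)                 # random assignment to groups
    treated = set(subjects[: n // 2])     # first half becomes the treatment group

    treatment_scores, control_scores = [], []
    for s in range(n):
        baseline = rng.gauss(50.0, 10.0)  # outcome the subject would have without treatment
        if s in treated:
            treatment_scores.append(baseline + true_effect)
        else:
            control_scores.append(baseline)

    # Compare the average scores of both groups.
    estimate = statistics.mean(treatment_scores) - statistics.mean(control_scores)

    # Standard error of a difference between two independent sample means.
    se = (statistics.variance(treatment_scores) / len(treatment_scores)
          + statistics.variance(control_scores) / len(control_scores)) ** 0.5
    margin = 1.96 * se                    # normal approximation for a 95% interval
    return estimate, (estimate - margin, estimate + margin)


def coverage(repetitions=2000, seed=0, true_effect=2.0):
    """Share of repeated experiments whose interval contains the true effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        _, (low, high) = run_experiment(rng, true_effect=true_effect)
        if low <= true_effect <= high:
            hits += 1
    return hits / repetitions


if __name__ == "__main__":
    print(f"Share of intervals containing the true effect: {coverage():.3f}")
```

Because assignment is random, nothing about a subject's baseline score is systematically related to which group they land in, so the difference in group averages should be a fair estimate of the effect. Over many repetitions, the share of intervals that contain the true effect should come out close to 95%, though any finite run will only approximate it.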