Predicting from Correlations

Review 1
  Correlations: relations between variables
    May or may not be causal
    Enable prediction of the value of one variable from the value of another
  To test correlational and causal claims, we need to make predictions that are testable
    Operationally define terms
    Construct validity: does the operational characterization capture what is intended?

Review 2
  Use scatterplots to diagram correlations
    Negative correlation
    Positive correlation
  The Pearson coefficient (r) measures the strength of correlation
    -1.0: perfect negative
    0: no correlation
    1.0: perfect positive

Correlation Coefficients
  Height and weight are positively correlated
  In this graph, Pearson r = .67
  [Figure: scatterplot of WEIGHT against HEIGHT, points marked by SEX (male, female)]
  Contains two subgroups: men and women
    May exhibit different correlations
    For females (red) only, r = .47
    For males (blue) only, r = .68

How much does the correlation account for?
  Correlations are typically not perfect (r = 1 or r = -1)
  Evaluate the correlation in terms of how much of the variance in one variable is accounted for by the variance in another
  Amount of variance accounted for on the variable whose value is being predicted equals
    variance explained / total variance
  This turns out to be the square of the Pearson coefficient: r²
  So if r = .80, then we can say that 64% of the variance is explained
  If r = .30, then we can say that 9% of the variance is explained

Variance Accounted for
  [Figure: two scatterplots, one with r² = .56 and one with r² = .30]

Variance accounted for 2
  Height only partially accounts for weight
    For females, r = .47, so r² = .22
    For males, r = .68, so r² = .46
  [Figure: scatterplot of WEIGHT against HEIGHT, points marked by SEX (male, female)]

Prediction
  A major reason to be interested in correlation
  If two variables are correlated, we can use the value of an item on one variable to predict the value on another
    Prediction of future job performance based on years of experience
    Actuarial prediction: how long one will live based on how often one skydives
    Risk assessment: prediction of how
    much risk an activity poses in terms of its values on other variables
  Prediction employs the regression line

Regression line
  [Figure: scatterplot with axes labeled predictor variable and criterion variable]
  Start with a scatterplot of data points
  Find the line which allows for the best prediction of the criterion variable (the one to be predicted) from the predictor variable: the line which minimizes the squares of the distances marked by the blue lines in the figure

Regression line
  y = a + bx
    y = predicted (criterion) variable
    x = predictor variable
    a = y-intercept (regression constant)
    b = slope (regression coefficient)
  Note: the regression coefficient is not the same as the Pearson coefficient r

Understanding the Regression Line
  Assume the regression line equation between the variables mpg (y) and weight (x) of several car models is
    mpg = 62.85 - 0.011 × weight
  MPG is expected to decrease by 1.1 mpg for every additional 100 lb in car weight

Interpolating from the regression line
  Correlation between
    Identical Blocks Test: a measure of spatial ability
    Wonderlic Test: a measure of general intelligence
  Calculate new value for x = 10:
    y = .48(10) + 15.86 ≈ 20.67

Interpolating from the regression line visually
  Draw a line from the x-axis to the regression line
  Draw a line from the intersection with the regression line to the y-axis
  Sleep study

Correlations in samples and populations
  The interest in correlations typically goes beyond the sample studied: investigators want to know about the broader population
  Two approaches
    Estimating the correlation in the population from the correlation in the sample (r): confidence interval
    Determining whether there is a correlation in a given direction in the real population from the correlation in the sample: statistical significance

Statistical significance and p-values
  Fundamental question: is the result due to chance or to a real correlation in the population?
  How likely is a given correlation in the sample if there were no correlation (or a correlation in the other direction) in the population?
  This is specified by the p-value
  A p-value of .05
  means that, if there were no correlation in the real population, there would be 1 chance in 20 of finding a correlation like this in the sample
  That is, 19 times out of 20, chance alone would not produce a correlation like the one in the sample

Statistical significance and p-values
  p-values are typically reported as less than some value
    .05 is the most commonly used significance level
    .01 is a higher, more demanding significance level: 1 chance in 100 of getting the result by chance
  For some purposes, lower (less demanding) significance levels are useful to know
    A prediction with reliability of only .10 or .25 could be important to know
    Chemical exposure and cancer, etc.

Significance vs. importance
  A statistically significant finding may or may not be important
  All statistical significance means is that the finding is statistically reliable: not likely to have occurred by chance
  Whether it is important (worth knowing) depends on the finding

Correlations are hard to detect
  Humans are terrible at recognizing intuitively whether two variables are correlated
    We see correlations where none exist
    We fail to see correlations that do exist
  Must actually look at the evidence, not rely on our impressions
    Perform statistical analyses

Seeing correlations that don't exist
  "When I'm waiting for the bus, the one going in the other direction always comes first"
  Are men or women more likely to have a sister?
  Evelyn Marie Adams won the New Jersey lottery twice, a 1 in 17 trillion likelihood. Seem unlikely?
    Given the millions of people who buy state lottery tickets, it was practically a sure thing that someone, someday, somewhere would win twice

Coincidences happen
  Lorraine and Levinia Christmas are twins. They set out to deliver Christmas presents to each other near Flitcham, England. Their cars collide.
  Philip Dodgson, a clinical psychologist at South Downs health center in Sussex, England, does psychotherapy with clergy and members of religious orders. He surfs the web to see if there is anyone else named Philip Dodgson. He finds one in
  Ontario and writes to him. The second Philip Dodgson is also a clinical psychologist, working at Southdown Center, a residential psychotherapy center for clergy and members of religious orders.

Coincidences happen
  Adams, Jefferson, and Monroe, three of the first five presidents of the US, died on the same date: July 4
  Charles Schulz died of a heart attack on the day his last Peanuts cartoon was published

Limits to Regression analysis
  Regression to the mean
  Last month you took the SAT/GRE and scored 750 out of
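
Recap sketch: the quantities used throughout these slides (the Pearson coefficient r, the variance accounted for r², and the regression line y = a + bx) can be computed directly from paired data. The following is a minimal Python sketch, not the method used to produce the slides' figures. The function names and the height/weight numbers are hypothetical, invented for illustration; only the mpg equation comes from the slides, and the 3000 lb input to it is made up.

```python
from math import sqrt


def _sums(xs, ys):
    """Return (mean_x, mean_y, sxx, syy, sxy) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson coefficient r, between -1 and 1."""
    _, _, sxx, syy, sxy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Least-squares constants (a, b) for the line y = a + b*x."""
    mean_x, mean_y, sxx, _, sxy = _sums(xs, ys)
    b = sxy / sxx            # slope (regression coefficient)
    a = mean_y - b * mean_x  # y-intercept (regression constant)
    return a, b


def predict(a, b, x):
    """Predicted criterion value for predictor value x."""
    return a + b * x


# Hypothetical data, invented for illustration (not the data behind the slides).
heights = [60, 62, 64, 66, 68, 70, 72]          # inches
weights = [110, 125, 120, 150, 145, 170, 180]   # pounds

r = pearson_r(heights, weights)
a, b = regression_line(heights, weights)
print(f"r = {r:.2f}, variance accounted for r^2 = {r * r:.2f}")
print(f"regression line: y = {a:.2f} + {b:.2f}x")
print(f"predicted weight at height 65: {predict(a, b, 65):.1f}")

# The mpg line from the slides; the 3000 lb car weight is a made-up input.
print(f"predicted mpg at 3000 lb: {predict(62.85, -0.011, 3000):.2f}")
```

With the slides' mpg line, a 3000 lb car comes out at about 29.85 mpg, and each additional 100 lb lowers the prediction by 1.1 mpg. Note that b (the slope) and r are computed differently, which is why the regression coefficient is not the same as the Pearson coefficient.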