BIVARIATE CORRELATION AND REGRESSION

Multiple Correlation/Regression
- Reveals associations between a DV and multiple continuous and/or categorical IVs
- Does not indicate whether the associations are causal
- Inferences of causality are limited by the method by which the data were gathered
  - In particular, they are limited to the extent to which the method satisfies the criteria of an experimental design:
    - Manipulate the IV(s)
    - Randomly assign participants to levels of the IV(s)
    - Methodologically control extraneous variables

Organization of Lecture
- Quick review of standard deviation and z scores
- Bivariate correlation
- Regression

Standard Deviation
- Average amount of variability in a variable
- Two formulas, one for the population and one for a sample:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{SS_X}{N}} \qquad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{SS_X}{n - 1}}$$
- The n − 1 denominator of the sample formula adjusts for the tendency of samples to underestimate the variability in a population.
- Cohen & Cohen use different notation: versions of sd in place of σ and s, and lowercase letters for deviation scores, e.g., $x = X - \bar{X}$.

Dividing $\sigma_Y$ by $\sigma_X$
- Many formulas involve $\sigma_Y / \sigma_X$, which equals $\sqrt{SS_Y / SS_X}$ when n is the same for X and Y. The same holds for $s_Y / s_X$:
$$\frac{\sigma_Y}{\sigma_X} = \frac{\sqrt{SS_Y / N_Y}}{\sqrt{SS_X / N_X}} = \sqrt{\frac{SS_Y}{N_Y} \cdot \frac{N_X}{SS_X}} = \sqrt{\frac{SS_Y}{SS_X}} \qquad \frac{s_Y}{s_X} = \frac{\sqrt{SS_Y / (n_Y - 1)}}{\sqrt{SS_X / (n_X - 1)}} = \sqrt{\frac{SS_Y}{n_Y - 1} \cdot \frac{n_X - 1}{SS_X}} = \sqrt{\frac{SS_Y}{SS_X}}$$

Z Scores
$$z = \frac{X - \mu}{\sigma}$$
- Provide a common metric for comparing variables measured on different scales: a z score is measured in standard deviations from the mean.
- If an entire distribution is transformed into z scores, the z distribution will have a mean of 0, a standard deviation of 1, and the same shape as the original distribution.

A DATA SET (p. 99 of Cohen & Cohen, 1983)

Subject  Salary  PhD  Pubs  Sex  Citations
      1   18000    1     2    0          1
      2   19961    2     4    0          0
      3   19828    5     5    1          1
      4   17030    7    12    1          0
      5   19925   10     5    0          0
      6   19041    4     9    0          1
      7   27132    3     3    1          0
      8   27268    8     1    0          1
      9   32483    4     8    0          0
     10   27029   16    12    1          4
     11   25362   15     9    0          0
     12   28463   19     4    0          3
     13   32931    8     8    0          5
     14   28270   14    11    0          0
     15   38362   28    21    0          3

PhD = years since PhD; Pubs = number of publications; Sex: 0 = male, 1 = female; Citations = number of times pubs were cited.

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (p. 99). Hillsdale, NJ.

GRAPHIC REPRESENTATION OF BIVARIATE RELATIONS
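Before turning to plots, the standard deviation, $\sigma_Y / \sigma_X$, and z-score formulas reviewed above can be checked against the salary data. The following is a minimal Python sketch, not part of the lecture (which uses SAS). The language and the helper names (`sum_of_squares`, `population_sd`, `sample_sd`, `z_scores`) are our own illustrative choices. It uses only the Salary and Pubs columns of the data set.

```python
import math
import statistics

# Salary and Pubs columns from the data set (Cohen & Cohen, 1983, p. 99).
salary = [18000, 19961, 19828, 17030, 19925, 19041, 27132, 27268,
          32483, 27029, 25362, 28463, 32931, 28270, 38362]
pubs = [2, 4, 5, 12, 5, 9, 3, 1, 8, 12, 9, 4, 8, 11, 21]


# Illustrative helpers (hypothetical names, not from the lecture or SAS).
def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)


def population_sd(values):
    """sigma = sqrt(SS / N)."""
    return math.sqrt(sum_of_squares(values) / len(values))


def sample_sd(values):
    """s = sqrt(SS / (n - 1))."""
    return math.sqrt(sum_of_squares(values) / (len(values) - 1))


def z_scores(values):
    """z = (X - mean) / sigma, using the population-formula SD."""
    mean = statistics.fmean(values)
    sd = population_sd(values)
    return [(v - mean) / sd for v in values]


# Salary and pubs have the same n, so the N (or n - 1) terms cancel:
# sigma_Y / sigma_X == s_Y / s_X == sqrt(SS_Y / SS_X).
ratio_population = population_sd(salary) / population_sd(pubs)
ratio_sample = sample_sd(salary) / sample_sd(pubs)
ratio_ss = math.sqrt(sum_of_squares(salary) / sum_of_squares(pubs))

z_salary = z_scores(salary)

print(f"sigma = {population_sd(salary):.1f}, s = {sample_sd(salary):.1f}")
print(f"sigma_Y/sigma_X = {ratio_population:.2f}, s_Y/s_X = {ratio_sample:.2f}, "
      f"sqrt(SS_Y/SS_X) = {ratio_ss:.2f}")
print(f"z scores: mean = {statistics.fmean(z_salary):.3f}, "
      f"SD = {population_sd(z_salary):.3f}")
```

The z transformation only shifts and rescales the values and never reorders them. That is why the shape of the distribution is unchanged.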
Can visualize bivariate relations with a scatter plot
- Plot the paired values of the X and Y variables
- E.g., plot values of salary and publications:

[Figure: Plot of salary*pubs. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: salary; horizontal axis: pubs. Positive linear association.]

Obtaining a Scatter Plot in SAS
proc plot;
  plot salary*pubs;
run;

[Figure: Plot of salary*rpubs. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: salary; horizontal axis: rpubs (reverse-scored publications). Negative linear association.]

[Figure: Plot of y*x. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: y; horizontal axis: x. Nonlinear association.]

QUANTIFYING LINEAR ASSOCIATIONS: PEARSON CORRELATION COEFFICIENT

- Visualization can be deceiving and provides no metric for expressing the strength of a relationship
- The Pearson correlation coefficient (r) ranges from −1 to +1
  - The valence (sign) of r indicates the direction of the relationship: − indicates a negative association, + indicates a positive association
  - The magnitude of r indicates the strength of the relationship: 0 = no linear association, ±1 = perfect linear association

Salary and Publications
- r = .46, so there is a positive and moderately strong association between salary and publications in our sample

Equations for r
- There are multiple equivalent equations.
- In terms of z scores:
$$r = \frac{\sum Z_X Z_Y}{n} \text{ if Z is calculated using the population standard deviation } (\sigma), \qquad r = \frac{\sum Z_X Z_Y}{n - 1} \text{ if Z is calculated using the sample standard deviation } (s)$$
- In terms of raw scores:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n \, \sigma_X \sigma_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1) \, s_X s_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

Logic of the Formula
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y}) / n}{\sigma_X \sigma_Y}$$
- The numerator is the covariance. It assesses the extent to which X and Y deviate in the same direction from their respective means.
- The denominator adjusts for the units of measurement in which X and Y are assessed. The covariance grows when X and Y are measured in big numbers (e.g., recording salary in dollars rather than thousands of dollars multiplies the covariance by 1,000), but r is unchanged.

Significance Testing of r
- We are interested in whether there is a correlation in the population; the population correlation is ρ (rho)
- $H_0: \rho = 0$; $H_1: \rho \neq 0$
- There are many formulas. One is:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
- Compare the obtained t with the critical t (α = .05, df = n − 2)

Significance Testing of r: Salary & Pubs
- Critical t (α = .05, df = 13) = 2.160
$$t = \frac{.46\sqrt{15 - 2}}{\sqrt{1 - .46^2}} = 1.868$$
- Fail to reject $H_0$
- However, the significance test of r is highly dependent on sample size. If n had been 19 (df = 17, critical t = 2.110), the same r = .46 would give t ≈ 2.14, and we would have declared r significant.
- With a large enough n, even a very small r is significant.

Computing and Testing a Bivariate Correlation in SAS
proc corr;
  var salary pubs;
run;
Examine the output.

Bivariate Regression
- Associations can also be examined with regression
- In regression, one of the variables is assumed to be caused by the other variable(s)
- Regression provides a linear model that relates the dependent (criterion) variable to the independent (predictor) variable:
$$\hat{Y} = A + B_1 X$$
  - $\hat{Y}$ = predicted value of Y
  - A = Y-intercept, the predicted value of Y when X = 0
  - $B_1$ = regression parameter (slope), the amount of change in $\hat{Y}$ per unit change in X

Salary Example: Regress Salary on Publications
$$\widehat{\text{Salary}} = 21106 + 566(\text{pubs})$$
- Predicted salary increases by 566 per publication
- With no publications (pubs = 0), we expect a salary of 21,106
- We can use the equation to predict. E.g., what is the predicted salary for a person with 2 pubs?
$$\widehat{\text{Salary}} = 21106 + 566(2) = 22{,}238$$

Plotting the Regression Line
- We can plot the regression line by plotting the predicted values for any 2 values of pubs and connecting the dots (note: the individual points in the plot are actual values of salary).

Ordinary Least Squares (OLS) Regression
- This semester we will focus on OLS
- OLS provides formulas for estimating B and the Y-intercept that minimize the sum of squared errors of prediction, $\sum (Y - \hat{Y})^2$
- Other forms of regression provide alternative formulas for calculating B and the Y-intercept, e.g., Weighted Least Squares (WLS) and Maximum Likelihood (ML)

Least Squares Formulas
- There are many equivalent OLS formulas for B:
$$B_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = r\frac{\sigma_Y}{\sigma_X} = r\frac{s_Y}{s_X} = r\sqrt{\frac{\sum (Y - \bar{Y})^2}{\sum (X - \bar{X})^2}}$$
- Y-intercept:
$$A = \bar{Y} - B_1 \bar{X}$$

Regression Parameter and Y-intercept in SAS
- Can use proc glm or proc reg; proc reg has additional features designed for regression
proc reg;
  model salary = pubs;
run;
Examine the output; focus on …
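The salary example can be tied together in one script that recomputes r, the t test, and the OLS coefficients. The following is a minimal Python sketch, not part of the lecture (which uses SAS). Python and the helper names `t_for_r` and `predicted_salary` are our own illustrative choices. It computes r two ways, tests $H_0: \rho = 0$, and estimates the OLS slope and intercept. The critical t values are typed in from a standard two-tailed t table (α = .05) rather than computed, which keeps the sketch free of third-party libraries.

```python
import math
import statistics

# Salary and Pubs columns from the Cohen & Cohen (1983, p. 99) data set.
salary = [18000, 19961, 19828, 17030, 19925, 19041, 27132, 27268,
          32483, 27029, 25362, 28463, 32931, 28270, 38362]
pubs = [2, 4, 5, 12, 5, 9, 3, 1, 8, 12, 9, 4, 8, 11, 21]

n = len(pubs)
mean_x = statistics.fmean(pubs)
mean_y = statistics.fmean(salary)

# Deviation-score sums: SS_X, SS_Y, and the sum of cross-products.
ss_x = sum((x - mean_x) ** 2 for x in pubs)
ss_y = sum((y - mean_y) ** 2 for y in salary)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pubs, salary))

# Pearson r from raw scores: cross-products / sqrt(SS_X * SS_Y).
r = sp_xy / math.sqrt(ss_x * ss_y)

# The same r from z scores based on the sample SD: sum(Z_X * Z_Y) / (n - 1).
s_x = statistics.stdev(pubs)
s_y = statistics.stdev(salary)
r_from_z = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(pubs, salary)) / (n - 1)


# Illustrative helper (hypothetical name), not a SAS or library function.
def t_for_r(r_value, n_obs):
    """t statistic for H0: rho = 0, with df = n_obs - 2."""
    return r_value * math.sqrt(n_obs - 2) / math.sqrt(1 - r_value ** 2)


# Two-tailed critical t values (alpha = .05) copied from a t table, keyed by df.
CRITICAL_T = {13: 2.160, 17: 2.110}

t_obs = t_for_r(r, n)       # df = 13 for these 15 subjects
t_if_n19 = t_for_r(r, 19)   # same r with a hypothetical n = 19 (df = 17)

# OLS slope via two equivalent formulas, then the intercept A = mean_y - B * mean_x.
b = sp_xy / ss_x
b_from_r = r * s_y / s_x
a = mean_y - b * mean_x


# Illustrative helper (hypothetical name): Y-hat = A + B * pubs.
def predicted_salary(num_pubs):
    return a + b * num_pubs


print(f"r = {r:.2f}, t = {t_obs:.2f}, significant: {abs(t_obs) > CRITICAL_T[13]}")
print(f"n = 19: t = {t_if_n19:.2f}, significant: {abs(t_if_n19) > CRITICAL_T[17]}")
print(f"Salary-hat = {a:.0f} + {b:.0f} * pubs; 2 pubs -> {predicted_salary(2):.0f}")
```

Up to rounding, the script should reproduce the lecture's values: r ≈ .46, B ≈ 566, A ≈ 21106, and a predicted salary of about 22,238 for 2 pubs. The sketch reports t ≈ 1.87 rather than the slide's 1.868 because it uses the unrounded r instead of .46. Either value falls short of 2.160.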