CORRELATION AND REGRESSION WITH MULTIPLE VARIABLES

Bivariate Associations
- Bivariate correlation and regression examine the association between two variables, X and Y.
- Bivariate associations may provide a deceptive view of the actual association between X and Y, because X's and Y's associations with other variables are not removed from the bivariate association between X and Y.

The Deceptive Lens of Bivariate Associations
- Bivariate associations between Salary, PhD, Publications (Pubs), Sex, and Citations:

  Variable     Salary    PhD    Pubs    Sex
  PhD            .62
  Pubs           .46     .68
  Sex           -.26    -.15   -.05
  Citations      .51     .46    .30   -.01

  Note: Sex is coded such that 0 = male and 1 = female.
- The bivariate r's suggest that salary increases with PhD, Pubs, and Citations, and is lower for females than for males.
- However, PhD, Pubs, and Citations are correlated with one another. Some of the association between Salary and Pubs may really be due to Citations and PhD, and vice versa.

Goal of MRC: Focusing the Lens
- To accurately assess the X-Y association, we need to remove the influence of ALL variables associated with X and Y.
- When X, Y, and Z are analyzed in a multiple regression/correlation (MRC) analysis, the effect of Z is removed from the association of X and Y.
- MRC can also be deceptive if there are other variables, in addition to Z, that are associated with X and Y but are not included in the analysis.
- Note that the problem of shared association with unmeasured variables is a problem of the research method: IVs from a pure experiment are uncorrelated with other factors.

Today's Lecture
- Multiple regression: characteristics of the overall model; characteristics of the regression parameters.
- Multiple correlation: semi-partial correlations; partial correlations.
- Analytic strategies of MRC: simultaneous, hierarchical, step-wise.

Multiple Regression
- Provides a linear model linking Y to multiple predictors, and unconfounds the effect of each predictor on Y from the effect of the other predictors:
  $\hat{Y} = A_{Y.12} + B_{Y1.2} X_1 + B_{Y2.1} X_2$
- The Bs are partial betas: $B_{Y1.2}$ is the effect of X1 on Y controlling for X2, and $B_{Y2.1}$ is the effect of X2 on Y controlling for X1.
- The Y intercept $A_{Y.12}$ is the predicted value of Y when X1 and X2 are zero (the point where the regression line crosses the Y axis).
- Simplified notation of the multiple regression equation: $\hat{Y} = A + B_1 X_1 + B_2 X_2$

Academic Salary Example
- Bivariate regressions: Salary = 21106 + 566(Pubs); Salary = 22976 + 1918(Citations).
- Pubs and Citations: r = .30.
- Multiple regression: Salary = 20285 + 418(Pubs) + 1536(Citations)
- Note: a partial B does not have to be smaller than the bivariate B; whether it shrinks depends on the pattern of $r_{CP}$, $r_{YC}$, and $r_{YP}$.

Least Squares Formulas
- $B_{Y1.2} = \frac{r_{Y1} - r_{Y2}\, r_{12}}{1 - r_{12}^2} \cdot \frac{s_Y}{s_{X_1}}$
- $B_{Y2.1} = \frac{r_{Y2} - r_{Y1}\, r_{12}}{1 - r_{12}^2} \cdot \frac{s_Y}{s_{X_2}}$
- $A_{Y.12} = \bar{Y} - B_{Y1.2}\bar{X}_1 - B_{Y2.1}\bar{X}_2$
- These formulas are for a 2-predictor model; they become more complicated as more predictors are added.

Multiple Regression in SAS

  proc reg;
    model salary = pubs citations;
  run;

- See the output for the partialled Bs: Salary = 20285 + 418(Pubs) + 1536(Citations)
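The call above is the minimal form shown in the notes. As a sketch only, a slightly fuller version is given below: the dataset name faculty is an assumed placeholder, and the stb and clb model options (which ask PROC REG for standardized estimates and for confidence limits on the Bs) are additions that the notes themselves do not use.

  proc reg data=faculty;                      * "faculty" is an assumed dataset name;
    model salary = pubs citations / stb clb;  * stb = standardized estimates, clb = confidence limits for the Bs;
  run;
  quit;                                       * proc reg is interactive, so quit ends the procedure;

The resulting Parameter Estimates table lists each partial B with its standard error, t, and p value, which is where the quantities discussed later in these notes (SE_B and the significance tests of the partial Betas) appear.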
Characteristics of the Overall Model
- Sum of squares
- Coefficient of multiple determination (R²)
- F test of the overall model
- Adjusted R²
- Standard error of the estimate (SE_estimate)

Sum of Squares
- The top of the output has the ANOVA table for the model:

  Source    DF    Sum of Squares    Mean Square       F         p
  Model      2        202061502       101030751    3.41    0.0672
  Error     12        355458823        29621569
  Total     14        557520325

- SS_total = SS_model + SS_error
- SS_total = total variability in Salary
- SS_model = amount of variability in Salary related to Pubs and Citations
- SS_error = amount of variability in Salary NOT related to Pubs and Citations (error) = $\sum (Y - \hat{Y})^2$

Coefficient of Multiple Determination (R²)
- Proportion of variability in Y explained by the model: R² = SS_model / SS_total
- Salary example (salary = pubs citations): R² = 202061502 / 557520325 = .3624 ("R-Square" on the SAS output).
- Pubs and Citations together account for 36% of the variability in Salary.
- The multiple R² is larger than either bivariate r² ($r_{SP}^2 = .21$, $r_{SC}^2 = .26$) unless one predictor is unrelated to the DV.
- The bivariate r²s do not sum to R² (.21 + .26 does not equal .36) because shared variability among the predictors is partialled from each r².

F Test of the Model
- F(2, 12) = 3.41, p = .0672 in the ANOVA table is a test of the model.
- It is a comparison of 2 models:
  Salary = intercept + Pubs + Citations (full model)
  Salary = intercept (restricted model)
- Does the model with Pubs and Citations fit better (i.e., have less error) than a model without Pubs and Citations?
- $F = \frac{(E_R - E_F)/(df_R - df_F)}{E_F / df_F}$
- $E_R$ and $E_F$ are the SS_error of the restricted and full models; $df_R$ and $df_F$ are the error degrees of freedom of the restricted and full models.

Run Both Models in SAS

  proc reg;
    model salary = pubs citations;  * full model;
    model salary = ;                * restricted model;
  run;

- $F = \frac{(557520325 - 355458823)/(14 - 12)}{355458823/12} = 3.41$
- F(2, 12) = 3.41, p = .0672 indicates that the full model does not fit significantly better than the restricted model (but n = 15).

F Test (Model Comparison) in Terms of R²
- $F = \frac{(R^2_{Full} - R^2_{Restricted})/(df_R - df_F)}{(1 - R^2_{Full})/df_F} = \frac{(.3624 - 0)/(14 - 12)}{(1 - .3624)/12} = 3.41$
- By changing the restricted model relative to the full model, we can test other hypotheses, e.g., does a model with Publications and Citations fit better than a model with only Publications?

Adjusted R²
- Adjusted R² (shrunken R², $\hat{\omega}^2$) adjusts for the positive bias of R².
- The sample R² is an estimate of the degree of shared variability in the population, but it tends to overestimate it due to sampling error: even when there are no associations in the population, sampling error may produce associations in the sample.
- Salary example: the adjusted value is "Adj R-Sq" in the output; R² = .36 and adjusted R² = .26.

Standard Error of the Estimate
- SE_estimate = the average amount of error when predicting Y from the multiple predictor variables:
  $SE_{estimate} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k - 1}}$, where k = number of predictor variables.
- Listed as "Root MSE" in SAS (i.e., $\sqrt{MSE}$).
- Salary example: SE_estimate = 5443. When predicting Salary from Publications and Citations, we will be in error, on average, by 5443.

Characteristics of the Individual Regression Parameters
- Salary = 20285 + 418(Pubs) + 1536(Citations)
- Keep in mind that the Bs are sample estimates.
- Standard error of the partial Betas (SE_B)
- Significance test of the partial Betas
- Standardized regression parameters

Standard Error of the Partial Betas (SE_B)
- SE_B estimates the variability in B across samples (i.e., in the sampling distribution of B) and thus estimates how stable the obtained estimate is.
- $SE_{B_i} = \frac{s_Y}{s_i}\sqrt{\frac{1 - R_Y^2}{(n - k - 1)(1 - R_i^2)}}$
- k is the number of predictor variables; $s_i$ is the standard deviation of the predictor variable (i.e., $X_i$) corresponding to the given beta (i.e., $B_i$); $1 - R_Y^2$ is the proportion of Y not explained by the model; and $R_i$ is the multiple correlation of the remaining predictor variables when they are used to predict the X variable in question.
- SE_B increases to the extent that the model does not predict Y …
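As an arithmetic check on the overall-model statistics reported above, the short sketch below plugs the ANOVA-table sums of squares into the formulas from these notes. The shrinkage formula used for adjusted R², 1 - (1 - R²)(n - 1)/(n - k - 1), is the standard one; the notes describe adjusted R² but do not show this formula explicitly.

  data check;
    n        = 15;                                     /* sample size (error df = n - k - 1 = 12) */
    k        = 2;                                      /* predictors: pubs and citations          */
    ss_model = 202061502;                              /* from the ANOVA table above              */
    ss_error = 355458823;
    ss_total = ss_model + ss_error;                    /* 557520325                               */
    r2       = ss_model / ss_total;                    /* about .3624                             */
    adj_r2   = 1 - (1 - r2)*(n - 1)/(n - k - 1);       /* about .26 ("Adj R-Sq")                  */
    se_est   = sqrt(ss_error / (n - k - 1));           /* root MSE, about 5443                    */
    f        = (ss_model/k) / (ss_error/(n - k - 1));  /* about 3.41                              */
  run;

  proc print data=check;
  run;

Printing the check data set should reproduce the values quoted above: R² of about .36, adjusted R² of about .26, SE_estimate of about 5443, and F of about 3.41.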