Lecture 11: Multicollinearity
BMTRY 701 Biostatistical Methods II

Multicollinearity Introduction
Some common questions we ask in MLR:
• What is the relative importance of the effects of the different covariates?
• What is the magnitude of the effect of a given covariate on the response?
• Can any covariate be dropped from the model because it has little or no effect on the outcome?
• Should any covariates not yet included in the model be considered for possible inclusion?

Easy answers?
If the candidate covariates are uncorrelated with one another: yes, these are simple questions.
If the candidate covariates are correlated with one another: no, these are not easy.
Most commonly:
• observational studies have correlated covariates
• we need to adjust for these when assessing relationships
• “adjusting” for confounders
Experimental designs?
• less problematic
• patients are randomized in common designs
• no confounding exists because factors are ‘balanced’ across arms

Multicollinearity
Also called “intercorrelation.”
Refers to the situation in which the covariates are related to each other as well as to the outcome of interest.
It is much like confounding; the statistical term emphasizes the effects that correlation among covariates has on regression modeling.

No Multicollinearity Example: Mouse experiment

Mouse  Dose A  Dose B  Diet  Tumor size
  1      100     25     0        45
  2      200     25     0        56
  3      300     25     0        25
  4      100     50     0        15
  5      200     50     0        17
  6      300     50     0        10
  7      100     25     1        30
  8      200     25     1        28
  9      300     25     1        20
 10      100     50     1        10
 11      200     50     1         5
 12      300     50     1         3

Linear modeling
We are interested in seeing which factors influence tumor size in mice.
Notice that the experiment is perfectly balanced: each combination of Dose A, Dose B, and Diet occurs exactly once, so the three predictors are mutually uncorrelated.
What does that mean for the regressions below?
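For readers following along in R, here is a minimal sketch that recreates the dataset from the table above. The column names are chosen to match the lm() output reproduced on the next slides.

# Recreate the mouse data from the table above; column names match
# the lm() output shown on the following slides.
data <- data.frame(
  Mouse      = 1:12,
  Dose.A     = rep(c(100, 200, 300), times = 4),
  Dose.B     = rep(rep(c(25, 50), each = 3), times = 2),
  Diet       = rep(c(0, 1), each = 6),
  Tumor.size = c(45, 56, 25, 15, 17, 10, 30, 28, 20, 10, 5, 3)
)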
Dose of Drug A on Tumor
> reg.a <- lm(Tumor.size ~ Dose.A, data=data)
> summary(reg.a)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  32.50000   12.29041   2.644   0.0246 *
Dose.A       -0.05250    0.05689  -0.923   0.3779
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.09 on 10 degrees of freedom
Multiple R-Squared: 0.07847, Adjusted R-squared: -0.01368
F-statistic: 0.8515 on 1 and 10 DF, p-value: 0.3779

Dose of Drug B on Tumor
> reg.b <- lm(Tumor.size ~ Dose.B, data=data)
> summary(reg.b)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  58.0000     9.4956   6.108 0.000114 ***
Dose.B       -0.9600     0.2402  -3.996 0.002533 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.4 on 10 degrees of freedom
Multiple R-Squared: 0.6149, Adjusted R-squared: 0.5764
F-statistic: 15.97 on 1 and 10 DF, p-value: 0.002533

Diet on Tumor
> reg.diet <- lm(Tumor.size ~ Diet, data=data)
> summary(reg.diet)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   28.000      6.296   4.448  0.00124 **
Diet         -12.000      8.903  -1.348  0.20745
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.42 on 10 degrees of freedom
Multiple R-Squared: 0.1537, Adjusted R-squared: 0.06911
F-statistic: 1.817 on 1 and 10 DF, p-value: 0.2075

All in the model together
> reg.all <- lm(Tumor.size ~ Dose.A + Dose.B + Diet, data=data)
> summary(reg.all)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  74.50000    8.72108   8.543 2.71e-05 ***
Dose.A       -0.05250    0.02591  -2.027 0.077264 .
Dose.B       -0.96000    0.16921  -5.673 0.000469 ***
Diet        -12.00000    4.23035  -2.837 0.021925 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.327 on 8 degrees of freedom
Multiple R-Squared: 0.8472, Adjusted R-squared: 0.7898
F-statistic: 14.78 on 3 and 8 DF, p-value: 0.001258

Correlation matrix of predictors and outcome
> cor(data[,-1])
               Dose.A     Dose.B       Diet Tumor.size
Dose.A      1.0000000  0.0000000  0.0000000 -0.2801245
Dose.B      0.0000000  1.0000000  0.0000000 -0.7841853
Diet        0.0000000  0.0000000  1.0000000 -0.3920927
Tumor.size -0.2801245 -0.7841853 -0.3920927  1.0000000

Result
For perfectly balanced designs, adjusting does not affect the coefficients.
However, it can affect the significance. Why?
• the residual sum of squares is affected
• if you explain more of the variance in the outcome, less is left to chance/error
• when you additionally adjust for another factor that is related to the outcome, you will likely improve the significance

The other extreme: perfect collinearity

Mouse  Dose A  Dose C  Diet  Tumor size
  1      100     100    0        45
  2      200     300    0        56
  3      300     500    0        25
  4      100     100    0        15
  5      200     300    0        17
  6      300     500    0        10
  7      100     100    1        30
  8      200     300    1        28
  9      300     500    1        20
 10      100     100    1        10
 11      200     300    1         5
 12      300     500    1         3

Here Dose C = 2 × Dose A − 100, an exact linear function of Dose A.

The model has infinitely many solutions
Too much flexibility. What happens?
The fitting algorithm usually gives you some indication of this (a short R sketch at the end of these notes shows R's behavior):
• it will not fit the model and gives an error, or
• it drops one of the predictors
“perfectly collinear” = “perfect confounding”

Effects of Multicollinearity
Most common result:
• two covariates are each independently associated with Y in simple linear regression models
• in an MLR model with both covariates, one or both is insignificant
• the magnitude of the regression coefficients is attenuated
Why? Recall the adjusted variable plot: if the two covariates are related, removing the systematic part of one from Y may leave too little variation for the other to explain.

Effects of Multicollinearity
Other situations:
• neither is significant alone, but both are significant together (somewhat rare)
• both are significant alone, and both retain significance in the model
• the regression coefficient for one of the covariates may change direction
• the magnitude of a coefficient may increase (in absolute value)
It is usually hard to predict exactly what will happen when both are in the model.

Implications in inference
The usual interpretation of a regression coefficient, as the change in the expected response per unit increase in that covariate with all other covariates held constant, is not fully applicable when the covariates are highly correlated: correlated covariates do not vary independently of one another in the data.
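To see these implications concretely, here is a small simulation sketch (not from the original slides; the names x1, x2, y and the correlation of about 0.95 are illustrative choices). Each predictor is strongly significant on its own, but jointly the standard errors inflate.

# Illustrative simulation: two highly correlated predictors, both truly
# related to y. Compare the standard errors in the simple and joint fits.
set.seed(701)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)  # cor(x1, x2) is about 0.95
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)

summary(lm(y ~ x1))$coefficients        # significant alone
summary(lm(y ~ x2))$coefficients        # significant alone
summary(lm(y ~ x1 + x2))$coefficients   # inflated SEs; significance may vanish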
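Returning to the perfect-collinearity table shown earlier: a sketch of what R actually does when Dose C is an exact linear function of Dose A (the code below simply re-keys that table). By default, lm() fits the model but reports NA for the aliased predictor rather than stopping with an error, and alias() identifies the exact linear dependency.

# Re-enter the second table: Dose.C = 2*Dose.A - 100 exactly
data2 <- data.frame(
  Dose.A     = rep(c(100, 200, 300), times = 4),
  Dose.C     = rep(c(100, 300, 500), times = 4),
  Diet       = rep(c(0, 1), each = 6),
  Tumor.size = c(45, 56, 25, 15, 17, 10, 30, 28, 20, 10, 5, 3)
)

reg.c <- lm(Tumor.size ~ Dose.A + Dose.C + Diet, data = data2)
summary(reg.c)   # Dose.C is NA: "1 not defined because of singularities"
alias(reg.c)     # shows Dose.C as a linear combination of the other terms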