Detecting and reducing multicollinearity
Detecting multicollinearity
Common methods of detection
The first variance at issue
The second variance at issue
The ratio of the two variances
Variance inflation factors
Variance inflation factors (VIFk)
Blood pressure example
Regress y = BP on all 6 predictors
Regress x2 = weight on 5 predictors
The variance inflation factor calculated by its definition
The pairwise correlations
Regress y = BP on age, weight, duration and stress
Reducing data-based multicollinearity
Data-based multicollinearity
Some methods
(Modified!) Allen Cognitive Level (ACL) Study
Allen Cognitive Level (ACL) Study on 23 patients
Strong correlation between Vocab and Abstract
Regress y = ACL on SDMT, Vocab, and Abstract
Allen Cognitive Level (ACL) Study on 69 patients
Plot after having collected more data
Reducing structural multicollinearity
Structural multicollinearity
Example
Scatter plot
A quadratic polynomial regression function
Estimated quadratic function
Interpretation of the regression coefficients
Regress y = igg on oxygen and oxygen2
“Center” the predictors
Wow! It really works!
A better quadratic polynomial regression function
Regress y = igg on oxcent and oxcent2
Estimated regression function
Similar estimates of coefficients from first-order linear model
The relationship between the two forms of the model
Model evaluation
Model use: What is predicted IgG if maximal oxygen uptake is 90?
The hierarchical approach to model fitting

Detecting and reducing multicollinearity

Detecting multicollinearity

Common methods of detection
• Realized effects of multicollinearity (changes in coefficients, changes in standard errors of coefficients, changes in sequential sums of squares).
• Non-significant t-tests for all of the slopes but a significant overall F-test.
• Significant correlations among pairs of predictor variables (correlations, matrix scatter plots).
• Variance inflation factors (VIF).

The first variance at issue
For the model:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$$

the variance of the estimated coefficient $b_k$ is:

$$\mathrm{Var}(b_k) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2} \times \frac{1}{1 - R_k^2}$$

where $R_k^2$ is the $R^2$ value obtained by regressing the kth predictor on the remaining predictors.

The second variance at issue
For the model:

$$y_i = \beta_0 + \beta_k x_{ik} + \varepsilon_i$$

the variance of the estimated coefficient $b_k$ is:

$$\mathrm{Var}(b_k)_{\min} = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2}$$

The ratio of the two variances

$$\frac{\mathrm{Var}(b_k)}{\mathrm{Var}(b_k)_{\min}} = \frac{\dfrac{\sigma^2}{\sum (x_{ik} - \bar{x}_k)^2} \times \dfrac{1}{1 - R_k^2}}{\dfrac{\sigma^2}{\sum (x_{ik} - \bar{x}_k)^2}} = \frac{1}{1 - R_k^2}$$

Variance inflation factors
The variance inflation factor for the kth predictor is:

$$VIF_k = \frac{1}{1 - R_k^2}$$

where $R_k^2$ is the $R^2$ value obtained by regressing the kth predictor on the remaining predictors.

Variance inflation factors (VIFk)
• A measure of how much the variance of the estimated regression coefficient $b_k$ is “inflated” by the existence of correlation among the predictor variables in the model.
• VIFs exceeding 4 warrant investigation.
• VIFs exceeding 10 are signs of serious multicollinearity.

Blood pressure example
[Matrix plot of BP, Age, Weight, BSA, Duration, Pulse, Stress]
n = 20 hypertensive individuals
p - 1 = 6 predictor variables

Blood pressure example

          BP     Age    Weight  BSA    Duration  Pulse
Age       0.659
Weight    0.950  0.407
BSA       0.866  0.378  0.875
Duration  0.293  0.344  0.201   0.131
Pulse     0.721  0.619  0.659   0.465  0.402
Stress    0.164  0.368  0.034   0.018  0.312     0.506

Blood pressure (BP) is the response.

Regress y = BP on all 6 predictors

Predictor  Coef      SE Coef   T      P      VIF
Constant   -12.870   2.557     -5.03  0.000
Age        0.70326   0.04961   14.18  0.000  1.8
Weight     0.96992   0.06311   15.37  0.000  8.4
BSA        3.776     1.580     2.39   0.033  5.3
Dur        0.06838   0.04844   1.41   0.182  1.2
Pulse      -0.08448  0.05161   -1.64  0.126  4.4
Stress     0.005572  0.003412  1.63   0.126  1.8

S = 0.4072   R-Sq = 99.6%   R-Sq(adj) = 99.4%

Analysis of Variance
Source          DF  SS       MS      F       P
Regression      6   557.844  92.974  560.64  0.000
Residual Error  13  2.156    0.166
Total           19  560.000

Regress x2 = weight on 5 predictors

Predictor  Coef      SE Coef  T      P      VIF
Constant   19.674    9.465    2.08   0.057
Age        -0.1446   0.2065   -0.70  0.495  1.7
BSA        21.422    3.465    6.18   0.000  1.4
Dur        0.0087    0.2051   0.04   0.967  1.2
Pulse      0.5577    0.1599   3.49   0.004  2.4
Stress     -0.02300  0.01308  -1.76  0.101  1.5

S = 1.725   R-Sq = 88.1%   R-Sq(adj) = 83.9%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      5   308.839  61.768  20.77  0.000
Residual Error  14  41.639   2.974
Total           19  350.478

The variance inflation factor calculated by its definition

$$\frac{\mathrm{Var}(b_k)}{\mathrm{Var}(b_k)_{\min}} = \frac{1}{1 - R_k^2} = \frac{1}{1 - 0.881} = 8.40$$

The variance of the weight coefficient is inflated by a factor of 8.40 due to the existence of correlation among the predictor variables in the model.

The pairwise correlations

          BP     Age    Weight  BSA    Duration  Pulse
Age       0.659
Weight    0.950  0.407
BSA       0.866  0.378  0.875
Duration  0.293  0.344  0.201   0.131
Pulse     0.721  0.619  0.659   0.465  0.402
Stress    0.164  0.368  0.034   0.018  0.312     0.506

Blood pressure (BP) is the response.

Regress y = BP on age, weight, duration and stress

Predictor  Coef      SE Coef   T      P      VIF
Constant   -15.870   3.195     -4.97  0.000
Age        0.68374   0.06120   11.17  0.000  1.5
Weight     1.03413   0.03267   31.65  0.000  1.2
Dur        0.03989   0.06449   0.62   0.545  1.2
Stress     0.002184  0.003794  0.58   0.573  1.2

S = 0.5505   R-Sq = 99.2%   R-Sq(adj) = 99.0%

Analysis of Variance
Source          DF  SS      MS      F       P
Regression      4   555.45  138.86  458.28  0.000
Residual Error  15  4.55    0.30
Total           19  560.00

Reducing data-based multicollinearity

Data-based multicollinearity
• Multicollinearity that results from a poorly designed experiment, reliance on purely observational data, or the inability to manipulate the system on which you collect the data.

Some methods
• Modify the regression model by eliminating one or more predictor variables.
• Collect
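
The definition VIFk = 1/(1 - Rk^2) used in the blood pressure example can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available; the helper names `r_squared` and `vif` and the synthetic predictors are illustrative assumptions, not part of the original slides, and the blood pressure data themselves are not reproduced here. It first recomputes the weight VIF from the reported R-Sq = 88.1%, then computes VIFs for made-up data by regressing each predictor on the remaining ones.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from a least-squares fit of y on the columns of X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / sst

def vif(X):
    """VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing column k on the others."""
    X = np.asarray(X, dtype=float)
    values = []
    for k in range(X.shape[1]):
        others = np.delete(X, k, axis=1)
        values.append(1.0 / (1.0 - r_squared(X[:, k], others)))
    return np.array(values)

# Check against the slides: weight regressed on the other 5 predictors gave R-Sq = 88.1%.
vif_weight = 1.0 / (1.0 - 0.881)
print(f"{vif_weight:.2f}")  # 8.40

# Synthetic illustration (hypothetical data): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

In the synthetic case the two nearly identical predictors should show VIFs well above the threshold of 10 quoted in the slides, while the unrelated predictor stays close to 1.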