NAU EPS 625 - MULTIPLE REGRESSION

Screening Data Prior to Analysis
Multicollinearity
Checking Assumptions for the Regression Model
Residual Plots
Outliers and Influential Data Points
Data Editing
Measuring Outliers on y
Measuring Outliers on Set of Predictors
Measuring Influential Data Points
Mahalanobis' Distance
Summary
Number of Predictors

MULTIPLE REGRESSION

SCREENING DATA PRIOR TO ANALYSIS

As with any statistical analysis, one should always conduct an initial screening of the data. One of the more common concerns is simple data entry error, for example entering an incorrect value for a variable or omitting a value that should have been entered. The researcher should always double-check the data entry to avoid such unnecessary (and completely avoidable) errors. Missing or potentially influential data can be handled in multiple ways: the case(s) can be deleted (typically only if they account for less than 5% of the total sample), transformed, or substituted using one of many options (see, for example, Tabachnick & Fidell, 2001).

Data can be screened as ungrouped or grouped. With ungrouped data screening, we typically examine frequency distributions and histograms (with a normal curve superimposed), measures of central tendency, and measures of dispersion. This screening allows us to check for potential outliers as well as data entry errors. We can also create scatter plots to check for linearity and homoscedasticity. Detection of multivariate outliers is typically done through regression, by checking for Mahalanobis distance values of concern and conducting a collinearity diagnosis (discussed in more detail below). Data can also be screened as grouped data. The procedures are the same as for ungrouped data, except that each group is analyzed separately; in SPSS, a split-file option is used for the separate analyses. Here again, descriptive statistics are examined, as well as histograms and plots.
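The screening steps above can also be sketched computationally. The following minimal Python illustration is not part of the original handout; the data file, the data frame df, and the variable names y, x1, x2, x3, and group are hypothetical placeholders, and the chi-square cutoff for Mahalanobis distance follows the common .001 convention.

```python
# Minimal sketch of ungrouped and grouped data screening (hypothetical data and variable names).
import numpy as np
import pandas as pd
from scipy.stats import chi2

df = pd.read_csv("study_data.csv")          # hypothetical data file
predictors = ["x1", "x2", "x3"]             # hypothetical predictor names

# Ungrouped screening: central tendency, dispersion, and histograms
# (a quick check for data entry errors and potential outliers).
print(df[["y"] + predictors].describe())
df[["y"] + predictors].hist(bins=20)        # histograms; requires matplotlib

# Grouped screening: the same descriptives computed separately for each group
# (the equivalent of SPSS's split-file option).
print(df.groupby("group")[["y"] + predictors].describe())

# Multivariate outliers on the set of predictors: squared Mahalanobis distance of each
# case from the centroid, compared with a chi-square critical value (df = number of predictors).
X = df[predictors].to_numpy(dtype=float)
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
critical = chi2.ppf(0.999, df=len(predictors))   # the common alpha = .001 criterion
print(df[d2 > critical])                         # cases flagged as possible multivariate outliers
```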
MULTICOLLINEARITY

When there are moderate to high intercorrelations among the predictors, the problem is referred to as multicollinearity. Multicollinearity poses a real problem for the researcher using multiple regression for three reasons:

1. It severely limits the size of R, because the predictors are going after much of the same variance on y.
2. It makes determining the importance of a given predictor difficult, because the effects of the predictors are confounded due to the correlations among them.
3. It increases the variances of the regression coefficients. The greater these variances, the more unstable the prediction equation will be.

The following are three common methods for diagnosing multicollinearity (a computational sketch of all three follows this list):

1. Examine the variance inflation factors (VIF) for the predictors. The quantity 1 / (1 - Rj2) is called the jth variance inflation factor, where Rj2 is the squared multiple correlation for predicting the jth predictor from all the other predictors. The variance inflation factor for a predictor indicates whether there is a strong linear association between it and all the remaining predictors. It is distinctly possible for a predictor to have only moderate or relatively weak associations with the other predictors in terms of simple correlations, and yet to have a quite high R when regressed on all the other predictors. When is the value of the variance inflation factor large enough to cause concern? As indicated by Stevens (2002), there is no set rule of thumb on the numerical values, but it is generally believed that if any VIF exceeds 10, there is reason for at least some concern. In that case, one should consider variable deletion or an alternative to least squares estimation. The variance inflation factors are easily obtained from SPSS or other statistical packages.

2. Examine the tolerance value, which refers to the degree to which one predictor can itself be predicted by the other predictors in the model. Tolerance is defined as 1 - R2, where R2 is the squared multiple correlation for predicting that predictor from the others. The higher the tolerance, the less the predictor overlaps with the other variables and the more useful it is to the analysis; the smaller the tolerance, the higher the degree of collinearity. Although target values are debated, a tolerance of .50 or higher is generally considered acceptable (Tabachnick & Fidell, 2001), and some statisticians accept a value as low as .20 before becoming concerned. Tolerance tells us two things. First, it tells us the degree of overlap among the predictors, helping us to see which predictors have information in common and which are relatively independent. Just because two variables substantially overlap in their information is not reason enough to eliminate one of them. Second, the tolerance statistic alerts us to potential problems of instability in the model. With very low levels of tolerance, the stability of the model, and sometimes even the accuracy of the arithmetic, can be in danger. In the extreme case where one predictor can be perfectly predicted from the others, we have what is called a singular covariance (or correlation) matrix, and most programs will stop without generating a model. If your SPSS output states that the matrix is singular or "not positive definite," the most likely explanation is that one predictor has a tolerance of 0.00 and is perfectly correlated with other variables. In this case, you will have to drop at least one predictor to break up that relationship. Such a relationship most frequently occurs when one predictor is the simple sum or average of the others. Most multiple-regression programs (e.g., SPSS) have default values for tolerance (1 - SMC, where SMC is the squared multiple correlation, or R2) that protect the user against inclusion of multicollinear IVs (predictors). If the program defaults are in place, IVs that are very highly correlated with other IVs already in the equation are not entered. This makes sense both statistically and logically, because such IVs threaten the analysis due to inflation of the regression coefficients and because they are not needed, given their correlation with the other IVs.

3. Examine the condition index, which is a measure of the tightness or dependency of one variable on the others. The condition index is monotonic with SMC, but not linear with it. A high condition index (values of 30 or greater) signals potentially problematic collinearity.
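To make the three diagnostics concrete, here is a minimal Python sketch (not part of the original handout). The data frame and predictor names are the same hypothetical placeholders as above. Note that the condition indices here are computed from the eigenvalues of the predictor correlation matrix; SPSS derives them from the scaled cross-products matrix, so its values can differ somewhat.

```python
# Minimal sketch of the three collinearity diagnostics: VIF, tolerance, and condition index.
import numpy as np
import pandas as pd

df = pd.read_csv("study_data.csv")           # hypothetical data file
predictors = ["x1", "x2", "x3"]              # hypothetical predictor names
X = df[predictors].to_numpy(dtype=float)

# VIF_j = 1 / (1 - Rj^2), where Rj^2 comes from regressing predictor j on the others.
# Equivalently, the VIFs are the diagonal elements of the inverse of the correlation matrix.
R = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(R))
tolerance = 1.0 / vif                        # tolerance = 1 - Rj^2
for name, v, t in zip(predictors, vif, tolerance):
    note = "  <-- possible concern (VIF > 10)" if v > 10 else ""
    print(f"{name}: VIF = {v:.2f}, tolerance = {t:.2f}{note}")

# Condition indices: sqrt(largest eigenvalue / each eigenvalue) of the correlation matrix;
# values of roughly 30 or more are usually taken as a warning sign.
eigvals = np.linalg.eigvalsh(R)
cond_index = np.sqrt(eigvals.max() / eigvals)
print("Condition indices:", np.round(np.sort(cond_index)[::-1], 2))
```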

