Detecting Outliers
Outliers
Univariate and Multivariate Outliers
Standard Scores Detect Univariate Outliers
Mahalanobis D² and Multivariate Outliers
Problem 1
Descriptive statistics compute standard scores
Select the variable(s) for the analysis
Mark the option for computing standard scores
The z-score variable in the data editor
Outliers with unusually low scores
Additional information about the outliers
The raw data scores for the outliers
Comparing the raw scores to the mean
Outliers with unusually high scores
Slide 16
Answer to the problem
Deleting the z-score variable
Other problems on univariate outliers
Problem 2
Mahalanobis D² is computed by Regression
Adding the independent variables
Adding the dependent variable
Adding Mahalanobis D² to the dataset
Specify saving Mahalanobis D² distance
Specify the statistics output needed
Request descriptive statistics
Complete the request for Mahalanobis D²
Mahalanobis D² scores in the data editor
Computing the probability of D²
Specifying the variable name and function
Completing the specifications for the function
Probabilities for D² in the data editor
Identifying outliers
Answering the original question
Evaluating Multivariate Outliers
Moving columns in the data editor – step 1
Moving columns in the data editor – step 2
Moving columns in the data editor – step 3
Moving columns in the data editor – step 4
Highlighting the outliers for analysis
Evaluating the outlier cases
Deleting variables added to dataset
Other problems on multivariate outliers
The script will detect outliers
Outliers in the data view
Steps in evaluating outliers

SW388R7 Data Analysis & Computers II

Slide 1: Detecting Outliers
Detecting univariate outliers
Detecting multivariate outliers

Slide 2: Outliers
Outliers are cases that have data values that are very different from the data values for the majority of cases in the data set.
Outliers are important because they can change the results of our data analysis.
Whether we include or exclude outliers from a data
analysis depends on the reason why the case is an outlier and the purpose of the analysis.

Slide 3: Univariate and Multivariate Outliers
Univariate outliers are cases that have an unusual value for a single variable. In our analyses, we will be concerned with univariate outliers for the dependent variable in our data analysis.
Multivariate outliers are cases that have an unusual combination of values for a number of variables. The value for each individual variable may not be a univariate outlier, but the combination of values is one that occurs very rarely. In our analyses, we will be concerned with multivariate outliers for the set of independent variables in our data analysis.

Slide 4: Standard Scores Detect Univariate Outliers
One way to identify univariate outliers is to convert all of the scores for a variable to standard scores.
If the sample size is small (80 or fewer cases), a case is an outlier if its standard score is ±2.5 or beyond.
If the sample size is larger than 80 cases, a case is an outlier if its standard score is ±3.0 or beyond.
This method applies to interval level variables, and to ordinal level variables that are treated as metric. It does not apply to nominal level variables.

Slide 5: Mahalanobis D² and Multivariate Outliers
Mahalanobis D² is a multidimensional version of a z-score. It measures the distance of a case from the centroid (multidimensional mean) of a distribution, given the covariance (multidimensional variance) of the distribution.
A case is a multivariate outlier if the probability associated with its D² is 0.001 or less. D² follows a chi-square distribution with degrees of freedom equal to the number of variables included in the calculation.
Mahalanobis D² requires that the variables be metric, i.e.
interval level or ordinal level variables that are treated as metric.

Slide 6: Problem 1

Slide 7: Descriptive statistics compute standard scores
To compute standard scores in SPSS, select the Descriptive Statistics | Descriptives… command from the Analyze menu.

Slide 8: Select the variable(s) for the analysis
First, click on the variable to be included in the analysis to highlight it.
Second, click on the right arrow button to move the highlighted variable to the list of variables.

Slide 9: Mark the option for computing standard scores
First, click on the checkbox to save standard score values as a new variable in the dataset. The new variable will have the letter z prepended to its name, e.g. the standard score variable for “educ” will be “zeduc”.
Second, click on the OK button to complete the analysis request.

Slide 10: The z-score variable in the data editor
The variable containing the standard scores will be added to the list of variables in the data editor.
To identify outliers below –3.0, we sort the database in ascending order.
Right click on the variable header zeduc and select the Sort Ascending command from the popup menu.

Slide 11: Outliers with unusually low scores
Cases that are outliers because they have unusually low scores for the variable will appear at the top of the sorted list.
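
For readers who want to check the standard-score rule outside SPSS, the following is a minimal Python sketch of it, not part of the original slides. It applies the ±2.5 cutoff for 80 or fewer cases and ±3.0 otherwise, then lists flagged cases in ascending z-score order, mirroring the sort described above. The function names `zscore_cutoff` and `zscore_outliers` are hypothetical, and the sketch assumes the input already contains only valid (non-missing) values and that z-scores are based on the sample standard deviation (n − 1 denominator).

```python
from statistics import mean, stdev


def zscore_cutoff(n_cases):
    """Cutoff from the slides: 2.5 for 80 or fewer cases, 3.0 for more."""
    return 2.5 if n_cases <= 80 else 3.0


def zscore_outliers(values):
    """Return (index, raw value, z-score) for each flagged case.

    Assumption: `values` holds only valid scores for one metric variable.
    Results are sorted by z-score, lowest first.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cases to compute a standard deviation")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return []  # every score is identical, so no case can be an outlier
    cutoff = zscore_cutoff(n)
    flagged = []
    for i, v in enumerate(values):
        z = (v - m) / s
        if abs(z) >= cutoff:
            flagged.append((i, v, z))
    return sorted(flagged, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical scores, not taken from the course data set.
    scores = [12] * 9 + [2]
    for index, raw, z in zscore_outliers(scores):
        print(index, raw, round(z, 3))
```

Returning the raw value next to the z-score keeps the information needed for the later step of examining why a case was flagged.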
Since there are 269 cases with valid data for the variable, the criterion for identifying an outlier is ±3.0.
In this example, we have two outliers with z-scores less than –3.0.

Slide 12: Additional information about the outliers
To see additional information about the outliers, we highlight the rows containing the outliers and scroll horizontally to other variables in which we are interested, for example, the id numbers for the cases.

Slide 13: The raw data scores for the outliers
Before deciding whether we retain or omit outliers from the analysis, we should examine the raw scores that made these cases outliers.
In this example, one of our subjects had completed only 2 years of school and another had completed only 3 years.

Slide 14: Comparing the raw scores to the mean
When we compare the raw data values of 2 and 3 to the mean (13.12) and standard deviation (2.930) of the distribution for the variable, we see why these cases are outliers for this
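
The multivariate criterion stated earlier (a case is an outlier when the chi-square probability of its Mahalanobis D² is 0.001 or less) can also be sketched in Python. This is an illustration under stated assumptions, not the SPSS procedure the slides use: it handles only the special case of exactly two variables, so the covariance matrix can be inverted in closed form and the chi-square upper-tail probability with 2 degrees of freedom reduces to exp(−D²/2). All function names are hypothetical, and the sample covariance (n − 1 denominator) is assumed.

```python
import math


def mahalanobis_d2_two_vars(rows):
    """Return D^2 for each (x, y) row, measured from the centroid.

    Assumption: exactly two metric variables per case, no missing values.
    """
    n = len(rows)
    if n < 3:
        raise ValueError("need at least three cases")
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    d2 = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        # Quadratic form d' S^-1 d using the closed-form 2x2 inverse.
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


def chi2_upper_tail_df2(d2):
    """Upper-tail chi-square probability for 2 degrees of freedom only."""
    return math.exp(-d2 / 2.0)


def multivariate_outliers(rows, alpha=0.001):
    """Indices of cases whose D^2 probability is alpha or less."""
    return [
        i
        for i, d2 in enumerate(mahalanobis_d2_two_vars(rows))
        if chi2_upper_tail_df2(d2) <= alpha
    ]


if __name__ == "__main__":
    # Hypothetical data: the last case pairs a low x with a high y.
    data = [(i, i) for i in range(-10, 11)] + [(-10, 10)]
    print(multivariate_outliers(data))
```

In the hypothetical data, neither value of the last case lies outside the range of the other cases; only the combination is unusual, which is the situation the multivariate check is meant to catch. With more than two variables, a general matrix inverse and a general chi-square function would be needed in place of the two shortcuts used here.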

