LSU EXST 7037 - Preparing Data for Multivariate Analysis


Chapter 7
Preparing Data for Multivariate Analysis

Section 7.1
Screening, Cleaning, and Preparing Data

Objectives
- Understand many of the most common data problems for multivariate analysis and the consequences of these problems.
- Screen for restricted range, small groups, and outliers.
- Clean and prepare data files for multivariate analysis using SAS.

Data Reality…
[Figure: "Data"]

Things to Do Before You Begin
- Data files accurate?
- Outliers?
- Restricted ranges in continuous variables?
- Unequal cell sizes in categorical variables?
- Distributions?
- Collinearity/singularity in variables?
- Homogeneous covariance matrices?
- Extent and nature of missing data?
This is just a sampling!

Data Preparation is Key to Success
You should reasonably expect to spend more time cleaning, verifying, screening, and imputing your data than analyzing it.
Data analysis is
- 90% perspiration
- 10% analysis
- 100% FUN with SAS!

Problem: Accuracy of Data Files
- Look at summary statistics to verify N, scale, and so on. (PROC MEANS min max N mean median;)
- Check ranges of variables for incorrectly keyed numeric values. (PROC MEANS min max N mean median;)
- Use frequency tables for incorrectly keyed categorical variables. (PROC FREQ;)
- Check data for duplicates. (PROC SORT nodupkey;)
- Recode items if needed. (DATA step, PROC SQL)

Problem: Outliers and Influential Points

Outlier Detection Tools
- Leverage, DFFITS (PROC REG)
- Z-scores (PROC STDIZE)
- Schematic box plots (PROC BOXPLOT)

Outlier Detection Tools
Specifically for multivariate outliers:
- Two- and three-way scatter plots
- Principal components

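The accuracy checks and z-score screening named on the slides above can be sketched in SAS roughly as follows. This is a minimal illustration, not the course's own program: the data set WORK.SURVEY, its variables (id, age, group), the data values, and the cutoff of 3 standard deviations are all assumptions made for the example.

```sas
/* Hypothetical example data: names and values are illustrative only. */
data work.survey;
   input id age group $;
   datalines;
1 34 A
2 29 B
3 41 A
3 41 A
4 290 B
5 37 C
;
run;

/* Verify N and scale, and check the range for miskeyed numeric values. */
proc means data=work.survey n min max mean median;
   var age;
run;

/* Frequency table to spot miskeyed categories and very small groups. */
proc freq data=work.survey;
   tables group;
run;

/* Keep one record per ID; the discarded duplicates go to WORK.DUPS. */
proc sort data=work.survey out=work.survey_nodup dupout=work.dups nodupkey;
   by id;
run;

/* Replace AGE with its z-score (mean 0, standard deviation 1). */
proc stdize data=work.survey_nodup out=work.survey_z method=std;
   var age;
run;

/* Flag univariate outliers. The cutoff of 3 is an assumed convention. */
data work.flagged;
   set work.survey_z;
   outlier = (abs(age) > 3);
run;
```

Note that PROC STDIZE overwrites AGE in the output data set, so the original values remain only in WORK.SURVEY_NODUP. With very small samples a z-score cannot become large, so a range check with PROC MEANS is often the more useful first screen.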
Problem: Restriction of Range
[Figure: axis label "Years of Education"]

Near-Zero Group Sizes
[Table: 2 x 2 layout of A1, A2 by B1, B2 with cell sizes 40, 42, 35, and 2]

Sandwich Nutrition Example
Variables: Calories, Total Fat, Carbohydrates, Sodium, Weight, Fiber, Category, Protein

Outlier Analysis and Data Screening Using SAS/IML Workshop 2.1
This demonstration performs multivariate data screening using interactive graphical techniques.

Outliers: What to Do?
There are several ways of handling outlying data points, the usefulness of which varies by discipline.
- Use winsorized or trimmed statistics.
- Analyze data with and without outliers.
  - If outliers make little difference, leave them in.
- Delete significant (p < .001) outliers.
  - Describe outliers (for example, groups, means, and so on).
  - Report analyses with and without outliers.

Restricted Range: What to Do?
- Design your study to ensure data collection across a greater range.
  - This requires planning, such as collecting data on targeted groups.
  - Sometimes you can collect additional data after the study and treat "phase" as a block.
- Create groups and treat the variable as a class variable rather than a continuous variable.
  - The new variable has less variability than the old but allows you to perform analysis on it.

Unequal Group Size: What to Do?
See recommendations for restricted range.
Also:
- Combine smaller groups to create more equally sized groups.
  - For example, you may have one large treatment group and three different smaller control groups. Compare the treatment group to the combined control group.

Section 7.2
Evaluating Collinearity and Statistical Assumptions

Objectives
- Discuss multivariate normality, collinearity, and homogeneity of covariance matrices in the context of multivariate statistics.
- Use graphical and statistical tools in SAS to evaluate assumptions of multivariate statistics.

Problem: Multivariate Nonnormality

Evaluating MV Normality
1. Check for univariate normality.
  - If variables are not UV normal, then they are not MV normal.
  - Skewness and kurtosis, graphical tools in PROC UNIVARIATE.
2. Check for multivariate normality.
  - Even if variables are UV normal, they might not be MV normal.
  - Use MV skewness and kurtosis, graphical tools in %MULTNORM macro*.
*The %MULTNORM macro requires SAS/IML software or SAS/ETS software and is available at the Technical Support Web site, www.sas.com.

Multivariate Distribution Analysis
This demonstration illustrates distribution analysis with multivariate data.
ch7s2d1.sas

Nonnormal Data: What to Do?
Nonparametric and ADF methods
- Nonparametric or asymptotically distribution-free methods are possible using many of the MV procedures you learned in this course.
- Sometimes these methods require large sample sizes and can be less powerful than parametric methods.
Transform variables
- Easy to do in a DATA step or PROC SQL.
- Can make it difficult to interpret results and estimates.

Examples of Useful Transformations
Based on Tabachnick and Fidell 2001, p. 83

Characteristic of Y          Transformation                           DATA step statement
Moderate positive skewness   Square root (Y)                          Y_T = SQRT(Y);
Moderate negative skewness   Square root (K - Y) (K = Max(Y) + 1)     Y_T = SQRT(K - Y);
Large positive skewness      Log of Y or Ln of Y                      Y_T = LOG10(Y); or Y_T = LOG(Y);
Large negative skewness      Log of (K - Y) or Ln of (K - Y)          Y_T = LOG10(K - Y); or Y_T = LOG(K - Y);
Extreme L-shaped             Reciprocal of Y                          Y_T = 1/Y;
Extreme J-shaped             Reciprocal of (K - Y)                    Y_T = 1/(K - Y);

*If there are negative or 0 values in the data, add a constant to Y before performing reciprocal or Log/Ln transformations.

Problem: Singularity and/or Collinearity
A square matrix is singular if variables used in its calculation are redundant, or if they are linear combinations of one another.
Singular matrices cannot be inverted, posing statistical problems for certain analyses.
A common cause of singularity is the accidental inclusion of scale scores and the component scale items in the same analysis, or of subscale scores and total scores in the same analysis.
Example: SAT-Math, SAT-Verbal, and SAT-Total should not be included in the same analysis.

Singularity and Collinearity
Variables are said to be collinear if they are highly correlated.
- Highly correlated variables (ρ > .9) make matrix inversion unstable and problematic and can lead to failures in calculation.
- Collinear variables can make models difficult to interpret.
- Collinear predictors in a linear model can cause large standard error estimates, reducing statistical power.
Example: SAT and ACT scores should probably not be included in the same model.

Diagnosing Collinearity and Singularity
Sometimes finding singularity is simple:
- warning or error message in the log
- warning or error message in the output
Sometimes it is not so simple. Careful data screening and model
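
The SAT example from the singularity slides can be sketched as a small SAS program. This is only an illustration under stated assumptions: the data set WORK.SCORES, the variable names, and the scores are hypothetical, and the VIF, TOL, and COLLIN options of PROC REG are one common way to screen for collinearity, not necessarily the approach the course demonstrates.

```sas
/* Hypothetical scores. SAT_TOTAL is an exact linear combination */
/* of SAT_MATH and SAT_VERBAL, so the three are jointly redundant. */
data work.scores;
   input gpa sat_math sat_verbal;
   sat_total = sat_math + sat_verbal;
   datalines;
3.1 610 580
3.6 700 650
2.8 540 600
3.9 720 710
3.3 630 560
2.5 500 520
3.7 680 690
3.0 590 610
;
run;

/* Pairwise correlations: look for values near 1 in absolute value. */
proc corr data=work.scores;
   var sat_math sat_verbal sat_total;
run;

/* All three predictors together: the cross-products matrix is singular. */
proc reg data=work.scores;
   model gpa = sat_math sat_verbal sat_total / vif tol collin;
run;
quit;

/* Dropping the redundant total removes the exact dependency. */
proc reg data=work.scores;
   model gpa = sat_math sat_verbal / vif tol collin;
run;
quit;
```

In the first PROC REG step the output should indicate that the model is not of full rank, which is the "simple" case of a message in the log or output. The second step shows the kind of diagnostics (variance inflation, tolerance, condition indices) that help when the dependency is only approximate.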

