MASON PSYC 612 - Exploratory Data Analysis


PSYC 612, SPRING 2011
Exploratory Data Analysis (EDA)
Lecture Week: 4/12/2011

Contents

1 Preliminary Questions
2 Part I: Exploratory Data Analysis (50 minutes; 2 minute break)
  2.1 Purpose:
  2.2 Objectives:
  2.3 Exploratory Data Analysis (EDA)
  2.4 Why explore your data?
  2.5 Suitable EDA Procedures
    2.5.1 Data Summaries
    2.5.2 Tabular Procedures
    2.5.3 Pseudo-graphical Procedures
    2.5.4 Graphical Procedures
3 Part II: Digging deeper into exploratory procedures (30 minutes; 2 minute break)
  3.1 Purpose:
  3.2 Objectives:
4 Part III: Discussion of Topic X and Module 3

1 Preliminary Questions

• Are there any questions about missing data?
• Is everyone ready for the second module?
• Are there any lingering questions about the modules before we begin this week?

2 Part I: Exploratory Data Analysis (50 minutes; 2 minute break)

2.1 Purpose:

Introduce you to the basic concepts of exploratory data analysis

2.2 Objectives:

1. Describe exploratory data analysis
2. Provide a rationale for exploring data
3. Outline different approaches
4. Demonstrate various procedures

2.3 Exploratory Data Analysis (EDA)

Exploratory data analysis is a process - not a single procedure - where the analyst (you) uses several procedures to better understand your data. The first and now classic text on EDA was written several decades ago by the eminent statistician John Tukey. In his book, Tukey described various procedures and why they may be useful. Those procedures do not constitute EDA. Anything goes with EDA. You only need to be curious about every variable, value, and relationship and use tables, figures, and summary statistics to satisfy your curiosity. There are no rules for EDA.

2.4 Why explore your data?

Exploring your data demands time, effort, and attention, but those demands can easily save you hours of frustration. How, you might ask, will exploratory procedures save you hours? By understanding your data, you avoid common pitfalls such as distributional problems, scaling anomalies, and outlier observations. EDA helps you find these problems before you conduct your analyses.

Most social scientists do no exploration before they dive right into hypothesis testing. I recommend you not emulate the field. Always explore your data before you run any analysis. Best practices dictate that you understand your data before using any procedure. The best data analysts spend roughly 80% of their time exploring, managing, and transforming data. The remaining 20% goes toward running and interpreting the primary analyses. If you gain some expertise in EDA, you can become far more efficient and probably reduce that effort substantially. Furthermore, you can avoid silly mistakes that plague many analyses.

2.5 Suitable EDA Procedures

There are many different procedures and, as I mentioned previously, there are no rules for exploration, nor do the procedures define EDA.
In spite of that statement, I intend to outline several procedures that you might find helpful for exploring your data.

2.5.1 Data Summaries

The first and easiest set of procedures are simple summaries including measures of central tendency, measures of dispersion, and simple counts. You know about these procedures but now I want you to think of them as exploratory procedures. Every time you open up a dataset, make sure you run descriptive statistics - or as I labeled them here, data summaries. Consider the simple statistics in Table 1.

       v1                 v2              v3                v4         v5             y
 Min.   :-2.07971   Min.   : 1.00   Min.   :4.165e-32   1:25   Min.   :1.0   Min.   :-1.9254
 1st Qu.:-0.53325   1st Qu.: 3.00   1st Qu.:7.783e-12   2:25   1st Qu.:1.0   1st Qu.:-0.8286
 Median : 0.23262   Median : 6.00   Median :1.026e-07          Median :1.0   Median :-0.3398
 Mean   : 0.02250   Mean   : 5.52   Mean   :2.504e-02          Mean   :1.1   Mean   :-0.1904
 3rd Qu.: 0.71711   3rd Qu.: 8.00   3rd Qu.:8.159e-04          3rd Qu.:1.0   3rd Qu.: 0.3365
 Max.   : 1.53846   Max.   :10.00   Max.   :6.478e-01          Max.   :5.0   Max.   : 1.8728

Table 1: Summary Statistics

We might even find that a simple correlation matrix offers a fair bit of information. Consider the following correlation matrix.

       v1     v2     v3     v4     v5     y
v1    1.00  -0.19   0.01  -0.14  -0.01  -0.06
v2   -0.19   1.00   0.01  -0.09  -0.07   0.14
v3    0.01   0.01   1.00   0.04  -0.04  -0.06
v4   -0.14  -0.09   0.04   1.00   0.17  -0.04
v5   -0.01  -0.07  -0.04   0.17   1.00  -0.12
y    -0.06   0.14  -0.06  -0.04  -0.12   1.00

What do these numbers mean? They help us determine whether relationships are strong or weak, positive or negative, logical or illogical. All of these data summaries are helpful but they are not sufficient for really exploring your data.

2.5.2 Tabular Procedures

Tabular procedures help us better understand the values at an even deeper level. I find simple tables (univariate) and cross-tabulations (two variables) offer valuable information.
Consider the same dataset as above but described in a tabular format below.

V1      Count
-2.08   1
-1.84   1
-1.7    1
-1.49   1
-1.36   1
-1.35   1
-1.29   1
-1.28   1
-1.13   1
-0.72   1
-0.68   2
-0.54   1
-0.52   1
-0.48   1
-0.46   1
-0.43   1
-0.34   1
-0.26   1
-0.22   1
-0.09   1
-0.02   1
0.05    1
0.14    1
0.19    1
0.27    1
0.3     1
0.34    1
0.37    1
0.49    1
0.51    1
0.54    1
0.57    1
0.6     1
0.61    1
0.68    1
0.7     1
0.72    1
0.84    1
0.87    1
0.92    1
0.99    2
1       1
1.08    1
1.16    1
1.2     2
1.25    1
1.54    1

The table shows us how many observations we have for each value for v1. There are better ways of displaying this information that we will cover in the next section. Perhaps a cross-tabular table might offer us a bit more information.

        1   2
-2.1    1   0
-1.8    0   1
-1.7    1   0
-1.5    0   1
-1.4    1   1
-1.3    0   …
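
The data summaries, correlation, univariate table, and cross-tabulation described in Sections 2.5.1 and 2.5.2 can be sketched in a few lines of code. The example below is a minimal illustration in Python using only the standard library; it is not the software used to produce the lecture's tables. The values in v1 and v4 are a small made-up sample, not the lecture dataset, and the function names (summarize, pearson_r, frequency_table, cross_tab) are hypothetical helpers introduced only for this sketch.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical toy data for illustration only; NOT the lecture's dataset.
v1 = [-1.3, -0.7, -0.7, 0.2, 0.5, 1.0, 1.0, 1.2]
v4 = [1, 2, 1, 2, 1, 2, 1, 2]  # a two-level grouping variable


def summarize(values):
    """Data summary: central tendency, dispersion, and a simple count."""
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
        "sd": stdev(values),
    }


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def frequency_table(values):
    """Univariate table: number of observations at each distinct value."""
    return dict(sorted(Counter(values).items()))


def cross_tab(rows, cols):
    """Cross-tabulation: counts for each (row value, column value) pair."""
    counts = Counter(zip(rows, cols))
    col_levels = sorted(set(cols))
    return {r: [counts[(r, c)] for c in col_levels] for r in sorted(set(rows))}


if __name__ == "__main__":
    print(summarize(v1))
    print(pearson_r(v1, v4))
    for value, count in frequency_table(v1).items():
        print(value, count)
    for value, row in cross_tab(v1, v4).items():
        print(value, row)
```

Each helper mirrors one procedure from the text: summarize corresponds to the descriptive statistics, pearson_r to a single cell of a correlation matrix, frequency_table to the simple (univariate) table, and cross_tab to the two-variable table. In practice a statistics package would do all of this in one or two calls; writing the steps out by hand is only meant to show what each table counts.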

