MASON PSYC 612 - Exploratory Data Analysis

This preview shows page 1-2-3-4 out of 13 pages.

PSYC 612, SPRING 2010
Exploratory Data Analysis (EDA)
Lecture Week: 4/6/2010

Contents

1 Preliminary Questions
2 Part I: Exploratory Data Analysis (30 minutes; 2 minute break)
  2.1 Purpose:
  2.2 Objectives:
  2.3 Exploratory Data Analysis (EDA)
  2.4 Why explore your data?
  2.5 Suitable EDA Procedures
    2.5.1 Data Summaries
    2.5.2 Tabular Procedures
    2.5.3 Pseudo-graphical Procedures
    2.5.4 Graphical Procedures
3 Part II: Digging deeper into exploratory procedures (10 minutes; 2 minute break)
  3.1 Purpose:
  3.2 Objectives:
4 Part III: Matrix algebra (FINALLY!)

1 Preliminary Questions

• Are there any questions about missing data?
• Is everyone ready for the second module?
• Are there any lingering questions about the modules before we begin this week?

2 Part I: Exploratory Data Analysis (30 minutes; 2 minute break)

2.1 Purpose:

Introduce you to basic concepts of exploratory data analysis.

2.2 Objectives:

1. Describe exploratory data analysis
2. Provide a rationale for exploring data
3. Outline different approaches
4. Demonstrate various procedures

2.3 Exploratory Data Analysis (EDA)

Exploratory data analysis is a process - not a single procedure - where the analyst (you) uses several procedures to better understand your data.
The first and now classic text on EDA was written several decades ago by the eminent statistician John Tukey. In his book, Tukey described various procedures and why they may be useful. Those procedures do not constitute EDA. Everything goes with EDA. You only need to be curious about every variable, value, and relationship and use tables, figures, and summary statistics to satisfy your curiosity. There are no rules for EDA.

2.4 Why explore your data?

Exploring your data demands time, effort, and attention, but those demands can easily save you hours of frustration. How, you might ask, will exploratory procedures save you hours? By understanding your data, you avoid common pitfalls such as distributional problems, scaling anomalies, and outlier observations. EDA helps you find these problems before you conduct your analyses.

Most social scientists do no exploration before they dive right into hypothesis testing. I recommend you not emulate the field. Always explore your data before you run any analysis. Best practices dictate that you understand your data before using any procedure. The best data analysts spend roughly 80% of their time exploring, managing, and transforming data. The remaining 20% goes toward running and interpreting the primary analyses. If you gain some expertise in EDA, you can become far more efficient and probably reduce that effort substantially. Furthermore, you can avoid silly mistakes that plague many analyses.

2.5 Suitable EDA Procedures

There are many different procedures and, as I mentioned previously, there are no rules for exploration, nor do the procedures define EDA. In spite of that statement, I intend to outline several procedures that you might find helpful for exploring your data.

2.5.1 Data Summaries

The first and easiest procedures are simple summaries, including measures of central tendency, measures of dispersion, and simple counts. You know about these procedures, but now I want you to think of them as exploratory procedures.
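
As a rough illustration of such data summaries, the sketch below computes a count, the minimum, quartiles, mean, maximum, and standard deviation for one variable using only the Python standard library. The function name `summarize` and the `scores` values are made up for illustration and are not part of the lecture data. The `"inclusive"` quantile method is chosen here because it interpolates between order statistics in the way many statistical packages do by default, which is an assumption about what you want rather than the only valid choice.

```python
import statistics


def summarize(values):
    """Return simple summaries of central tendency, dispersion, and count."""
    ordered = sorted(values)
    # Quartiles by linear interpolation between order statistics.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "n": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(ordered),
        "q3": q3,
        "max": ordered[-1],
        "sd": statistics.stdev(ordered),  # sample standard deviation
    }


# Hypothetical values, used only to show the call.
scores = [2, 4, 4, 5, 7, 9, 10, 12]
summary = summarize(scores)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Comparing the mean with the median, and the quartiles with the extremes, is often the quickest way to spot skew or a suspicious value before any formal analysis.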
Every time you open up a dataset, make sure you run descriptive statistics - or as I labeled them here, data summaries. Consider the simple statistics in Table 1.

       v1                v2              v3                v4        v5              y
 Min.   :-1.8735   Min.   : 1.00   Min.   :7.255e-35   1:25   Min.   :1.0   Min.   :-1.79249
 1st Qu.:-0.6039   1st Qu.: 3.25   1st Qu.:9.683e-15   2:25   1st Qu.:1.0   1st Qu.:-0.70319
 Median : 0.1387   Median : 6.00   Median :1.360e-10          Median :1.0   Median :-0.08813
 Mean   : 0.1309   Mean   : 5.86   Mean   :4.976e-02          Mean   :1.1   Mean   : 0.01032
 3rd Qu.: 0.8334   3rd Qu.: 8.00   3rd Qu.:4.778e-05          3rd Qu.:1.0   3rd Qu.: 0.85765
 Max.   : 2.5299   Max.   :10.00   Max.   :1.529e+00          Max.   :5.0   Max.   : 2.08196

Table 1: Summary Statistics

We might even find that a simple correlation matrix offers a fair bit of information. Consider the following correlation matrix.

        v1     v2     v3     v4     v5      y
v1    1.00  -0.08  -0.11   0.00  -0.16   0.23
v2   -0.08   1.00   0.22   0.20   0.19   0.06
v3   -0.11   0.22   1.00   0.21   0.93  -0.07
v4    0.00   0.20   0.21   1.00   0.17   0.02
v5   -0.16   0.19   0.93   0.17   1.00  -0.20
y     0.23   0.06  -0.07   0.02  -0.20   1.00

What do these numbers mean? They help us determine whether relationships are strong or weak, positive or negative, logical or illogical. All of these data summaries are helpful but they are not sufficient for really exploring your data.

2.5.2 Tabular Procedures

Tabular procedures help us better understand the values at an even deeper level. I find simple tables (univariate) and cross-tabulations (two variables) offer valuable information. Consider the same dataset as above but described in a tabular format below.

   V1   Count
-1.87   1
-1.55   1
-1.31   1
-1.26   1
-1.08   1
-0.92   2
-0.82   1
-0.77   1
-0.73   1
-0.67   1
-0.63   1
-0.62   1
-0.55   1
-0.53   2
-0.48   1
-0.35   1
-0.33   1
-0.31   1
-0.24   1
-0.05   1
 0      1
 0.04   1
 0.09   1
 0.19   1
 0.34   2
 0.42   2
 0.43   1
 0.46   1
 0.49   1
 0.57   1
 0.59   1
 0.66   1
 0.79   1
 0.85   1
 0.9    1
 0.93   1
 0.96   1
 0.98   1
 0.99   1
 1.09   1
 1.28   1
 1.47   1
 1.54   1
 1.77   1
 1.96   1
 2.53   1

The table shows us how many observations we have for each value for v1. There are better ways of displaying this information that we will cover in the next section.
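
A univariate table like the one for v1 can be sketched in a few lines. The example below is a minimal illustration, not the procedure used to produce the lecture output: the helper `frequency_table`, its `digits` parameter, and the sample values are all hypothetical. It rounds each value and counts how many observations share the rounded value.

```python
from collections import Counter


def frequency_table(values, digits=2):
    """Count observations per value after rounding to `digits` decimals."""
    counts = Counter(round(v, digits) for v in values)
    return sorted(counts.items())  # (value, count) pairs in ascending order


# Hypothetical values, used only to show the call.
sample = [0.12, -0.53, 0.118, 1.4, -0.534, 0.9]
table = frequency_table(sample)
for value, count in table:
    print(f"{value:>6} {count:>3}")
```

With continuous variables most counts will be 1 unless you round more coarsely, which is one reason a plain frequency table is of limited use for such data.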
Perhaps a cross-tabulation might offer us a bit more information.

         1   2
-1.9     1   0
-1.6     1   0
-1.3     1   1
-1.1     0   1
-0.9     2   0
-0.8     0   2
-0.7     0   2
-0.6     1   2
-0.5     2   1
-0.3     2   1
-0.2     1   0
 0       …
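
The cross-tabulation above appears to pair v1, rounded to one decimal, with a two-level grouping variable. A minimal sketch of that idea follows, under that assumption. The function `crosstab`, the group labels, and the values are hypothetical and do not reproduce the lecture data.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count co-occurrences of two equal-length sequences of labels."""
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    counts = Counter(zip(rows, cols))  # missing cells read as 0
    return sorted(set(rows)), sorted(set(cols)), counts


# Hypothetical values and group labels, used only to show the call.
values = [0.42, 0.38, -1.13, -1.08, 0.44]
groups = ["a", "b", "a", "a", "b"]
binned = [round(v, 1) for v in values]  # coarsen the continuous variable

row_labels, col_labels, counts = crosstab(binned, groups)
print(" " * 6 + "".join(f"{c:>4}" for c in col_labels))
for r in row_labels:
    print(f"{r:>6}" + "".join(f"{counts[(r, c)]:>4}" for c in col_labels))
```

Rounding before tabulating is a design choice: it trades precision for cells with enough observations to compare across groups.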

