MASON PSYC 612 - Exploratory Data Analysis


PSYC 612, SPRING 2010
Exploratory Data Analysis (cont.)
Lecture Week: 4/13/2010

Contents

1 Preliminary Questions
2 Part I: Exploratory Data Analysis Demonstration (50 minutes; 2 minute break)
    2.1 Purpose:
    2.2 Objectives:
    2.3 Prelude
    2.4 Demonstration Dataset 1
    2.5 EDA Demonstration: Dataset 1
3 Part II: What Next? (10 minutes; 2 minute break)
    3.1 Purpose:
    3.2 Objectives:
4 Part III: Matrix Algebra (cont.; 20 minutes)
    4.1 Purpose:
    4.2 Objectives:
    4.3 Linear Regression in Scalar Notation
    4.4 Linear Regression in Matrix Notation


1 Preliminary Questions

• Has everyone scheduled a module?
• Have you sent me content requests for Lecture X?
• Any questions about the modules?


2 Part I: Exploratory Data Analysis Demonstration (50 minutes; 2 minute break)

2.1 Purpose:

Demonstrate a complete exploratory procedure from start to finish.

2.2 Objectives:

1. Introduce simple dataset
2. Discuss EDA options
3. Show the results of those options
4. Discuss implications of the EDA findings
5. Demonstrate how to rectify any potential problems

2.3 Prelude

Exploring your data prior to analyses is a good way to avoid needless work.
On the door of our graduate student library hung a sign that read:

    A day in the library saves a year in the lab.

Most of us passed through that door with nary a clue as to what that sign meant. Few of us toiled in labs and wasted time testing things that needed no testing. We had not toiled through wasted effort yet because we had no notion of wasted effort. Now that I have wasted time and effort, I have a greater appreciation for that sign. EDA helps us avoid costly, time-consuming mistakes just as reading the literature helps us avoid chasing fruitless research questions.

One excellent example that depicts the power of graphics to detect odd findings is the now famous Anscombe dataset. Below are the bivariate plots of these data. I intend to discuss these in detail during lecture.

You might benefit from knowing that the different datasets have very similar summary statistics:

     xbar    SD
x1   9.00  3.32
x2   9.00  3.32
x3   9.00  3.32
x4   9.00  3.32
y1   7.50  2.03
y2   7.50  2.03
y3   7.50  2.03
y4   7.50  2.03

and identical inter-relationships within each pair (x1 with y1, x2 with y2, and so on; each correlates 0.82):

       x1     x2     x3     x4     y1     y2     y3     y4
x1   1.00   1.00   1.00  -0.50   0.82   0.82   0.82  -0.31
x2   1.00   1.00   1.00  -0.50   0.82   0.82   0.82  -0.31
x3   1.00   1.00   1.00  -0.50   0.82   0.82   0.82  -0.31
x4  -0.50  -0.50  -0.50   1.00  -0.53  -0.72  -0.34   0.82
y1   0.82   0.82   0.82  -0.53   1.00   0.75   0.47  -0.49
y2   0.82   0.82   0.82  -0.72   0.75   1.00   0.59  -0.48
y3   0.82   0.82   0.82  -0.34   0.47   0.59   1.00  -0.16
y4  -0.31  -0.31  -0.31   0.82  -0.49  -0.48  -0.16   1.00

Analysis of Variance Table

Response: y1
          Df Sum Sq Mean Sq F value   Pr(>F)
x1         1 27.510 27.5100  17.99  0.002170 **
Residuals  9 13.763  1.5292
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Analysis of Variance Table

Response: y2
          Df Sum Sq Mean Sq F value   Pr(>F)
x2         1 27.500 27.5000  17.966 0.002179 **
Residuals  9 13.776  1.5307
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Analysis of Variance Table

Response: y3
          Df Sum Sq Mean Sq F value   Pr(>F)
x3         1 27.470 27.4700  17.972 0.002176 **
Residuals  9 13.756  1.5285
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Analysis of Variance Table

Response: y4
          Df Sum Sq Mean Sq F value   Pr(>F)
x4         1 27.490 27.4900  18.003 0.002165 **
Residuals  9 13.742  1.5269
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[Figure: Anscombe’s 4 Regression data sets. Four scatterplot panels: y1 vs. x1, y2 vs. x2, y3 vs. x3, y4 vs. x4.]

How could that be? Anscombe designed these simple datasets to demonstrate the power of graphical procedures. If you plotted the residuals of the four bivariate regressions, you would find that they do not have the same distribution. I would be happy to discuss that at a different time for anyone interested, but suffice it to say that most of the hard work of interpreting identical results could easily be averted by plotting the variables before conducting the final analyses.

2.4 Demonstration Dataset 1

I begin my demonstration with a simple dataset with 52 observations and 4 variables. The variables come from a real study that was used to assess student SAT performance for US states (the data are available from the course website).

    State   Verbal  Quant  PercElig
1   ala        561    555         9
2   alaska     516    514        50
3   ariz       524    525        34
4   ark        563    556         6
5   calif      497    514        49
6   colo       536    540        32
7   conn       510    509        80
8   dela       503    497        67
9   d.c.       494    478        77
10  fla        499    498        53
11  ga         487    482        63
12  hawaii     482    513        52
13  idaho      542    540        16
14  ill        569    585        12
15  ind        496    498        60
16  iowa       594    598         5
17  kan        578    576         9
18  ky         547    547        12
19  la         561    558         8
20  maine      507    503        68
21  md         507    507        65
22  mass       511    511        78
23  mich       557    565        11
24  minn       586    598         9
25  miss       563    548         4
26  mo         572    572         8
27  mont       545    546         …
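
Returning to the Anscombe example from the Prelude (Section 2.3): the sketch below is one way to check the claim that the four datasets share nearly identical summary statistics and regression results while their residuals differ. It is not the code behind the notes (the printed output appears to come from R); it is a standard-library Python illustration. The data values are Anscombe's published quartet, and the names quartet, fit_line, summarize, and summaries are hypothetical, chosen here for illustration.

```python
from statistics import mean, stdev

# Anscombe's published quartet; x1, x2, and x3 are the same vector.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {"1": (x123, y1), "2": (x123, y2), "3": (x123, y3), "4": (x4, y4)}


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r = sxy / (sxx * syy) ** 0.5
    return b0, b1, r


def summarize(x, y):
    """Summary statistics, regression fit, and ANOVA quantities for one pair."""
    b0, b1, r = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    my = mean(y)
    ss_reg = sum((fi - my) ** 2 for fi in fitted)   # regression sum of squares
    ss_resid = sum(e ** 2 for e in residuals)       # residual sum of squares
    df_resid = len(x) - 2
    return {
        "mean_x": mean(x), "sd_x": stdev(x),
        "mean_y": mean(y), "sd_y": stdev(y),
        "r": r, "intercept": b0, "slope": b1,
        "ss_reg": ss_reg, "ss_resid": ss_resid,
        "F": ss_reg / (ss_resid / df_resid),
        "max_abs_resid": max(abs(e) for e in residuals),
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}

for name, s in summaries.items():
    print(f"dataset {name}:", {k: round(v, 3) for k, v in s.items()})
```

The means, standard deviations, correlations, and sums of squares should agree across the four datasets to about two decimal places, matching the tables above. The largest absolute residual does not agree: dataset 3 contains a single point far from its fitted line, which the summary statistics alone do not reveal and a plot shows immediately.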

