
ggplot2 graphics
The pima data set from the faraway package

Stat 849
Douglas Bates
University of Wisconsin - Madison and R Development Core Team
<Douglas.Bates@R-project.org>
Sept 08, 2010

Outline

- Univariate summary plots
- Bivariate plots
- Simple regression or ancova lines
- Ancova

The ggplot2 graphics package

- Another advanced graphics package for R is ggplot2 by Hadley Wickham, a recent Iowa State Stats Ph.D., now at Rice.
- His book is listed as one of the references on the course web site.
- The core chapter, introducing the basic function called qplot, can be obtained from the URL in the links section on the course web site.
- I will use data from the faraway package (which accompanies Julian Faraway's freely available book "Practical Regression and Anova using R") to illustrate the use of qplot.

Examining the pima data

> library(faraway)
> str(pima)
'data.frame':   768 obs. of  9 variables:
 $ pregnant : int  6 1 8 1 0 5 3 10 2 8 ...
 $ glucose  : int  148 85 183 89 137 116 78 115 197 125 ...
 $ diastolic: int  72 66 64 66 40 74 50 0 70 96 ...
 $ triceps  : int  35 29 0 23 35 0 32 0 45 0 ...
 $ insulin  : int  0 0 0 94 168 0 88 0 543 0 ...
 $ bmi      : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
 $ diabetes : num  0.627 0.351 0.672 0.167 2.288 ...
 $ age      : int  50 31 32 21 33 30 26 29 53 54 ...
 $ test     : int  1 0 1 0 1 0 1 0 1 1 ...
> head(pima)
  pregnant glucose diastolic triceps insulin  bmi diabetes age test
1        6     148        72      35       0 33.6    0.627  50    1
2        1      85        66      29       0 26.6    0.351  31    0
3        8     183        64       0       0 23.3    0.672  32    1
4        1      89        66      23      94 28.1    0.167  21    0
5        0     137        40      35     168 43.1    2.288  33    1
6        5     116        74       0       0 25.6    0.201  30    0

Recoding the missing data

- As Faraway indicates, several variables that cannot reasonably be zero have values recorded as zero.
- A bit of research shows that these are missing data values. Also, the test variable is a factor, not numeric.

> pima <- within(pima, {
+     diastolic[diastolic == 0] <- glucose[glucose == 0] <-
+         triceps[triceps == 0] <- insulin[insulin == 0] <-
+         bmi[bmi == 0] <- NA
+     test <- factor(test, labels = c("negative", "positive"))
+ })
> head(pima, 3)
  pregnant glucose diastolic triceps insulin  bmi diabetes age     test
1        6     148        72      35      NA 33.6    0.627  50 positive
2        1      85        66      29      NA 26.6    0.351  31 negative
3        8     183        64      NA      NA 23.3    0.672  32 positive

Histogram of diastolic blood pressure

> qplot(diastolic, data = pima, geom = "histogram")

[Figure: histogram; x-axis: diastolic, y-axis: count]

Histogram of diastolic bp by test

> qplot(diastolic, data = pima, geom = "histogram", fill = test)

[Figure: histogram filled by test (negative, positive); x-axis: Diastolic blood pressure (mm Hg), y-axis: count]

Empirical density plot

> qplot(diastolic, data = pima, geom = "density")

[Figure: density curve; x-axis: Diastolic blood pressure (mm Hg), y-axis: density]

Empirical density of diastolic by test

> qplot(diastolic, data = pima, geom = "density", linetype = test)

[Figure: density curves with line type by test (negative, positive); x-axis: Diastolic blood pressure (mm Hg), y-axis: density]

Simple scatterplot (cf. Fig. 1.2a, p. 13)

> qplot(diastolic, diabetes, data = pima, xlab = ...

[Figure: scatterplot; x-axis: Diastolic blood pressure (mm Hg), y-axis: Diabetes pedigree function]

Adding a scatterplot smoother

> qplot(diastolic, diabetes, data = pima, geom = c("point", "smooth"))

[Figure: scatterplot with smoother; x-axis: Diastolic blood pressure (mm Hg), y-axis: Diabetes pedigree function]

Multiple smoothers by group

> qplot(diastolic, diabetes, data = pima, geom = c("point", "smooth"), shape = test)

[Figure: scatterplot with one smoother per test group (negative, positive); x-axis: Diastolic blood pressure (mm Hg), y-axis: Diabetes pedigree function]

Comparative boxplots (apparently only vertical)

> qplot(test, diabetes, data = pima, geom = c("boxplot"))

[Figure: boxplots; x-axis: Diabetes test result (negative, positive), y-axis: Diabetes pedigree function]

Adding a simple linear regression line (cf. Fig. 1.3, p. 14)

> p <- qplot(midterm, final, data = stat500, geom = c("point", "smooth"), method = "lm")

[Figure: scatterplot with fitted line and confidence band; x-axis: midterm, y-axis: final]

Adding a reference line (cf. Fig. 1.3, p. 14)

> p + geom_abline(intercept = 0, slope = 1, color = "red")

It happens that the defaults are intercept = 0 and slope = 1.

[Figure: the previous plot with a reference line added; x-axis: midterm, y-axis: final]

Suppressing the confidence band

> p <- qplot(midterm, final, data = stat500, geom = c("point", "smooth"), method = "lm", se = FALSE) + geom_abline(color = ...

[Figure: scatterplot with fitted line and reference line, no confidence band; x-axis: midterm, y-axis: final]

Plotting multiple groups and lines (cf. Fig. 15.2, p. 163)

> levels(cathedral$style) <- c("Gothic", "Romanesque")
> qplot(x, y, data = cathedral, geom = c("point", "smooth"), method = "lm", shape = style, xlab = ...

[Figure: scatterplot with one fitted line per style (Gothic, Romanesque); x-axis: Nave Height (ft), y-axis: Total Length (ft)]

Plotting multiple groups in separate panels

> qplot(x, y, data = cathedral, geom = c("point", "smooth"), method = "lm", facets = ~ style, xlab = ...

[Figure: one panel per style (Gothic, Romanesque); x-axis: Nave Height (ft), y-axis: Total Length (ft)]
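The recoding step and the first grouped histogram can also be written one column at a time and with explicit ggplot() layers. The following is only a sketch under stated assumptions: it rebuilds the six rows printed by head(pima) so that it runs without the faraway package; the names pima6, zero_is_missing, zeros_before and p_hist are made up for illustration; and the layered form is offered because qplot() is, to my knowledge, deprecated in recent ggplot2 releases. The value bins = 10 is an arbitrary choice, not something taken from the slides.

```r
# First six rows of pima, copied from the head(pima) output shown earlier.
pima6 <- data.frame(
  pregnant  = c(6, 1, 8, 1, 0, 5),
  glucose   = c(148, 85, 183, 89, 137, 116),
  diastolic = c(72, 66, 64, 66, 40, 74),
  triceps   = c(35, 29, 0, 23, 35, 0),
  insulin   = c(0, 0, 0, 94, 168, 0),
  bmi       = c(33.6, 26.6, 23.3, 28.1, 43.1, 25.6),
  diabetes  = c(0.627, 0.351, 0.672, 0.167, 2.288, 0.201),
  age       = c(50, 31, 32, 21, 33, 30),
  test      = c(1, 0, 1, 0, 1, 0)
)

# Columns in which a zero cannot be a real measurement (hypothetical helper name).
zero_is_missing <- c("diastolic", "glucose", "triceps", "insulin", "bmi")

# Count the suspicious zeros in each of those columns before recoding.
zeros_before <- colSums(pima6[zero_is_missing] == 0)

# Recode one column at a time instead of using a chained assignment.
for (v in zero_is_missing) {
  pima6[[v]][pima6[[v]] == 0] <- NA
}

# Turn the 0/1 test result into a labelled factor, as in the slides.
pima6$test <- factor(pima6$test, labels = c("negative", "positive"))

print(zeros_before)
print(head(pima6, 3))

# Layered counterpart of
#   qplot(diastolic, data = pima, geom = "histogram", fill = test)
# Guarded so that the recoding part still runs when ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_hist <- ggplot(pima6, aes(x = diastolic, fill = test)) +
    geom_histogram(bins = 10)  # bins = 10 is an arbitrary choice for this sketch
  print(p_hist)
}
```

With the full data set, replacing pima6 by pima after library(faraway) should give the same kind of plot as the qplot call; the loop form trades the compactness of the chained assignment for a list of affected columns that is easy to audit.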



UW-Madison STAT 849 - The ggplot2 Graphics Package
