UI STAT 5400 - Computing in Statisti

22S:166
Computing in Statistics
More on R
Lecture 7
Sept 16, 2009
Kate Cowles
374 SH, [email protected]

Factors in R

• vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length
• default way of storing character data in data frames in R versions before 4.0.0 (since 4.0.0, data.frame() keeps character columns as character unless stringsAsFactors = TRUE)
• used in formulas in R
• used in tapply function

Example

> help(state,package="datasets")

state                package:datasets                R Documentation

US State Facts and Figures

Description:

     Data sets related to the 50 states of the United States of America.

Usage:

     state.abb
     state.area
     state.center
     state.division
     state.name
     state.region
     state.x77

Details:

     R currently contains the following "state" data sets. Note that
     all data are arranged according to alphabetical order of the state
     names.

     'state.abb': character vector of 2-letter abbreviations for the
          state names.

     'state.area': numeric vector of state areas (in square miles).

     'state.center': list with components named 'x' and 'y' giving the
          approximate geographic center of each state in negative
          longitude and latitude. Alaska and Hawaii are placed just
          off the West Coast.

     'state.division': factor giving state divisions (New England,
          Middle Atlantic, South Atlantic, East South Central, West
          South Central, East North Central, West North Central,
          Mountain, and Pacific).

     'state.name': character vector giving the full state names.

     'state.region': factor giving the region (Northeast, South, North
          Central, West) that each state belongs to.

     'state.x77': matrix with 50 rows and 8 columns giving the
          following statistics in the respective columns.

          'Population': population estimate as of July 1, 1975
          'Income': per capita income (1974)
          ...
          'Area': land area in square miles

Source:

     U.S. Department of Commerce, Bureau of the Census (1977)
     _Statistical Abstract of the United States_.

     U.S. Department of Commerce, Bureau of the Census (1977) _County
     and City Data Book_.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.

> data(state)
> statedf <- data.frame( abb = state.abb, div = state.division,
+     reg = state.region, state.x77[,c("Population","Area")] )
> statedf[1:15,]
            abb                div           reg Population   Area
Alabama      AL East South Central         South       3615  50708
Alaska       AK            Pacific          West        365 566432
Arizona      AZ           Mountain          West       2212 113417
Arkansas     AR West South Central         South       2110  51945
California   CA            Pacific          West      21198 156361
Colorado     CO           Mountain          West       2541 103766
Connecticut  CT        New England     Northeast       3100   4862
Delaware     DE     South Atlantic         South        579   1982
Florida      FL     South Atlantic         South       8277  54090
Georgia      GA     South Atlantic         South       4931  58073
Hawaii       HI            Pacific          West        868   6425
Idaho        ID           Mountain          West        813  82677
Illinois     IL East North Central North Central      11197  55748
Indiana      IN East North Central North Central       5313  36097
Iowa         IA West North Central North Central       2861  55941

Functions operating on factors

> is.factor(statedf[,"div"])
[1] TRUE
> levels(statedf[,"div"])
[1] "New England"        "Middle Atlantic"    "South Atlantic"
[4] "East South Central" "West South Central" "East North Central"
[7] "West North Central" "Mountain"           "Pacific"

Using factors in formulas for plotting and model fitting

> boxplot( Population ~ div, data = statedf )
> boxplot( Population ~ div, data = statedf, pars=list(cex.axis=0.75))
> dev.copy2eps( file="~/166/lects2005/boxplotstatepop.ps", horizontal=T)

[Figure: boxplots of Population by division; the population axis runs from 0 to 20000]

> summary(lm(Population ~ div, data = statedf ))

Call:
lm(formula = Population ~ div, data = statedf)

Residuals:
    Min      1Q  Median      3Q     Max
-5289.8 -1667.4  -423.6   987.2 15543.2

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)             2031.2     1500.3   1.354 0.183207
divMiddle Atlantic     10391.8     2598.6   3.999 0.000259 ***
divSouth Atlantic       2087.1     1984.7   1.052 0.299154
divEast South Central   1347.8     2372.2   0.568 0.573013
divWest South Central   3185.8     2372.2   1.343 0.186664
divEast North Central   6157.8     2225.3   2.767 0.008446 **
divWest North Central    353.3     2044.6   0.173 0.863675
divMountain             -828.0     1984.7  -0.417 0.678704
divPacific              3623.6     2225.3   1.628 0.111109
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3675 on 41 degrees of freedom
Multiple R-squared: 0.433,     Adjusted R-squared: 0.3224
F-statistic: 3.914 on 8 and 41 DF,  p-value: 0.001645

The intercept is the mean population of the baseline level (New England); each remaining coefficient is the difference between that division's mean and the New England mean.

> statedf[ statedf["div"] == "Middle Atlantic" ,]
             abb             div       reg Population  Area
New Jersey    NJ Middle Atlantic Northeast       7333  7521
New York      NY Middle Atlantic Northeast      18076 47831
Pennsylvania  PA Middle Atlantic Northeast      11860 44966
> statedf[ statedf["div"] == "East North Central" ,]
          abb                div           reg Population  Area
Illinois   IL East North Central North Central      11197 55748
Indiana    IN East North Central North Central       5313 36097
Michigan   MI East North Central North Central       9111 56817
Ohio       OH East North Central North Central      10735 40975
Wisconsin  WI East North Central North Central       4589 54464
> tapply( statedf[,"Population"], statedf[,"div"], mean )
       New England    Middle Atlantic     South Atlantic East South Central
          2031.167          12423.000           4118.250           3379.000
West South Central East North Central West North Central           Mountain
          5217.000           8189.000           2384.429           1203.125
           Pacific
          5654.800

Graphics in R

Plotting functions in base R:

• High-level plotting functions create a new plot on the graphics device, possibly with axes, labels, titles and so on.
• Low-level plotting functions add more information to an existing plot, such as extra points, lines and labels.
• Interactive graphics functions allow you to interactively add information to, or extract information from, an existing plot, using a pointing device such as a mouse.

Example of high-level function: plot

plot is a generic plotting function whose behavior is determined by the class of the object(s) to which it is applied.

• argument is factor: bar graph of counts of each level
> plot( statedf[,"div"], cex.axis=0.75,
+     main = "Number of States per Division" )
• arguments are two numeric vectors: scatterplot with first vector on x-axis
> plot( statedf[,"Area"], statedf[, "Population"],
+     xlab = "Area in Square Miles", ylab = "Population in thousands")
• argument is a data frame: scatterplot matrix
> plot( statedf )
• plotting one object against each object in an expression
  – object to left of "~" will be on y-axis
> par(mfrow=c(1,2) )
> plot( Population ~ Area + reg, data = statedf )

Low-level plotting functions

• add extra information (such as points, lines or text) to the current plot.
• points(x,y)
• lines(x,y)
• text(x,y,labels,...)

> attach(statedf)
> plot(
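
The low-level functions listed above can be illustrated with a minimal sketch. This is not the lecture's own example, which is cut off in this copy; it only assumes the statedf data frame built earlier and rebuilds it so the block runs on its own. The variable name pacific and the choice of a lowess smooth are illustrative assumptions.

```r
# Sketch only: an illustration of points(), lines() and text(),
# not the (truncated) example from the lecture.

# Rebuild the data frame from the datasets package so the block is self-contained.
statedf <- data.frame(abb = state.abb, div = state.division,
                      reg = state.region,
                      state.x77[, c("Population", "Area")])

# High-level call: starts a new plot; type = "n" sets up axes without drawing points.
plot(statedf$Area, statedf$Population, type = "n",
     xlab = "Area in Square Miles", ylab = "Population in thousands")

# Logical index for one level of the factor (hypothetical helper variable).
pacific <- statedf$div == "Pacific"

# Low-level calls: each one adds to the plot created above.
points(statedf$Area[!pacific], statedf$Population[!pacific])          # open circles
points(statedf$Area[pacific], statedf$Population[pacific], pch = 19)  # filled circles
lines(lowess(statedf$Area, statedf$Population))                       # smooth trend line
text(statedf$Area[pacific], statedf$Population[pacific],
     labels = statedf$abb[pacific], pos = 3, cex = 0.75)              # labels above points
```

Starting with type = "n" keeps the high-level call responsible only for the axes and labels, so every visible mark comes from a low-level function.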

