UI STAT 5400 - Computing in Statistics

122S:166 Computing in Statistics
Introduction to R
Lecture 5
September 4, 2009
Kate Cowles
374 SH, [email protected]

R is
• “an integrated suite of software facilities for data manipulation, calculation, and graphics display” (An Introduction to R, Venables, Ripley, and the R Core Team)
  – data handling and storage capabilities
  – operators for calculations on arrays and matrices
  – data analysis tools
  – graphical capabilities
  – programming language
  – planned and coherent system
• an implementation of the S language
  – the S language was developed at AT&T Bell Labs
    ∗ first version 1976
  – S-Plus is a commercial version of S (beginning in 1987)
    ∗ sold and supported by Insightful Corp.
    ∗ GUI
    ∗ many formats supported for graphics export and data input/output
    ∗ runs on Windows, UNIX, Linux (not Macintosh)
• advantages of S
  – extendible
    ∗ users write new functions in the S language, just as developers do
    ∗ excellent documentation for adding functions to the system
    ∗ users can create their own data types
    ∗ a huge international community of users constantly contributes new capabilities
    ∗ contrast with SAS
      · very hard to write new SAS procedures
      · users write in a different language (SAS macro or IML) than developers do
  – high-level language
    ∗ only a few commands are required to do complex things
  – language is connected to data while executing
    example (from Statistical Computing and Graphics course notes by Frank Harrell):
        if (is.factor(x) || is.character(x) ||
            (is.numeric(x) && length(unique(x)) < 20)) table(x) else quantile(x)
    computes the quantiles of x if x is numeric and has at least 20 distinct values, and a frequency table otherwise
  – object-oriented
    ∗ fewer commands to learn because the same command can be applied to different types of objects
  – Harrell: “best scientific graphics available”
    ∗ Harrell: “SAS graphics are ugly, inflexible, have poor defaults, difficult to program”

R
• an international team of statisticians started developing R in the early 1990s
  – to provide an open-source alternative to S-Plus
  – to provide an S implementation on Linux (not supported by S-Plus then)
• easy to download and install from web sites
• excellent documentation
• user-contributed libraries called packages expand capabilities
• runs on Windows, UNIX, Linux, Macintosh
• no GUI on most platforms
• fewer data import/export capabilities than S-Plus
  – although add-on packages provide more
  – no export specifically to PowerPoint

Starting and running R interactively on Linux
• recommendation: use a separate subdirectory for each major project you do with R
• in a terminal window, get into the desired subdirectory and start R by entering
      R
• R commands may be issued interactively
• to quit, enter
      q()
  – follow the prompts as to whether you want to save the workspace
  – if you don’t save it, any new objects (data, functions, results) created during the current R session will be lost

Starting and running R interactively on Linux
• It is strongly recommended to use a separate subdirectory for each major R project. You might want one subdirectory for your homework assignments, and another for your group project.
• begin by creating the subdirectory
• copy in or download any needed data files
• then invoke R in that subdirectory
      [kcowles@p-lnx402 ~]$ mkdir examples166
      [kcowles@p-lnx402 ~]$ cd examples166
      [kcowles@p-lnx402 ~/examples166]$ ls -a
      .  ..
      [kcowles@p-lnx402 ~/examples166]$

Reading in data from external files
• Use Firefox to download Cars.dat from the “Data sets” section of the course web page into this directory.
      [kcowles@p-lnx402 ~/examples166]$ ls
      Cars.dat
• Use a text editor to look at this file. Note that the separators between columns are tabs (you can tell because the cursor jumps) and that the decimal point in numbers is indicated by a period.
• We need to read this file into an object in R to analyze it.
  R has several functions that read in data files in different formats.
• We will use R’s built-in help facility to figure out which one to use.

      [kcowles@p-lnx402 ~/examples166]$ R

      R version 2.7.1 (2008-06-23)
      Copyright (C) 2008 The R Foundation for Statistical Computing
      ISBN 3-900051-07-0

      R is free software and comes with ABSOLUTELY NO WARRANTY.
      You are welcome to redistribute it under certain conditions.
      Type 'license()' or 'licence()' for distribution details.

        Natural language support but running in an English locale

      R is a collaborative project with many contributors.
      Type 'contributors()' for more information and
      'citation()' on how to cite R or R packages in publications.

      Type 'demo()' for some demos, 'help()' for on-line help, or
      'help.start()' for an HTML browser interface to help.
      Type 'q()' to quit R.

      > help(read.delim)
      read.table              package:utils              R Documentation

      Data Input

      Description:

           Reads a file in table format and creates a data frame from it,
           with cases corresponding to lines and variables to fields in the
           file.

      Usage:

           read.table(file, header = FALSE, sep = "", quote = "\"'",
                      dec = ".", row.names, col.names,
                      as.is = !stringsAsFactors,
                      na.strings = "NA", colClasses = NA, nrows = -1,
                      skip = 0, check.names = TRUE, fill = !blank.lines.skip,
                      strip.white = FALSE, blank.lines.skip = TRUE,
                      comment.char = "#",
                      allowEscapes = FALSE, flush = FALSE,
                      stringsAsFactors = default.stringsAsFactors(),
                      encoding = "unknown")

           read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".",
                    fill = TRUE, comment.char="", ...)

           read.csv2(file, header = TRUE, sep = ";", quote="\"", dec=",",
                     fill = TRUE, comment.char="", ...)

           read.delim(file, header = TRUE, sep = "\t", quote="\"", dec=".",
                      fill = TRUE, comment.char="", ...)

           read.delim2(file, header = TRUE, sep = "\t", quote="\"", dec=",",
                       fill = TRUE, comment.char="", ...)

      ....... lots of additional detail ........

      > Cars <- read.delim("Cars.dat")   # <- is the assignment operator
      > str(Cars)                        # find out the structure of the object
      'data.frame':   38 obs. of  8 variables:
       $ Country     : Factor w/ 6 levels "France","Germany",..: 6 6 6 6 6 4 4 6 2 5 ...
       $ Car         : Factor w/ 38 levels "AMC Concord D/L",..: 6 21 11 12 8 34 14 18 3 35 ...
       $ MPG         : num  16.9 15.5 19.2 18.5 30 27.5 27.2 30.9 20.3 17 ...
       $ Weight      : num  4.36 4.05 3.60 3.94 2.15 ...
       $ Drive_Ratio : num  2.73 2.26 2.56 2.45 3.7 3.05 3.54 3.37 3.9 3.5 ...
       $ Horsepower  : int  155 142 125 150 68 95 97 75 103 125 ...
       $ Displacement: int  350 351 267 360 98 134 119 105 131 163 ...
       $ Cylinders   : int  8 8 8 8 4 4 4 4 5 6 ...

Data frames
• special kind of object
• like a table of data
  – row for each observation
  – column for each
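The read.delim() and str() steps in the lecture assume that Cars.dat has already been downloaded. Below is a minimal sketch of the same workflow that runs without that file. It writes a small tab-separated excerpt to a temporary file, reads it back with read.delim(), and looks at the resulting data frame.

The values are the first five entries of the MPG, Weight, Horsepower, and Cylinders columns shown in the str(Cars) output. Country and Car are left out because their text labels are not visible there. The object name carsPart and the use of tempfile() as a stand-in for Cars.dat are illustrative assumptions, not part of the course materials.

```r
# Sketch only: a five-row excerpt of four Cars.dat columns, taken from the
# first values listed by str(Cars); carsPart is a hypothetical name.
lines <- c(
  "MPG\tWeight\tHorsepower\tCylinders",
  "16.9\t4.36\t155\t8",
  "15.5\t4.05\t142\t8",
  "19.2\t3.60\t125\t8",
  "18.5\t3.94\t150\t8",
  "30\t2.15\t68\t4"
)
path <- tempfile(fileext = ".dat")   # temporary stand-in for Cars.dat
writeLines(lines, path)              # columns separated by tabs, as in Cars.dat

# read.delim() defaults: header = TRUE, sep = "\t", dec = "."
carsPart <- read.delim(path)
str(carsPart)                        # MPG, Weight read as num; Horsepower, Cylinders as int

# A data frame has one row per observation and one column per variable
dim(carsPart)                        # number of rows, number of columns
carsPart$MPG                         # one column (variable), extracted by name
carsPart[5, ]                        # one row (observation)
mean(carsPart$MPG)
table(carsPart$Cylinders)            # frequency table of a discrete variable

unlink(path)                         # remove the temporary file
```

Because the file uses tabs as separators and periods as decimal points, read.delim() needs no extra arguments. A file that used commas as decimal marks would instead call for read.delim2(), whose default is dec = ",".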