HARVARD STAT 335 - Introduction to S-Plus - D711735


Course Stat 335- Statistical Computing Software


Introduction to S-Plus

Topics for today

• Input / Output
• Using data frames
• Mathematics with vectors and matrices
• Summary statistics
• Basic graphics

Input: Data files

For rectangular data files (n rows, c columns) you usually want to use read.table().

    read.table(file, header = F, sep = "", row.names = NULL,
               col.names = paste("V", 1:fields, sep = ""),
               as.is = F, na.strings = "NA", skip = 0)

The arguments you will normally need to deal with are header and sep.

header  logical flag: if TRUE, then the first line of the file is used as the variable names of the resulting data frame. The default is FALSE, unless there is one less field in the first line of the file than in the second line.

sep  the field separator (single character), often "\t" for tab. If omitted, any amount of white space (blanks or tabs) can separate fields. To read fixed format files, make sep a numeric vector giving the initial columns of the fields.

If the data file doesn't have as nice a structure as read.table requires, you probably want to use scan instead.

    scan(file="", what=numeric(), n=<<see below>>, sep=<<see below>>,
         multi.line=F, flush=F, append=F, skip=0, widths=NULL,
         strip.white=<<see below>>)

The important arguments, besides file, are what, sep and flush.

what  a vector of mode numeric, character, or complex, or a list of vectors of these modes. Objects of mode logical are not allowed. If what is a numeric, character, or complex vector, scan will interpret all fields on the file as data of the same mode as that object. So, what=character() or what="" causes scan to read data as character fields. If what is missing, scan will interpret all fields as numeric. If what is a list, then each record is considered to have length(what) fields and the mode of each field is the mode of the corresponding component in what.
When widths is given as a vector of length greater than one, what must be a list of the same length as widths.

sep  separator (single character), often "\t" for tab or "\n" for newline. If omitted, any amount of white space (blanks, tabs, and possibly newlines) can separate fields. If widths is specified, then sep tells what separator to insert into fixed-format records.

flush  if TRUE, scan will flush to the end of the line after reading the last of the fields requested. This allows putting comments after the last field that are not read by scan, but also prevents putting multiple sets of items on one line.

While data files in text format are extremely common, you may need to deal with data coming from other packages, such as SAS, Excel, SPSS, etc. These can be read in with sas.get for SAS and importData for many packages, including Excel and SPSS. Note that importData in version 5.1 often accepts only certain versions of the other programs' data files. For example, Excel files must be from version 4 or earlier. It appears that the same or similar restrictions hold for version 6 as well.

Exporting S data

Most of the functions mentioned earlier have counterparts for exporting your S data to other programs. Since text files are usually the easiest to work with, write.table is the one you will use the most.

    write.table(data, file = "", sep = ",", append = F, quote.strings = F,
                dimnames.write = T, na = NA, end.of.row = "\n")

Its arguments are similar to those of read.table. One change I suggest is to give a sep argument rather than use the default of ",". Instead I would use a space " " or a tab "\t", as the result will be easier to read into a program such as Excel.

The counterpart to scan is write.

    write(x, file="data", ncolumns=<<see below>>, append=F)

Usually I think that write.table is the way to go. Also, you need to be careful with the default for ncolumns.
ncolumns  number of data items to put on each line of file. Default is 5 per line for numeric data, 1 per line for character data.

importData also has its counterpart for exporting data. Not surprisingly, it's exportData. This is also the only way to export SAS files, as I can't find a counterpart to sas.get.

Running scripts

As with Unix, it is possible to write scripts of S commands instead of typing the commands one by one at the prompt. As part of my example last week, I read in a dataset, generated some plots and created some new variables.

    cars.df <- read.table("/home/irwin/Scourse/93cars.dat", header=T, row.names=NULL)
    postscript("citympg.ps", horiz=T)
    plot(cars.df$weight, cars.df$citympg, xlab="Weight",
         ylab="CityMPG", main="City MPG versus Weight")
    abline(lsfit(cars.df$weight, cars.df$citympg))
    dev.off()
    cars.df$cityfuel <- 100/cars.df$citympg
    postscript("cityfuel.ps", horiz=T)
    plot(cars.df$weight, cars.df$cityfuel, xlab="Weight",
         ylab="CityFuel", main="City Fuel versus Weight")
    abline(lsfit(cars.df$weight, cars.df$cityfuel))
    dev.off()

When I did it, I just typed in the commands one at a time. However, I might want to rerun them another time. With the source function, it's easy to run scripts. Assume the above commands are in a file testscript.s. Then the command

    source('testscript.s')

will run the commands in that file and return you to the S prompt.

    source(file, local=F, echo=<<see below>>, n = -1, immediate = NULL)

The important argument for source is echo. It determines the amount of output generated by the source command.

echo  if TRUE, each expression will be printed, along with a prompt, before it is evaluated. The default is TRUE if options(echo=T) has been set and length(recordConnection())==0.

For example,

    > source('testscript.s')
    Generated postscript file "citympg.ps".
    Generated postscript file "cityfuel.ps".
    > source('testscript.s', echo=T)
    > cars.df <- read.table("/home/irwin/Scourse/93cars.dat", header = T, row.names = NULL)
    > postscript("citympg.ps", horiz = T)
    > plot(cars.df$weight, cars.df$citympg, xlab = "Weight", ylab = "CityMPG", main = "City MPG versus Weight")
    > abline(lsfit(cars.df$weight, cars.df$citympg))
    > dev.off()
    Generated postscript file "citympg.ps".
    > cars.df$cityfuel <- 100/cars.df$citympg
    > postscript("cityfuel.ps", horiz = T)
    > plot(cars.df$weight, cars.df$cityfuel, xlab = "Weight", ylab = "CityFuel", main = "City Fuel versus Weight")
    >
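
The source workflow shown above can be tried on a much smaller script. What follows is only a minimal sketch and not part of the original example: the file name tiny.s and the variables citympg and cityfuel are made up for illustration, and it assumes an S-Plus (or R) session that can write to the current working directory.

```r
# Write a two-line script to a hypothetical file, tiny.s, in the working directory.
cat("citympg <- c(20, 25, 50)\n",
    "cityfuel <- 100/citympg\n",
    file = "tiny.s", sep = "")

# Run the script with the default echo setting. With local=F (the default)
# the assignments are made at the top level, so cityfuel exists afterwards.
source("tiny.s")
cityfuel    # the values 5, 4 and 2

# Run it again, asking for each expression to be printed before it is evaluated.
source("tiny.s", echo = T)
```

Creating the script with cat just keeps the sketch self-contained; in practice you would write the file in an editor, as with testscript.s.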
