UI STAT 5400 - Data Structures

7. R Data Structures

7.1 Vectors

Recall that vectors may have mode logical, numeric or character.

7.1.1 Subsets of Vectors

Recall (section 2.6.2) two common ways to extract subsets of vectors:

1. Specify the numbers of the elements that are to be extracted. One can use negative numbers to omit elements.
2. Specify a vector of logical values. The elements that are extracted are those for which the logical value is TRUE.

Thus, to extract the values of x that are greater than 10, specify x[x>10].

The following demonstrates a third possibility, for vectors that have named elements:

> c(Andreas=178, John=185, Jeff=183)[c("John","Jeff")]
John Jeff
 185  183

A vector of names has been used to extract the elements.

7.1.2 Patterned Data

Use 5:15 to generate the numbers 5, 6, ..., 15. Entering 15:5 will generate the sequence in the reverse order.

To repeat the sequence (2, 3, 5) four times over, enter rep(c(2,3,5), 4), thus:

> rep(c(2,3,5),4)
[1] 2 3 5 2 3 5 2 3 5 2 3 5

If instead one wants four 2s, then four 3s, then four 5s, enter rep(c(2,3,5), c(4,4,4)).

> rep(c(2,3,5),c(4,4,4))   # An alternative is rep(c(2,3,5), each=4)
[1] 2 2 2 2 3 3 3 3 5 5 5 5

Note further that, in place of c(4,4,4), we could write rep(4,3). So a further possibility is that, in place of rep(c(2,3,5), c(4,4,4)), we could enter rep(c(2,3,5), rep(4,3)).

In addition to the above, note that the function rep() has an argument length.out, meaning "keep on repeating the sequence until the length is length.out."

7.2 Missing Values

In R, the missing value symbol is NA. Any arithmetic operation or relation that involves NA generates an NA. This applies also to the relations <, <=, >, >=, ==, !=. The first four compare magnitudes, == tests for equality, and != tests for inequality. Users who do not carefully consider the implications for expressions that include NAs may be puzzled by the results. Specifically, note that x==NA generates NA.

Be sure to use is.na(x) to test which values of x are NA. As x==NA gives a vector of NAs, you get no information at all about x.
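The paragraph above says that arithmetic and all six relations propagate NA, while the example that follows exercises only ==. The minimal R sketch below checks the other operators. It also shows the !is.na() guard that this section recommends further on, applied to the same x and y vectors used there. The name keep is purely illustrative.

```r
# Arithmetic and every relational operator return NA when an operand is NA
x <- c(1, 6, 2, NA)
x + 1            # 2 7 3 NA
x > 2            # FALSE TRUE FALSE NA
c(NA < 1, NA <= 1, NA > 1, NA >= 1, NA == 1, NA != 1)   # six NAs

# is.na() is the reliable test; x == NA cannot detect missing values
is.na(x)         # FALSE FALSE FALSE TRUE
which(is.na(x))  # 4

# Guarded replacement: remove NA positions from the logical subscript first
x <- c(1, 6, 2, NA, 10)
y <- c(1, 4, 2, 3, 0)
keep <- !is.na(x) & x > 2   # FALSE TRUE FALSE FALSE TRUE (FALSE & NA is FALSE)
y[keep] <- x[keep]
y                # 1 6 2 3 10
```

Because FALSE & NA evaluates to FALSE, the combined subscript contains no NA. The result of the assignment therefore does not depend on how NA subscripts are treated.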
For example, compare is.na(x) with x==NA:

> x <- c(1,6,2,NA)
> is.na(x)   # TRUE where NA appears, and otherwise FALSE
[1] FALSE FALSE FALSE  TRUE
> x==NA      # All elements are set to NA
[1] NA NA NA NA
> NA==NA
[1] NA

WARNING: This is chiefly for those who may move between R and S-PLUS. In important respects, R's behaviour with missing values is more intuitive than that of S-PLUS. Thus in R

y[x>2] <- x[x>2]

gives the result that the naïve user might expect, i.e. replace elements of y with corresponding elements of x wherever x>2. Wherever x>2 gives the result NA, no action is taken. In R, any NA in x>2 yields a value of NA for y[x>2] on the left of the assignment, and a value of NA for x[x>2] on the right of the assignment.

In S-PLUS, the result on the right is the same, i.e. an NA. However, on the left, elements that have a subscript NA drop out. The vector on the left to which values will be assigned has, as a result, fewer elements than the vector on the right.

Thus the following has the effect in R that the naïve user might expect, but not in S-PLUS:

x <- c(1,6,2,NA,10)
y <- c(1,4,2,3,0)
y[x>2] <- x[x>2]
y

In S-PLUS it is essential to specify, in the example just considered:

y[!is.na(x)&x>2] <- x[!is.na(x)&x>2]

Here is a further example of R's behaviour:

> x <- c(1,6,2,NA,10)
> x>2
[1] FALSE  TRUE FALSE    NA  TRUE
> x[x>3] <- c(21,22)   # Now, explain the result that follows
Warning message:
number of items to replace is not a multiple of replacement length
> x
[1]  1 21  2 NA 21

Note that current versions of R are stricter than the behaviour described in this warning: where a logical subscript contains NA and the replacement vector has more than one element, as in y[x>2] <- x[x>2] and x[x>3] <- c(21,22) above, the assignment fails with the error "NAs are not allowed in subscripted assignments".

The safe way, in both S-PLUS and R, is to use !is.na(x) to limit the selection, on one or both sides as necessary, to those elements of x that are not NAs. We will have more to say on missing values in the section on data frames that now follows.

7.3 Data frames

The concept of a data frame is fundamental to the use of most of the R modelling and graphics functions. A data frame is a generalisation of a matrix, in which different columns may have different modes. All elements of any column must, however, have the same mode, i.e. all numeric, or all factor, or all character.

Data frames where all columns hold numeric data have some, but not all, of the properties of matrices. There are important differences that arise because data frames are implemented as lists. To turn a data frame of numeric data into a matrix of numeric data, use as.matrix().

Lists are discussed below, in section 7.6.

7.3.1 Extraction of Component Parts of Data frames

Consider the data frame barley that accompanies the lattice package:

> names(barley)
[1] "yield"   "variety" "year"    "site"
> levels(barley$site)
[1] "Grand Rapids"    "Duluth"          "University Farm" "Morris"
[5] "Crookston"       "Waseca"

We will extract the data for 1932, at the Duluth site.

> Duluth1932 <- barley[barley$year=="1932" & barley$site=="Duluth",
+                      c("variety","yield")]
> Duluth1932
             variety    yield
66         Manchuria 22.56667
72           Glabron 25.86667
78          Svansota 22.23333
84            Velvet 22.46667
90             Trebi 30.60000
96           No. 457 22.70000
102          No. 462 22.50000
108         Peatland 31.36667
114          No. 475 27.36667
120 Wisconsin No. 38 29.33333

The first column holds the row labels, which in this case are the numbers of the rows that have been extracted. Since variety and yield are columns 2 and 1 of barley, in place of c("variety","yield") we could have written, more simply, c(2,1).

7.3.2 Data Sets that Accompany R Packages

Type in data() to get a list of data sets (mostly data frames) associated with all packages that are in the current search path. To get information on the data sets that are included in the datasets package, specify

data(package="datasets")

and similarly for any other package.

In versions of R previous to 2.0.0, it was usually necessary to specifically bring any of these data frames into the workspace. (Ensure, though, that the relevant package is attached.) Thus, to bring in the data set airquality (datasets package), type

data(airquality)

The default Windows distribution includes many commonly required packages. Other packages must be explicitly installed.
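The airquality data set mentioned above is in the datasets package, which is attached at start-up. It can therefore be used to try the extraction idiom of section 7.3.1 without installing lattice. This is a minimal sketch. The objects may and high are illustrative names, not part of these notes. Because the Ozone column contains NAs, the last step applies the !is.na() guard from section 7.2.

```r
# airquality ships with the datasets package, which is attached by default.
# data() copies it into the workspace; from R 2.0.0 on this step is optional.
data(airquality)
names(airquality)   # "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
nrow(airquality)    # 153

# Rows for May (Month == 5), with two columns selected by name,
# in the same way as the barley extraction in section 7.3.1
may <- airquality[airquality$Month == 5, c("Day", "Temp")]
nrow(may)           # 31

# Ozone has missing values; a bare airquality$Ozone > 100 would return
# all-NA rows for them, so drop the NA positions from the subscript first
high <- airquality[!is.na(airquality$Ozone) & airquality$Ozone > 100, ]
summary(high$Ozone)
```

As with Duluth1932, the row labels of may and high are the row numbers of the original data frame.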
For remaining sections of these notes, the MASS package, which comes with the default distribution, will be used from time to time.

The base package, and several other packages, are automatically attached at the beginning of the session. To attach any other installed package, use the library() command.

7.4 Data Entry

The function read.table() offers a ready means to read a rectangular array into an R data frame. Suppose that the file primates.txt contains:

"Potar monkey"   10    115
Gorilla         207    406
Human            62   1320
"Rhesus monkey"   6.8  179
Chimp            52.2  440

Then

primates <- read.table("a:/primates.txt")

will create the data frame primates, from a file on the a:

