UI STAT 5400 - Lab session 3 Elementary data analysis in R

22S:166
Lab session 3
Elementary data analysis in R
Sep. 9, 2011

1 Setting up

Use NX Client to get onto the Linux network. Use ssh to log into one of the machines in the 346 lab.

Choose a subdirectory to use for this R session. Go to the “Datasets” section of the course web page. Read the file called Bap.info, and download the data file called BaP.txt into the subdirectory you wish to use. Then call up R from that subdirectory.

2 Using on-line help in R

You can get help on any R function by typing help(<command name>). For example, to get help on the library function, enter
> help(library)
To use “prettier” help in a separate window, enter
> help.start()
After a while, a web window will come up. You can click on choices for help documents in that window. If you type help(<command name>) in the command window later in the R session, the results will appear in the web window.

Another function that is useful in learning R and getting help is apropos. It finds all functions whose names contain the character string given as the argument. Only packages that have been loaded into memory are searched. For example,
> apropos("rank")
Yet another useful function is help.search. It looks for any function in any installed package that mentions the search term in its help. The name of the package containing the function is in parentheses. For example,
> help.search("rank")
Help files with alias or concept or title matching rank using
regular expression matching:

rank(base)          Sample Ranks
SignRank(stats)     Distribution of the Wilcoxon Signed Rank Statistic
.
.
.

3 Using built-in R datasets

Use the search function to determine which R packages are loaded automatically when you bring R up.
> search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"
Notice that the datasets, stats, and graphics packages are listed.
This means you can access any of the built-in R datasets and any functions in the stats and graphics packages without having to load the packages yourself. To get names and descriptions of the datasets, enter
> help(package=datasets)
To display the Orange dataset (it is a data frame), just enter
> Orange
To get descriptive information on the dataset named Orange, enter
> help(Orange, package=datasets)
Notice the example code at the end of the help. Copy the line that begins coplot into the command window and execute it. This is an example of a very powerful plotting function in the graphics package. Type in the necessary command to get help on this function.

4 Reading in an external file

To read the BaP data into a data frame called BaP, enter
> BaP <- read.table("BaP.txt", header = T)
If the data file were in a different subdirectory, we would have to enter its full path name.
We can also read it in straight from its Internet location:
> BaP <- read.table("http://www.stat.uiowa.edu/ftp/kcowles/datasets/BaP.txt", header = T)
To get a scatterplot with the indoor measurements on the X axis and the outdoor measurements on the Y axis, enter:
> plot(BaP$indoor, BaP$outdoor)
If we want to refer to the variables indoor and outdoor without referencing the data frame, we need to attach the data frame.
> attach(BaP)
Note that this makes BaP the second item in the search list.
> search()
Now we could just enter plot(indoor, outdoor).
When we are done using the data frame, we should detach it to free up memory.
> detach(BaP)
Use search() to verify that the BaP data frame has been detached.

5 Using the library that accompanies the assigned readings

Verzani wrote an add-on R library to accompany his book Simple R. In the online version of the book, he refers to this library as the Simple library. However, the name has been changed and it is now called the UsingR library. I have installed it under R on our Linux system. This library does not load automatically when you call up R.
We have to load it using the library function. We will load this library and use some of the datasets in it to learn some data analysis functions in R.
> .libPaths( c(.libPaths(), '/group/statsoft/Rlibs64'))
> library(UsingR)           # load library
> help(package="UsingR")    # get information on all functions and datasets in library
> help(corn)                # get information on one particular dataset in library
> corn                      # display dataframe
   New Standard
1  110      102
2  103       86
3   95       88
4   94       75
5   87       89
6  119      102
7  102      105
8   93       88
9   87       83
10  98       89
11 105      100
12 117      110

6 Elementary data analysis in R

We want to determine whether the mean yield of New corn is larger than that of Standard corn.

Begin with exploratory analysis of the data, both numeric and graphical.
> summary(corn)
      New            Standard
 Min.   : 87.00   Min.   : 75.00
 1st Qu.: 93.75   1st Qu.: 87.50
 Median :100.00   Median : 89.00
 Mean   :100.83   Mean   : 93.08
 3rd Qu.:106.25   3rd Qu.:102.00
 Max.   :119.00   Max.   :110.00
> boxplot(corn)                  # side-by-side boxplots of all variables in dataset
> attach(corn)                   # make the individual variables available by name
> hist( New, probability = T)    # make a probability histogram
> lines( density( New ), col = "red" )
> hist( Standard, probability = T)
> lines( density( Standard ), col = "red" )
# Further assessment of whether samples might come from normal populations
> qqnorm( Standard ); qqline( Standard, col = 2)
> qqnorm( New ) ; qqline( New, col = 2)
These results are equivocal as to whether the normality assumption holds. A t-test might be appropriate, but a nonparametric test probably is safer.
We will do both and see whether they agree.
> t.test( New, Standard, alternative = "greater" )

        Welch Two Sample t-test

data:  New and Standard
t = 1.8061, df = 21.996, p-value = 0.04231
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.3815088       Inf
sample estimates:
mean of x mean of y
100.83333  93.08333

# Note that the confidence interval is one-sided, to match the requested one-sided alternative
# hypothesis.
> help(wilcox.test)    # get help on nonparametric test for equality of centers
> wilcox.test( New, Standard, alternative = "greater", conf.int = T )

        Wilcoxon rank sum test with continuity correction

data:  New and Standard
W = 99, p-value = 0.06264
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval:
 -3.221176e-05           Inf
sample estimates:
difference in location
               7.00006

Warning messages:
1: In wilcox.test.default(New, Standard, alternative = "greater", conf.int = T) :
  cannot compute exact p-value with ties
2: In wilcox.test.default(New, Standard, alternative = "greater", conf.int = T) :
  cannot compute exact
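
To check whether the two tests agree, it can help to put their p-values side by side. The following is a minimal sketch, not part of the original lab. It retypes the corn yields from the listing in section 5 so that it runs without the UsingR library, uses with() in place of attach(), and passes exact = FALSE to wilcox.test so that the normal approximation is requested directly (the data contain ties). The helper compare_tests and the column name reject_at_5pct are hypothetical names introduced only for this illustration.

```r
# Corn yields as printed in the handout (normally available via library(UsingR)).
corn <- data.frame(
  New      = c(110, 103, 95, 94, 87, 119, 102, 93, 87, 98, 105, 117),
  Standard = c(102, 86, 88, 75, 89, 102, 105, 88, 83, 89, 100, 110)
)

# with() evaluates the call inside the data frame, so attach() is not needed.
t_res <- with(corn, t.test(New, Standard, alternative = "greater"))

# exact = FALSE asks for the normal approximation up front; because the data
# contain ties, an exact p-value is not available in any case.
w_res <- with(corn, wilcox.test(New, Standard, alternative = "greater",
                                exact = FALSE))

# Hypothetical helper (not from the handout): one row per test, with its
# p-value and whether it falls below the 0.05 level.
compare_tests <- function(...) {
  tests <- list(...)
  data.frame(
    test           = vapply(tests, function(x) x$method, character(1)),
    p.value        = vapply(tests, function(x) x$p.value, numeric(1)),
    reject_at_5pct = vapply(tests, function(x) x$p.value < 0.05, logical(1))
  )
}

print(compare_tests(t_res, w_res))
```

Going by the output shown above, the t-test p-value (0.04231) falls just below 0.05 and the Wilcoxon p-value (0.06264) just above it, so the two tests do not lead to the same decision at that level, although both give only moderate evidence.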

