
Part 1: Introduction to R
Douglas Bates
University of Wisconsin - Madison
and R Development Core Team
<[email protected]>
Sept 8, 2010

Outline
• What is R?
• Organizing data
• Accessing and modifying variables
• Subsets of data frames
• Missing Data

R
• R is an Open Source (and freely available) environment for statistical computing and graphics.
• The CRAN links on the course web site provide binary downloads for Windows, for Mac OS X and for several flavors of Linux. Source code is also available.
• R is under active development - typically two major releases per year.
• R provides data manipulation and display facilities and most statistical procedures. It can be extended with “packages” containing data, code and documentation. Currently there are more than 2400 contributed packages in the Comprehensive R Archive Network (CRAN).

Simple calculator usage
• The R application is started by clicking on an icon or a menu item.
  The main window is called the console window.
• Arithmetic expressions can be typed in the console window. If the expression on a line is complete it is evaluated and the result is printed.

> 5 - 1 + 10
[1] 14
> 7 * 10/2
[1] 35
> exp(-2.19)
[1] 0.1119167
> pi
[1] 3.141593
> sin(2 * pi/3)
[1] 0.8660254

Comments on the calculator usage
• The > symbol at the beginning of the input line is the prompt from the application, not something that is typed by the user.
• If the expression typed is incomplete, say because it contains a ( without the corresponding ) then the prompt changes to a + indicating that more input is required.
• The expression [1] at the beginning of the response is an index indicating that what follows is the first (and in these cases the only) element of a numeric vector.

Assignment of values to names
• During a session, data objects can be assigned to names.
• The assignment operator is the two-character sequence <-. (The = sign can also be used, except in a few cases.)
• The function ls lists the names of objects; rm removes objects. An alternative to ls is ls.str() which lists objects in the workspace and provides a brief description of their structure.

> x <- 5
> ls()
[1] "x"
> ls.str()
x : num 5
> rm(x)
> ls()
character(0)

Vectors
• Numeric objects are always stored as vectors (as opposed to scalars).
• An easy way to create a non-trivial vector is a sequence, generated by the : operator or the seq function.
• When results are printed the number in square brackets at the beginning of the line is the index of the element at the start of the line.
• Square brackets are used to specify indices (or, in general, subsets).

> (x <- 0:19)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
> x[5]
[1] 4
> str(y <- x + runif(20, min = 10, max = 20))
num [1:20] 17 18.9 17.3 19.2 20.6 ...
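The sequence and indexing ideas from the Vectors slide can be sketched as follows. This is a minimal illustration rather than code from the original slides; the names evens and first_three are made up for the example.

```r
# A single number is a vector of length 1
length(5)                     # 1

# Two ways to build a sequence
x <- 0:19                     # the : operator, as on the slide
evens <- seq(0, 18, by = 2)   # seq() with an explicit step (hypothetical name)

# Square brackets select elements; indices start at 1
x[5]                          # the fifth element, which is 4
first_three <- x[1:3]         # a vector of indices selects several elements
x[-1]                         # a negative index drops that element

# Arithmetic is applied element by element
y <- x + 10
```

Because indices start at 1 and x starts at 0, x[5] is 4 rather than 5, which matches the console output shown on the slide.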

Following the operations on the slides
• The lines of R code shown on these slides are available in files on the course web site. The file for this section is called 1Intro.R.
• If you open this file in the R application (the File→Open menu item or <ctrl>-O) and position the cursor at a particular line, then <ctrl>-R will send the line to the console window for execution and step to the next line.
• Any part of a line following a # symbol is a comment.
• The code is divided into named “chunks”, typically one chunk per slide that contains code.
• In the system called Sweave used to generate the slides the result of a call to a graphics function must be printed. In interactive use this is not necessary but neither is it harmful.
• Note that R provides name completion with the <tab> key. After typing part of a name you can use tab to request completion.

Organizing data in R
• Standard rectangular data sets (columns are variables, rows are observations) are stored in R as data frames.
• The columns can be numeric variables (e.g. measurements or counts) or factor variables (categorical data) or ordered factor variables. These types are called the class of the variable.
• The str function provides a concise description of the structure of a data set (or any other class of object in R). The summary function summarizes each variable according to its class. Both are highly recommended for routine use.
• Entering just the name of the data frame causes it to be printed. For large data frames use the head and tail functions to view the first few or last few rows.
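The inspection functions named on the Organizing data slide can be tried on a small data frame. The sketch below is an assumption-laden illustration: the data frame d and all of its values are invented here and do not come from the course materials.

```r
# A small made-up data frame: columns are variables, rows are observations
d <- data.frame(
  weight = c(61.2, 72.5, 68.0, 80.3, 55.9, 77.1, 64.4, 70.8),
  count  = c(3L, 5L, 2L, 8L, 1L, 6L, 4L, 7L),
  group  = factor(c("a", "b", "a", "c", "b", "c", "a", "b")),
  dose   = factor(c("low", "high", "med", "low", "med", "high", "low", "med"),
                  levels = c("low", "med", "high"), ordered = TRUE)
)

str(d)           # one line per column: its class and the first few values
summary(d)       # numeric columns get quantiles, factors get level counts

head(d, 3)       # first three rows
tail(d, 3)       # last three rows

class(d$weight)  # "numeric"
class(d$group)   # "factor"
class(d$dose)    # "ordered" "factor"
```

Giving the levels explicitly for dose fixes their order; without the levels argument R would sort them alphabetically.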

Data input
• The simplest way to input a rectangular data set is to save it as a comma-separated value (csv) file and read it with read.csv.
• The first argument to read.csv is the name of the file. On Windows it can be tricky to get the file path correct (backslashes need to be doubled). The best approach is to use the function file.choose which brings up a “chooser” panel through which you can select a particular file. The idiom to remember is
> mydata <- read.csv(file.choose())
  for comma-separated value files or
> mydata <- read.delim(file.choose())
  for files with tab-delimited data fields.
• With an Internet connection you can use a URL (within quotes) as the first argument to read.csv or read.delim. (See question 1 in the first set of exercises)

In-built
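As a non-interactive counterpart to the file.choose() idiom on the Data input slide, the sketch below writes a tiny made-up data set to a temporary file and reads it back. The file contents, the variable names path, tab_path and mydata2, and the commented-out Windows path are all assumptions for illustration only.

```r
# Write a tiny made-up data set to a temporary csv file so the example
# runs without any existing file (tempfile() picks the location)
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,4.5,a",
             "2,3.8,b",
             "3,5.1,a"), path)

# read.csv treats the first line as column names by default
mydata <- read.csv(path)
str(mydata)

# The same data with tab-delimited fields, read with read.delim
tab_path <- tempfile(fileext = ".txt")
write.table(mydata, tab_path, sep = "\t", row.names = FALSE, quote = FALSE)
mydata2 <- read.delim(tab_path)

# On Windows a literal path needs doubled backslashes (or forward slashes).
# This path is hypothetical, so the line is left commented out:
# mydata <- read.csv("C:\\Users\\me\\Documents\\mydata.csv")
```

In interactive use, replacing path with file.choose() gives the idiom from the slide.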



UW-Madison STAT 849 - Part 1 - Introduction to R
