UCLA STATS 202A - Lecture14

Lecture 14

Last time
• We examined different ways to bring data into R; we discussed several convenience functions that were designed to handle certain kinds of files
• Many of these functions returned objects of class data.frame, the class in R that most closely resembles a data table in the style of Excel
• We then went through an easy-to-state but difficult-to-implement process of merging two data frames; we illustrated the process using the operator %in% and the function match()

An aside
• Recall from last time that aggregate() allowed us to combine data that referred to the same building; it is possible to implement this with lower-level tools in R
• Similarly, our dance with match() and %in% could be accomplished a little more easily with the function merge() (although it is good to know about match() and how it works!)

[Figure: buildings for which we have both count and position; buildings for which we have only position; buildings for which we have only count]

An aside (cont.)
• Whether we use match() or merge(), notice that we have specifically avoided writing a loop; in fact, we haven’t even seen a loop in relation to R yet!
• There is a series of functions like apply() that perform a kind of “implicit” looping over objects like vectors, matrices and lists; at one point they represented speed gains, now they just make code easier to read (a gain of a different, but equally important, variety)
• This style of programming is probably not so familiar to you, but it is something we will get more practice with today

Today
• So far, we have invoked a large number of functions in R without really discussing what’s at work under the hood; today we see how R responds when you enter a command
• In the spirit of re-examining and finishing analyses from the beginning of the quarter, we will have a look at the word distributions you computed from the Roberts hearings
• Along the way, we will introduce some basic “control-of-flow” operations; we will also get a new take on subsetting and what’s really happening when you type x[1:10]

So far...
• So far, we have mainly interacted with R by typing expressions at the prompt; they are then parsed and evaluated
• We have printed or exhibited objects,
• Identified subsets of objects like vectors, matrices, arrays, data frames and lists,
• Performed arithmetic and evaluated special operators,
• Created new objects by assignment, and
• Invoked functions

Functions
• As you might expect by now, functions are also objects, and their class is simply “function”
• Most of the functions we have seen so far are written in R, meaning they are built from components in the language itself; consider, for example, the relationship between scan, read.table and read.csv
• Combining functions and other constructions in R, you can, in effect, extend the language; John Chambers would cast this in terms of your role, that is, you move from user to programmer

The purpose of statistical software is to help in the process of learning from data. For many situations, the software’s crucial contribution to the process is to allow the user to express ideas about the data, ideas that imply some desired view or summary. Expressing the idea to the software amounts to programming, though the user will not initially think of it as programming. The software should help in the deception, by making the expression of simple things simple. Simplicity makes demands on both the form of expression (the language and user interface) and the range of expression (the available tools). In other words, there must be some tool available that implements the desired view or summary (closely enough to get started). The user must be able to identify the tool and to express the particular idea, connecting the tool to the data in a simple way. Otherwise the idea will remain unrealized and the underlying process of learning from data will suffer...

A new idea is usually only vaguely formed in the user’s mind and the software usually implements only an approximation to this idea. Some ideas never go beyond this stage; either the idea turned out not to be useful or (less often) the initial rough expression was all that was needed. Most useful ideas, however, continue through a process of gradual refinement. Perhaps the original idea was not quite what we meant, or there were additional requirements that only became obvious with experience. Perhaps we simply need to apply the idea, or a variation of it, to different data. The software must support this gradual refinement by suitable programming facilities. Adding or changing details and re-using the idea in different contexts must be easy. If the train of thought proves useful, chances are that the programming aspect will gradually become more serious. The “idea” will gradually become itself a re-usable part of the user’s environment. The statistical software should help, by supporting each step from user to programmer, with as few intrusive barriers as possible...

Suggestion 1: The user interface should be integrated into the language and environment, so that the transition from user to programmer is nearly painless.

From Users, Programmers and Statistical Software, JCGS, 9:3, 404-422

Functions
• In fact, most of the computations carried out in R involve the evaluation or invocation of functions (some being more obvious than others)
• We have seen already that the easiest way to understand the operation of a function is to look up its manual entry via help()
• In addition, because most functions are written in R, we can print out a function’s contents and try to see what it’s doing

Functions
• The general form or syntax of a function is
    function ( arglist ) body
• The formal arguments of a function are combined into a comma-separated list; each entry can be a symbol or name, a statement of the form ‘name=expression’, or the special formal catch-all argument ‘...’
• Within the body of a function we describe a set of computations of some kind; these can be expressed in terms of valid R expressions, and are often contained in curly braces { and }

Functions
• We can specify default values for a function’s arguments using the ‘name=expression’ construction
• The argument ‘...’ is used to represent any number of supplied arguments not named in the formal argument list; it can be used when a function acts on any number of objects or when you want to pass variables to functions called later

Calling a function
• When we call a function,
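As a rough illustration of the Functions slides above (the general form function ( arglist ) body, default values via ‘name=expression’, and the catch-all ‘...’), here is a minimal sketch. The function name center and its arguments x and digits are hypothetical and do not come from the lecture; only class(), formals(), body(), mean() and round() are standard R.

```r
# Hypothetical example (not from the lecture): subtract the mean from a vector.
#   x       a symbol with no default, so the caller must supply it
#   digits  a 'name=expression' argument with default value 2
#   ...     catch-all; whatever lands here is handed on to mean()
center <- function(x, digits = 2, ...) {
  round(x - mean(x, ...), digits)
}

class(center)     # functions are objects; their class is "function"
formals(center)   # the formal argument list: x, digits, ...
body(center)      # the body, here wrapped in curly braces
center            # typing the name prints the function's contents

center(c(1, 2, 6))                     # digits falls back to its default
center(c(1, 2, NA, 6), na.rm = TRUE)   # na.rm travels through ... to mean()
```

In the last call, na.rm is not a formal argument of center, so it is collected by ‘...’ and passed on to mean(); without it, the NA would make the mean (and so every result) NA.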

