Berkeley STAT 133 - Statistics 133 Midterm Exam

Statistics 133 Midterm Exam
March 2, 2011

When I ask for an “R program”, I mean one or more R commands. Try your best to make your answers general, i.e. they shouldn’t depend on the specific values presented in the examples.

Total: 40 points

1. Consider the following vector of values stored in a variable called x:

    > x
     [1]  7 12  9 15 NA  8 14 NA  2  9 NA  8

(a) (2 points) Write an R program to return the positions of the missing values in x.

Solution:

    which(is.na(x))

(b) (2 points) Write an R program to count the number of non-missing values in x.

Solution:

    sum(!is.na(x))

(c) (2 points) Write an R program to replace the missing values in x with the mean of the non-missing values in x.

Solution:

    x[is.na(x)] = mean(x,na.rm=TRUE)

(d) (2 points) Write an R function that, when passed a vector, will return a vector with the missing values in the vector replaced by the mean of the non-missing values of the vector.

Solution:

    replacena = function(x){
        x[is.na(x)] = mean(x,na.rm=TRUE)
        x
    }

2. Consider a data frame called cars:

    > summary(cars)
        Country                      Car          MPG            Weight        Horsepower
     France : 1   AMC Concord D/L      : 1   Min.   :15.50   Min.   :1.915   Min.   : 65.0
     Germany: 5   AMC Spirit           : 1   1st Qu.:18.52   1st Qu.:2.208   1st Qu.: 78.5
     Italy  : 1   Audi 5000            : 1   Median :24.25   Median :2.685   Median :100.0
     Japan  : 7   BMW 320i             : 1   Mean   :24.76   Mean   :2.863   Mean   :101.7
     Sweden : 2   Buick Century Special: 1   3rd Qu.:30.38   3rd Qu.:3.410   3rd Qu.:123.8
     U.S.   :22   Buick Estate Wagon   : 1   Max.   :37.30   Max.   :4.360   Max.   :155.0
                  (Other)              :32

(a) (2 points) Write an R program to plot MPG on the y-axis and Horsepower on the x-axis, using a different color for each level of Country.

Solution:

    library(lattice)
    xyplot(MPG~Horsepower,groups=Country,data=cars)

(b) (2 points) Write an R program that will rearrange the rows of the data frame so that they are sorted by the value of Horsepower.

Solution:

    cars[order(cars$Horsepower),]

(c) (2 points) Write an R program that will show the row number of the observation with the highest ratio of MPG to Weight.

Solution:

    which.max(cars$MPG / cars$Weight)

3. Consider a vector called book, each element of which contains the text of one sentence of a book. For the purposes of this question, consider a word as text separated from other text by one or more blanks.

(a) (2 points) Write an R program to find the average number of characters in each sentence including the blanks, and another program to find the average number of characters in each sentence not including the blanks.

Solution:

    mean(nchar(book))
    mean(nchar(gsub(' ','',book)))

(b) (2 points) Write an R program to find the average number of words in each line of the book.

Solution:

    words = strsplit(book,' +')
    mean(sapply(words,length))

(c) (2 points) Write an R program to find the line in the book with the most characters.

Solution:

    book[which.max(nchar(book))]

4. Consider a data frame called wine, which contains information about the chemical composition of different types of wines. Here is some information about the data frame:

     Type      Alcohol        Malic.Acid       Proline
     A:36   Min.   :11.03   Min.   :0.740   Min.   : 278.0
     B:46   1st Qu.:12.36   1st Qu.:1.597   1st Qu.: 500.5
     C:35   Median :13.05   Median :1.845   Median : 673.5
     D:31   Mean   :13.00   Mean   :2.298   Mean   : 746.9
     E:30   3rd Qu.:13.68   3rd Qu.:3.030   3rd Qu.: 985.0
            Max.   :14.83   Max.   :5.510   Max.   :1680.0
                            NA's   :2.000

(a) (2 points) Write an R program that will calculate the median of Alcohol and Malic.Acid for each Type of wine.

Solution:

    aggregate(wine[,c('Alcohol','Malic.Acid')],wine['Type'],median,na.rm=TRUE)

(b) (2 points) Write an R program to count the number of observations with Alcohol greater than 13 and Proline less than 650.

Solution:

    sum(wine$Alcohol > 13 & wine$Proline < 650)

(c) (2 points) If you were reading this data from a comma-separated file, what option would be passed to read.csv to ensure that Type was read as a character variable, not a factor?

Solution:

    stringsAsFactors=FALSE

(d) (2 points) Write an R program to produce a barplot showing the number of wines of each type in the data frame.

Solution:

    barplot(table(wine$Type))

5. Consider the following vector:

    > text = c('cat 122','dog 213','721 chicken','fish 42','893 duck')

Use regular expressions to answer the following questions:

(a) (2 points) Write an R program to create a vector like text, with the number in each element appearing before the animal name.

Solution:

    sub('([a-z]+) ([0-9]+)','\\2 \\1',text)

(b) (2 points) Write an R program to create a vector containing just the animal names in text.

Solution:

    gsub('[0-9 ]','',text)

(c) (2 points) Write an R program to produce a vector containing the position of the blank in each element of text.

Solution:

    unlist(gregexpr(' ',text))

(d) (2 points) Write an R program to remove the first three characters in each of the elements of text.

Solution:

    sub('^...','',text)

6. Consider a data frame called stock. Here are the first few lines of the data frame:

    > head(stock,n=3)
            Date Price
    1 2011-02-25  1.44
    2 2011-02-24  1.39
    3 2011-02-23  1.44

Suppose you tried to plot Price versus Date and saw the following:

    > plot(stock$Date,stock$Price)
    Error in plot.window(...) : need finite 'xlim' values
    In addition: Warning messages:
    1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
    2: In min(x) : no non-missing arguments to min; returning Inf
    3: In max(x) : no non-missing arguments to max; returning -Inf

(a) (2 points) What would you do to fix the problem, and get a meaningful plot?

Solution:

    stock$Date = as.Date(stock$Date)
    plot(stock$Date,stock$Price)

(b) (2 points) What would the class of the stock$Date variable have to be in order to cause the error message regarding “no non-missing arguments” to min and max?

Solution: It would have to be character, because a factor would produce a plot with lots of little lines (and no error
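
The situation in question 6 can be reproduced with a minimal sketch. It assumes a hypothetical three-row stand-in for stock, built only from the head() output shown in the question, with Date deliberately stored as character; the real data frame is not available here.

```r
# Hypothetical stand-in for `stock` (an assumption: only the three rows
# printed by head() in the question), with Date stored as character.
stock <- data.frame(
  Date  = c("2011-02-25", "2011-02-24", "2011-02-23"),
  Price = c(1.44, 1.39, 1.44),
  stringsAsFactors = FALSE
)

class(stock$Date)                         # "character"

# Coercing these strings to numbers yields only NAs, so there are no
# finite values from which axis limits could be computed.
suppressWarnings(as.numeric(stock$Date))

# Convert to Date; strings of the form YYYY-MM-DD are parsed by default.
stock$Date <- as.Date(stock$Date)
class(stock$Date)                         # "Date"

# The x values are now finite, so the plot can be drawn.
plot(stock$Date, stock$Price)
```

If the dates were stored in another layout, the format argument of as.Date would need to describe it, for example format = "%m/%d/%Y".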

