UCLA STATS 100C - Data Analysis

University of California, Los Angeles
Department of Statistics

Statistics 13
Instructor: Nicolas Christou

Data analysis with R - Some simple commands

When you are in R, the command line begins with >

To read data from a website use the following command:

data <- read.table("http://www.stat.ucla.edu/~nchristo/statistics100C/body_fat.txt", header=TRUE)

The result of the command read.table is a "data frame" (it looks like a table). In our example we give the name data to our data frame. The columns of a data frame are variables.

This file contains data on percentage of body fat determined by underwater weighing and various body circumference measurements for 251 men. Here is the variable description:

Variable   Description
x1         Density determined from underwater weighing
y          Percent body fat from Siri's (1956) equation
x3         Age (years)
x4         Weight (lbs)
x5         Height (inches)
x6         Neck circumference (cm)
x7         Chest circumference (cm)
x8         Abdomen 2 circumference (cm)
x9         Hip circumference (cm)
x10        Thigh circumference (cm)
x11        Knee circumference (cm)
x12        Ankle circumference (cm)
x13        Biceps (extended) circumference (cm)
x14        Forearm circumference (cm)
x15        Wrist circumference (cm)

If the data file is on your computer (e.g. on your desktop), first change the working directory by clicking on Misc at the top of your screen, and then read the data as follows:

> data <- read.table("filename.txt", header=TRUE)

Note: the expression <- is the assignment operator.

Once the data have been read, we can display them by simply typing the name of the data frame at the command line: > data. Or, if we want, we can display only the first 6 rows of the data by typing > head(data).
Here is the output:

> head(data)
      x1    y x3     x4    x5   x6   x7   x8    x9  x10  x11  x12  x13  x14  x15
1 1.0853  6.1 22 173.25 72.25 38.5 93.6 83.0  98.7 58.7 37.3 23.4 30.5 28.9 18.2
2 1.0414 25.3 22 154.00 66.25 34.0 95.8 87.9  99.2 59.6 38.9 24.0 28.8 25.2 16.6
3 1.0754 10.3 23 188.15 77.50 38.0 96.6 85.3 102.5 59.1 37.6 23.2 31.8 29.7 18.3
4 1.0722 11.7 23 198.25 73.50 42.1 99.6 88.6 104.1 63.1 41.7 25.0 35.6 30.0 19.2
5 1.0708 12.3 23 154.25 67.75 36.2 93.1 85.2  94.5 59.0 37.3 21.9 32.0 27.4 17.1
6 1.0775  9.4 23 159.75 72.25 35.5 92.1 77.1  93.9 56.1 36.1 22.7 30.5 27.2 18.2

Useful commands:

• Extracting one variable from the data frame (e.g. the second variable): > data[,2]
• Another way to extract a variable: > data$y
• Similarly, to access a particular row of the data (e.g. the first row): > data[1,]
• To list all the data simply type: > data
• To compute the mean of all the variables in the data set: > colMeans(data)
  (In current versions of R, mean(data) applied to a whole data frame returns NA with a warning.)
• To compute the mean of just one variable: > mean(data$y)
• To compute the mean of variables 2 and 3: > colMeans(data[,c(2,3)])
• To compute the variance of one variable: > var(data$y)
• To compute summary statistics for all the variables: > summary(data)
• To construct a stem-and-leaf plot, boxplot, and histogram:
> stem(data$y)
> boxplot(data$y)
> hist(data$y)
• To plot variable y against variable x10:
> plot(data$y, data$x10)
• And you can give names to the axes and to your plot:
> plot(data$y, data$x10, main="Scatterplot of percent body fat against thigh circumference", xlab="Percent body fat", ylab="Thigh circumference")

And here is the plot:

[Figure: Scatterplot of percent body fat against thigh circumference. Horizontal axis: Percent body fat. Vertical axis: Thigh circumference.]

• To save a plot as a pdf file under the working directory (e.g. your desktop):
> pdf("box_y.pdf")
> boxplot(data$y)
> dev.off()

On your computer Desktop this is what you get (under the name "box_y.pdf"):

[Figure: Boxplot of y.]

If you want to read more about a specific command (for example the histogram), type the following at the command line:
> ?hist
> ?boxplot

• Exercise:
Construct the same plots with different variables and save them on your desktop.

Another data set:

The following data were collected in the area west of the town Stein in the Netherlands near the river Meuse (Dutch: Maas) (see map below). The actual data set contains many variables, but here we will use the x, y coordinates and the concentration of lead and zinc in ppm at each data point. The motivation for this study was to predict the concentration of heavy metals around the banks of the Maas river in this area. These heavy metals accumulated over the years because of the river pollution. Here is the area of study:

[Figure: Map of the study area.]

Exercise:

a. You can access these data at
b <- read.table("http://www.stat.ucla.edu/~nchristo/statistics100C/soil.txt", header=TRUE)

b. Construct the stem-and-leaf plot, histogram, and boxplot for each one of the two variables (lead and zinc), and compute the summary statistics. What do you observe?

c. Transform the data in order to produce a symmetrical histogram. Here is what you can do:
> log_lead <- log(b$lead)
> log_zinc <- log(b$zinc)
Construct the stem-and-leaf plot, histogram, and boxplot for each one of the new variables (log_lead and log_zinc), and compute the summary statistics. What do you observe now?

Here is a side-by-side boxplot of the variables lead and zinc. First create a new data frame with only the variables lead and zinc:
> b1 <- b[,3:4]
Then you can construct side-by-side boxplots of lead and zinc using:
> boxplot(b1)
Note: You can also do this by simply typing boxplot(b[,3:4]) or boxplot(b$lead, b$zinc).

[Figure: Side-by-side boxplots of lead and zinc.]

Other useful commands in R:

• To enter data in R use <- or the equal sign =. The <- is preferred.
Here are some examples:
> x <- c(1,2,3,4,5)
> y <- c(10,20,30,40,50)
> q <- data.frame(cbind(x,y))

And here is what you get:
> x
[1] 1 2 3 4 5
> q
  x  y
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50

• To rename variables:
> names(q) <- c("a", "b")
> q
  a  b
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50

An example using the maps package

Data on ozone and other pollutants are collected on a regular basis. The data set for
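The summary commands from the "Useful commands" list can also be tried without a network connection. The following is a minimal sketch, not part of the original handout: it assumes a small hypothetical data frame named bf, built by hand from three columns (y, x4, x10) of the six rows printed by head(data), and it uses colMeans() for the column means because mean() on a whole data frame returns NA in current versions of R.

```r
# Hypothetical toy data frame: three columns copied from the six rows
# shown by head(data) in the handout. This is NOT the full data set.
bf <- data.frame(
  y   = c(6.1, 25.3, 10.3, 11.7, 12.3, 9.4),                # percent body fat
  x4  = c(173.25, 154.00, 188.15, 198.25, 154.25, 159.75),  # weight (lbs)
  x10 = c(58.7, 59.6, 59.1, 63.1, 59.0, 56.1)               # thigh circumference (cm)
)

# Mean of every column of the data frame.
all_means <- colMeans(bf)

# Mean and variance of a single variable.
mean_y <- mean(bf$y)
var_y  <- var(bf$y)

# Means of the first two columns only (y and x4 in this toy frame).
means_12 <- colMeans(bf[, c(1, 2)])

# Log transform of one variable, as in the lead/zinc exercise.
log_y <- log(bf$y)

print(all_means)
print(mean_y)
print(var_y)
print(means_12)
```

With the full data set read by read.table, the same calls apply unchanged after replacing bf with data.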

