UCLA STATS 10 - Lab 2 Instructions


Statistics XL10, UCLA Extension
Instructor: Jiashen You

Lab 2 Instructions [1]

Major League Baseball teams often use statistics to forecast future success. One variable that is important to a team’s success is total runs scored during a season. If teams can determine which variables best help them manufacture runs, they can focus on improving those parts of their offense. The data sets for this lab contain batting statistics for all 30 Major League Baseball teams for the 2009 season. In addition to runs scored, there are seven commonly used variables:

• at-bats: plate appearances that resulted in an out or the player getting on base
• hits: hits that resulted in the player getting on base (without an error)
• homeruns: hits that resulted in the player rounding all 4 bases (without an error)
• batting average: hits/at-bats
• strikeouts
• walks
• stolen bases

Your lab assignment consists of a short report (no more than 3 pages) summarizing answers to all questions throughout this lab manual. You may adjust the order slightly to suit your own logical flow. Properly scaled graphs and summary statistics must be included to support any of your conclusions.

1 A plot is more than 1000 words

Run the R code that generates the scatter plot for runs and at-bats. How would you describe this scatter plot? Would it be appropriate to fit a least squares linear regression model to predict the number of runs using the number of at-bats?
(Please check all conditions for applying the regression model.)

    rm(list=ls())
    bat09 <- read.table("http://www.stat.ucla.edu/~jiashen/stat10/Batting09.txt", header=TRUE)
    attach(bat09)
    names(bat09)    # shows all variables
    head(bat09,8)   # shows the first several lines of data
    plot(runs~at_bats,main="Runs Vs At_bats",xlim=c(5350,5750),ylim=c(600,950))

[1] This lab has been modified from the “Batter Up” lab in the regular Stat 10 course.

2 Shuffle up and regress

Now run the following patch of code and explain what you observe. As more teams are added and the least-squares regression line (LSRL) is recalculated to fit the new data, what do you notice about how much the line changes? The final plot shows the scatter plot with the LSRL, fit on the entire data set, overlaid. Comment on the standardized residual plot. Which team(s) exhibit an unusually large residual? Does it mean that the LSRL model overfits or underfits in this case? What is the sign of the slope and what does it mean? Note that it wouldn’t make much sense to interpret the intercept coefficient, since it represents the number of runs that we would expect a team to score, on average, when there are no at-bats.
Now, please write down the equation for the LSRL.

    shuffle <- sample(1:30)
    for (i in 5:30) {
      fit <- lm(runs[shuffle[1:i]]~at_bats[shuffle[1:i]])
      plot(runs[shuffle[1:i]]~at_bats[shuffle[1:i]],xlim=c(5350,5750),ylim=c(600,950))
      if (i==5) {
        text(at_bats[shuffle[1:i]]+5,runs[shuffle[1:i]]+10,team[shuffle[1:i]])
        Sys.sleep(1)
      }
      lines(fit$fitted.values~at_bats[shuffle[1:i]],col=10)
      text(at_bats[shuffle[i]]+5,runs[shuffle[i]]+10,team[shuffle[i]])
      title(paste("b1 = ",fit$coefficients[2], " b0 = ", fit$coefficients[1]))
      Sys.sleep(.6)
    }
    par(mfrow=c(2,1))
    fit2 <- lm(runs~at_bats)
    plot(runs~at_bats,xlim=c(5350,5750),ylim=c(600,950))
    lines(fit2$fitted.values~at_bats,col=10)
    text(at_bats+7.5,runs+7.5, seq(30))
    title(paste("b1 = ",fit2$coefficients[2], " b0 = ", fit2$coefficients[1]))
    sdr<-(fit2$residuals-mean(fit2$residuals))/sd(fit2$residuals)
    plot(sdr~at_bats,ylab="Standardized residuals",xlim=c(5350,5750),ylim=c(-2.5,2.5))
    abline(h=-2, col=4, lty=2)
    abline(h=2, col=4, lty=2)

If you run the command

    summary(fit2)

you will see the following output:

    Call:
    lm(formula = runs ~ at_bats)

    Residuals:
        Min      1Q  Median      3Q     Max
    -116.18  -46.16  -13.05   33.95  135.45

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -2594.0311   838.2929  -3.094 0.004441 **
    at_bats         0.6044     0.1516   3.986 0.000436 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 60.6 on 28 degrees of freedom
    Multiple R-squared:  0.362,  Adjusted R-squared:  0.3393
    F-statistic: 15.89 on 1 and 28 DF,  p-value: 0.000436

It shows that R^2 for this regression model is 0.362. Interpret this value. What will be the linear correlation coefficient?

3 Can we do better?

In order to find a measure that has a stronger correlation with the number of runs, baseball statisticians have come up with many other measures that may be better at explaining the variability in runs, or even “predict” it.
You can find some examples of those new statistics in a different file.

    newbat09 <- read.table("http://www.stat.ucla.edu/~jiashen/stat10/NewBatting09.txt",header=TRUE)
    head(newbat09)

Which of these new statistics has the highest R^2 with runs? Illustrate using summary statistics and plots. For the model with the strongest correlation, comment on the slope coefficient and any unusual features.

Below are explanations of the new variable names:

• on base: on-base percentage = (hits + walks + hit-by-pitch) / (at-bats + walks + hit-by-pitch + sacrifice flies)
• slugging: slugging percentage = total bases/at-bats
• ob slug: on-base plus slugging percentage = on-base percentage + slugging percentage
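
The comparison asked for in Section 3 could be organized along the following lines. This is only a minimal sketch on made-up data, not on NewBatting09.txt. The column names on_base, slugging and ob_slug are assumptions (the list above prints them as "on base", "slugging" and "ob slug"), and the simulated values carry no baseball meaning. The sketch fits one simple regression per candidate statistic, collects R^2 for each, and also computes the correlation coefficient, which in a one-predictor regression squares to R^2 and shares the sign of the slope.

```r
# Synthetic stand-in for newbat09. The real file's column names and values
# are not shown in this handout, so everything here is illustrative only.
set.seed(10)
n <- 30
toy <- data.frame(
  on_base  = rnorm(n, mean = 0.333, sd = 0.012),
  slugging = rnorm(n, mean = 0.418, sd = 0.025)
)
toy$ob_slug <- toy$on_base + toy$slugging
toy$runs <- 2000 * toy$ob_slug - 750 + rnorm(n, sd = 25)

# Hypothetical predictor names; check names(newbat09) for the real ones.
predictors <- c("on_base", "slugging", "ob_slug")

# Fit runs ~ predictor once per candidate and keep the R-squared of each fit.
r_squared <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "runs"), data = toy)
  summary(fit)$r.squared
})

# Correlation of each candidate with runs; with a single predictor,
# R-squared equals the square of this value.
r <- sapply(predictors, function(v) cor(toy[[v]], toy$runs))

# Candidates ranked from highest to lowest R-squared.
print(round(sort(r_squared, decreasing = TRUE), 3))
best <- names(which.max(r_squared))
```

To try the same loop on the real data, replace toy with newbat09 and set predictors to the names reported by names(newbat09).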

