Stat 13 Lab4
Feb 22 2007

Confidence Intervals and Local Baseball

This lab will use the first half of a baseball season to form proportion
confidence intervals for a player’s true batting ability and compare our
results to batting averages from the second half of the season.

1. About the Data

The data comes from the web pages of the Anaheim Angels and the Los Angeles
Dodgers. Using the “stats” sections of the web pages, the players for both
teams were sorted based on number of at-bats and the top 22 players from each
team were selected. The records reflect the performances of players in the
regular 2002 baseball season up until the all-star break.

2. Today’s Lab

• Load the data set into Stata and get a feel for what it contains:

      use http://www.stat.ucla.edu/labs/datasets/baseball.dta
      describe
      browse

  You’ll notice that the most common baseball statistic, batting average, is
  missing from our variable list. We begin by adding this variable to the data
  set using the generate command that we’ve seen in earlier labs.

      generate bavg=hits/atbats

  We add another useful variable: the observation number for each player, i.e.
  their location in the data set. Please notice there is an underscore before
  n.

      generate obnum=_n

• To find out which players have the highest batting averages, we can sort the
  data by bavg and then list the top ten players:

      sort bavg
      list in 35/44

  Q: What is the highest batting average? What about the players with these
  high batting averages do you think enabled them to have such an incredibly
  high average?

• What’s important to realize here is that these batting averages do not
  reflect all of the at bats these players will ever have; they are merely the
  batting averages for the sample of at bats that occurred in the first half
  of the 2002 season. We can use this sample information to make inferences
  about the population.

  Often in statistics, we use confidence intervals when trying to estimate
  unknown population parameters by using the sample statistics we know.
  The reasoning behind confidence intervals is fairly intuitive. Say you were
  asked to estimate the height of your professor. What is the probability
  you’d be right if you gave one precise answer? What is the probability you’d
  be right if you answered that your professor is somewhere between five feet
  and seven feet tall? There is a much higher likelihood of you being right
  when you give a range rather than one value, isn’t there? Since
  statisticians, like everyone else, like being right, we give an interval
  estimate rather than just a point estimate.

  In this case, the population parameter that we’re interested in estimating
  is the true season batting average of each of the 44 players in our data
  set. The batting average is the proportion of at-bats in which a player
  achieves a hit, so the population parameter we’re interested in is a
  proportion.

  Q: What is the equation for a confidence interval of a proportion?

  Q: What are the lowest 10 batting averages? How might they create problems
  when calculating the confidence intervals for these players?

• Since we have such a small sample for some of the players, we will drop all
  players from the data set that have fewer than 10 at bats and focus only on
  those players for whom we have more data.

      drop if atbats < 10

  Using the generate command and the invttail function, we will have Stata
  calculate the upper and lower bounds of the confidence intervals for each of
  the remaining players.

  The invttail function takes two arguments: the first is the degrees of
  freedom, and the second is the area under the curve in the right tail. It
  returns the corresponding t statistic. Since we do not have the population
  standard deviation, we need to use the t distribution, rather than the
  normal z curve, in our confidence interval calculations. The degrees of
  freedom will vary for each player since they are based on the number of at
  bats.
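
  To get a feel for what invttail returns before it appears inside a longer
  formula, the sketch below prints a few critical values with display. The
  degrees of freedom 9 and 99 are illustrative assumptions (a player with 10 at
  bats and one with 100), not values taken from the data set, and invnormal is
  included only for comparison with the z curve.

  ```stata
  * Illustrative only: 9 and 99 degrees of freedom are assumed values
  * (a player with 10 at bats and one with 100), not taken from the data.

  * t critical value with .025 in the right tail, 9 degrees of freedom
  display invttail(9, .025)

  * Same tail area with 99 degrees of freedom: a smaller multiplier
  display invttail(99, .025)

  * Normal (z) critical value for comparison; invnormal takes the
  * cumulative area from the left, so the matching input is .975
  display invnormal(.975)
  ```

  The three values come out at roughly 2.26, 1.98 and 1.96: the t multiplier
  shrinks toward the z value as the degrees of freedom grow.
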
  We want to create 95% confidence intervals, so the area under the curve that
  we enter into the invttail function is 0.025.

      generate lower=bavg-invttail(atbats-1, .025)*(sqrt((bavg*(1-bavg))/atbats))
      generate upper=bavg+invttail(atbats-1, .025)*(sqrt((bavg*(1-bavg))/atbats))

  To create a nice graphical display of the confidence intervals:

      twoway (rcap lower upper obnum) (scatter bavg obnum)

  This command tells Stata to graph bavg, lower, and upper on the y-axis with
  obnum on the x-axis. The rcap plot type tells Stata to connect the lower
  value to the upper value with an “I” shape.

  Q: Why are some confidence intervals longer than others?

• To look at the effect a larger sample has on the size of a confidence
  interval, sort the data by number of at bats, then create an obnum2 variable
  so this order can be maintained in our plot.

      sort atbats
      generate obnum2=_n

  Q: How do you expect our confidence interval graph to change when we use the
  obnum2 variable on the x-axis instead of obnum? Why?

  Check and see if the graph is as you expected.

      twoway (rcap lower upper obnum2) (scatter bavg obnum2)

• To single out an individual player, use the list command to find their
  obnum, and then list again to get all information on the player. D. Roberts
  of the Dodgers is used as an example:

      list obnum name
      list if obnum==8

  Q: Select a different player and state and interpret the confidence interval
  for that player.

• Open Safari on the screen.

  If the player you selected plays for the LA Dodgers, go to:

      http://mlb.mlb.com/stats/sortable_player_stats.jsp?c_id=la

  If the player you selected plays for the Anaheim Angels, go to:

      http://mlb.mlb.com/stats/sortable_player_stats.jsp?c_id=ana

  Click on “Historical Stats” located in the left-hand column towards the
  bottom of the page.
  Select the 2002 season and check your player’s batting average.

  Q: Was your player’s season batting average within the confidence interval
  you created earlier?

  Q: At the end of the 2002 season, how many players do we expect to have a
  batting average outside of the confidence intervals we just created? Why?

Stat 13 Lab4 Assignment
Due: Mar 1 2007

Assignment

The slugging percentage in baseball is defined as the average number of bases
achieved per at bat. Generate this statistic for each player and perform an
analysis similar to that used earlier to answer the following questions.

Before doing the assignment, please clear the data we have changed in the lab
and reload the data set.

      clear
      use http://www.stat.ucla.edu/labs/datasets/baseball.dta

1. What is the highest slugging percentage in the data set? Who is the



UCLA STATS 13 - lab4
