
Bayesian Statistics Project

By: Nick Luerkens and Josh Gunderson

Research Question: Which statistic in Major League Baseball has more of an impact on a team's success, batting average or ERA?

Variables: We obtained data from all 30 Major League Baseball teams for the 2004 season.
- Predictor variables: batting average (x) and ERA (y)
- Response variable: season winning percentage (z)

Bayesian Model: We specified our model by assuming normal distributions for all three of our variables. We will analyze our data using multiple linear regression: mu(z) = alpha + beta (x) + gamma (y).

WinBUGS Code:

    model {
      for (i in 1:N) {
        # Likelihood: winning percentage is normal around the regression mean
        wins[i] ~ dnorm(mu[i], tau) ;
        mu[i] <- alpha + beta * batting[i] + gamma * era[i] ;
      }
      # Priors: dnorm takes (mean, precision), so .01 is a precision
      tau ~ dgamma(.01, .01) ;
      alpha ~ dnorm(0, .01) ;
      beta ~ dnorm(.954, .01) ;
      gamma ~ dnorm(-.057, .01) ;
    }
    list(N = 30)

Prior parameters:
* We must guess, non-informatively, what E(x) and E(y) will be. We use E(x) = 0.262 and E(y) = 4.39.
- win%: mu[i] for win% will be 0.5, because we assume the average team wins half of its games.
- alpha: alpha is the intercept of our multiple regression equation. We set it to zero.
- Beta and Gamma: 0.5 = alpha + Beta (.262) + -Gamma (4.39). Assume that batting average and ERA have the same impact on wins, so each term accounts for .25 of the .5:
    .25 = Beta (.262); Beta = .954
    .25 = -Gamma (4.39); Gamma = -.057
  (Gamma needs to be negative because the slope of ERA vs. wins is negative in simple regression.)

WinBUGS Output and Interpretation:
- The 1st is a scatter plot of ERA (x-axis) vs. winning % (y-axis).
- The 2nd is a plot of batting average (x-axis) vs. winning % (y-axis).
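The prior means for Beta and Gamma given under "Prior parameters" can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original WinBUGS analysis; the variable names are illustrative assumptions, and the numbers are the ones quoted in the report.

```python
# Values quoted in the "Prior parameters" section of the report.
mean_batting = 0.262   # assumed E(x), league batting average
mean_era = 4.39        # assumed E(y), league ERA
mean_win_pct = 0.5     # an average team wins half of its games
alpha = 0.0            # prior mean of the intercept

# The report splits the 0.5 equally between the two predictors,
# so each term is assigned a magnitude of 0.25.
share = (mean_win_pct - alpha) / 2

beta_prior = share / mean_batting    # positive slope for batting average
gamma_prior = -share / mean_era      # sign set negative, as in the report

print(round(beta_prior, 3), round(gamma_prior, 3))
```

Because Gamma is made negative after the split, these prior means evaluated at the average team give alpha + Beta (.262) + Gamma (4.39) = 0 rather than 0.5. The priors use a precision of .01 (a standard deviation of 10), so they are vague and this offset should have little influence on the posterior.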
[Figure: scatterplot; x-axis 0.24 to 0.30, y-axis 0.3 to 0.7]

[Figure: scatterplot; x-axis 3.5 to 6.0, y-axis 0.3 to 0.7]

[Figure: bivariate posterior scatterplot of gamma (-0.3 to -0.1) against beta (-5.0 to 10.0)]

[Figure: time series of beta, chains 1:3, iterations 1001 to 15000; range -5.0 to 10.0]

[Figure: time series of gamma, chains 1:3, iterations 1001 to 15000; range -0.3 to 0.1]

[Figure: two box plots of mu[1] to mu[30]; y-axis 0.3 to 0.7]
Top: batting average (boxes) vs. winning percentage (y-axis)
Bottom: ERA (boxes) vs. winning % (y-axis)

Node statistics

    node     mean    sd       MC error  median  start  sample
    mu[1]    0.5844  0.0216   2.508E-4  0.5845  17001  9000
    mu[2]    0.3887  0.02172  2.225E-4  0.3888  17001  9000
    mu[3]    0.5901  0.0208   2.129E-4  0.5898  17001  9000
    mu[4]    0.5363  0.02211  2.542E-4  0.5364  17001  9000
    mu[5]    0.5949  0.02197  2.534E-4  0.595   17001  9000
    mu[6]    0.5743  0.01924  1.952E-4  0.574   17001  9000
    mu[7]    0.4594  0.01609  1.679E-4  0.4595  17001  9000
    mu[8]    0.3541  0.02675  2.753E-4  0.354   17001  9000
    mu[9]    0.5037  0.01893  2.121E-4  0.5038  17001  9000
    mu[10]   0.4232  0.03184  3.331E-4  0.4232  17001  9000
    mu[11]   0.4742  0.01809  1.948E-4  0.4743  17001  9000
    mu[12]   0.5271  0.01447  1.448E-4  0.5271  17001  9000
    mu[13]   0.545   0.01483  1.512E-4  0.545   17001  9000
    mu[14]   0.3963  0.02115  2.122E-4  0.3965  17001  9000
    mu[15]   0.5281  0.01665  1.659E-4  0.5281  17001  9000
    mu[16]   0.4449  0.02493  2.628E-4  0.445   17001  9000
    mu[17]   0.5429  0.01522  1.538E-4  0.5429  17001  9000
    mu[18]   0.4397  0.0233   2.451E-4  0.4397  17001  9000
    mu[19]   0.4648  0.02537  2.653E-4  0.4648  17001  9000
    mu[20]   0.4824  0.01281  1.357E-4  0.4826  17001  9000
    mu[21]   0.5452  0.01358  1.449E-4  0.5451  17001  9000
    mu[22]   0.5032  0.01111  1.165E-4  0.5032  17001  9000
    mu[23]   0.4904  0.01399  1.397E-4  0.4904  17001  9000
    mu[24]   0.5725  0.01661  1.796E-4  0.5724  17001  9000
    mu[25]   0.5326  0.01246  1.35E-4   0.5326  17001  9000
    mu[26]   0.4835  0.01443  1.555E-4  0.4837  17001  9000
    mu[27]   0.6229  0.0236   2.551E-4  0.6226  17001  9000
    mu[28]   0.4276  0.01632  1.643E-4  0.4278  17001  9000
    mu[29]   0.4907  0.0112   1.16E-4   0.4907  17001  9000
    mu[30]   0.4256  0.01666  1.671E-4  0.4258  17001  9000

Totals: E[mu] 0.4936

Conclusion: Our output shows that both predictor variables have a substantial impact on the response variable, and the chains converged. Determining which variable has more of an impact is not as clear. First, we take the individual node statistics for our slopes beta and gamma:

Node statistics

    node     mean     sd       MC error  median   start  sample
    beta     4.221    1.163    0.005086  4.23     1001   57000
    gamma    -0.1043  0.02434  9.471E-5  -0.1043  1001   57000

• To determine which variable has more impact, we multiply each posterior mean by the standard deviation of the corresponding predictor in our actual dataset, as computed by SAS.
• For any given value of ERA, each one standard deviation increase in batting average increases Win % by 4.221 (.01) = .04221 (.01 is the standard deviation from SAS).
• For any given value of batting average, each one standard deviation increase in ERA decreases Win % by .1043 (.466) = .0486 (.466 is the standard deviation from SAS).
• In conclusion, a one standard deviation change in ERA has more of an impact on Win % than a one standard deviation change in batting average, by .00639. Over a 162-game season this amounts to approximately 1.04 games.
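The comparison in the conclusion scales each posterior slope by the standard deviation of its predictor. The following is a minimal Python sketch of that arithmetic, using only the figures reported above; the variable names are illustrative assumptions and are not taken from the WinBUGS or SAS output.

```python
# Posterior means of the slopes, from the WinBUGS node statistics.
beta_mean = 4.221      # slope for batting average
gamma_mean = -0.1043   # slope for ERA

# Sample standard deviations of the predictors, as reported from SAS.
sd_batting = 0.01
sd_era = 0.466

games_per_season = 162

# Change in winning percentage for a one standard deviation increase
# in each predictor, holding the other fixed.
effect_batting = beta_mean * sd_batting   # positive: more wins
effect_era = gamma_mean * sd_era          # negative: fewer wins

# Compare the sizes of the two effects, ignoring their signs.
diff = abs(effect_era) - abs(effect_batting)
games_diff = diff * games_per_season

print(f"batting average: {effect_batting:+.5f}")
print(f"ERA:             {effect_era:+.5f}")
print(f"difference:      {diff:.5f} win %, about {games_diff:.2f} games")
```

The sketch keeps the sign of the ERA effect, which makes explicit that a higher ERA lowers the winning percentage; only the magnitudes are compared.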


UI STAT 4520 - Bayesian Statistics
