Bayesian Statistics Project
By: Nick Luerkens and Josh Gunderson

Research Question: Which statistic in Major League Baseball has more of an impact on a team's success, batting average or ERA?

Variables: We obtained data from all 30 Major League Baseball teams for the 2004 season.
- Predictor variables: batting average (x) and ERA (y)
- Response variable: season winning percentage (z)

Bayesian Model: We specified our model by assuming normal distributions for all three of our variables. We analyze the data using multiple linear regression: mu (z) = alpha + beta (x) + gamma (y).

WinBUGS Code:

    model{
      for (i in 1:N) {
        # likelihood: dnorm takes a mean and a precision (1 / variance)
        wins[i] ~ dnorm( mu[i], tau ) ;
        mu[i] <- alpha + beta * batting[i] + gamma * era[i] ;
      }
      # priors
      tau ~ dgamma(.01, .01) ;
      alpha ~ dnorm (0 , .01) ;
      beta ~ dnorm ( .954, .01) ;
      gamma ~ dnorm (-.057, .01) ;
    }
    list(N = 30)

Prior parameters:
* We must guess E (x) and E (y) non-informatively: E (x) = 0.262 and E (y) = 4.39.
- win%: mu [i] for win% is set to 0.5, because we assume the average team wins half of its games.
- alpha: alpha is the intercept of our multiple regression equation. We set its prior mean to zero.
- beta and gamma: 0.5 = alpha + beta (.262) + -gamma (4.39). Assuming that batting average and ERA have the same impact on wins, each term contributes .25:
    .25 = beta (.262); beta = .954
    .25 = -gamma (4.39); gamma = -.057
  (gamma needs to be negative because the slope of ERA vs. wins is negative in simple regression.)

WinBUGS Output and Interpretation:
- The 1st is a scatterplot of ERA (x-axis) vs. winning % (y-axis).
- The 2nd is a scatterplot of batting average (x-axis) vs. winning % (y-axis).
[Figure: scatterplot, x-axis 0.24 to 0.30, y-axis 0.3 to 0.7]
[Figure: scatterplot, x-axis 3.5 to 6.0, y-axis 0.3 to 0.7]
[Figure: bivariate posterior scatterplot of gamma vs. beta]
[Figure: time series plots of beta and gamma, chains 1:3, iterations 1001 to 15000]
[Figure: two box plots of mu[1] to mu[30]. Top: batting average (boxes) vs. winning percentage (y-axis). Bottom: ERA (boxes) vs. winning % (y-axis).]

Node statistics

    node     mean     sd        MC error   median   start   sample
    mu[1]    0.5844   0.0216    2.508E-4   0.5845   17001   9000
    mu[2]    0.3887   0.02172   2.225E-4   0.3888   17001   9000
    mu[3]    0.5901   0.0208    2.129E-4   0.5898   17001   9000
    mu[4]    0.5363   0.02211   2.542E-4   0.5364   17001   9000
    mu[5]    0.5949   0.02197   2.534E-4   0.595    17001   9000
    mu[6]    0.5743   0.01924   1.952E-4   0.574    17001   9000
    mu[7]    0.4594   0.01609   1.679E-4   0.4595   17001   9000
    mu[8]    0.3541   0.02675   2.753E-4   0.354    17001   9000
    mu[9]    0.5037   0.01893   2.121E-4   0.5038   17001   9000
    mu[10]   0.4232   0.03184   3.331E-4   0.4232   17001   9000
    mu[11]   0.4742   0.01809   1.948E-4   0.4743   17001   9000
    mu[12]   0.5271   0.01447   1.448E-4   0.5271   17001   9000
    mu[13]   0.545    0.01483   1.512E-4   0.545    17001   9000
    mu[14]   0.3963   0.02115   2.122E-4   0.3965   17001   9000
    mu[15]   0.5281   0.01665   1.659E-4   0.5281   17001   9000
    mu[16]   0.4449   0.02493   2.628E-4   0.445    17001   9000
    mu[17]   0.5429   0.01522   1.538E-4   0.5429   17001   9000
    mu[18]   0.4397   0.0233    2.451E-4   0.4397   17001   9000
    mu[19]   0.4648   0.02537   2.653E-4   0.4648   17001   9000
    mu[20]   0.4824   0.01281   1.357E-4   0.4826   17001   9000
    mu[21]   0.5452   0.01358   1.449E-4   0.5451   17001   9000
    mu[22]   0.5032   0.01111   1.165E-4   0.5032   17001   9000
    mu[23]   0.4904   0.01399   1.397E-4   0.4904   17001   9000
    mu[24]   0.5725   0.01661   1.796E-4   0.5724   17001   9000
    mu[25]   0.5326   0.01246   1.35E-4    0.5326   17001   9000
    mu[26]   0.4835   0.01443   1.555E-4   0.4837   17001   9000
    mu[27]   0.6229   0.0236    2.551E-4   0.6226   17001   9000
    mu[28]   0.4276   0.01632   1.643E-4   0.4278   17001   9000
    mu[29]   0.4907   0.0112    1.16E-4    0.4907   17001   9000
    mu[30]   0.4256   0.01666   1.671E-4   0.4258   17001   9000

Totals: E[mu] 0.4936

Conclusion: Our output shows that both predictor variables have a substantial impact on our response variable, and the chains converged. Determining which variable has more of an impact is not as clear. Here is our approach. First, take the individual node statistics for our slopes beta and gamma:

Node statistics

    node     mean      sd        MC error   median    start   sample
    beta     4.221     1.163     0.005086   4.23      1001    57000
    gamma    -0.1043   0.02434   9.471E-5   -0.1043   1001    57000

• To determine which variable has more impact, we multiply each posterior mean by the standard deviation of the corresponding predictor in our actual dataset, as computed by SAS.
• For any given value of ERA, each one standard deviation increase in batting average increases Win % by 4.221 (.01) = .04221 (.01 is the standard deviation from SAS).
• For any given value of batting average, each one standard deviation increase in ERA decreases Win % by .1043 (.466) = .0486 (.466 is the standard deviation from SAS).
• In conclusion, a one standard deviation change in ERA has more of an impact on Win % than a one standard deviation change in batting average, by .00639. Translated into a 162-game season, this is approx. 1.04
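The standardized-effect comparison in the conclusion can be reproduced with a few lines of arithmetic. The Python sketch below is only an illustration, not part of the original WinBUGS or SAS analysis; the variable and function names are made up for this example, and the inputs are the posterior means and standard deviations quoted above.

```python
# Posterior means of the slopes, copied from the WinBUGS node statistics.
beta_mean = 4.221      # slope for batting average
gamma_mean = -0.1043   # slope for ERA

# Standard deviations of the predictors, as reported from SAS in the text.
sd_batting = 0.01
sd_era = 0.466

GAMES_PER_SEASON = 162


def standardized_effect(slope, sd):
    """Change in Win % for a one standard deviation increase in a predictor."""
    return slope * sd


effect_batting = standardized_effect(beta_mean, sd_batting)
effect_era = standardized_effect(gamma_mean, sd_era)

# Compare magnitudes: the ERA effect is negative (higher ERA, fewer wins).
gap = abs(effect_era) - abs(effect_batting)
games = gap * GAMES_PER_SEASON

print(f"Batting average effect per SD: {effect_batting:+.5f}")
print(f"ERA effect per SD:             {effect_era:+.5f}")
print(f"Difference in magnitude:       {gap:.5f}")
print(f"Over {GAMES_PER_SEASON} games:                {games:.2f}")
```

Comparing absolute values keeps the sign of the ERA slope out of the comparison, which matches how the conclusion treats the two effects.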