Blake Zahari Nadin Mustafa

The Effects of Climate in Major League Baseball

Baseball has been pegged as a seasonal sport since its invention in the late 18th century. The MLB regular season is 162 games long and runs from April through September. How does each team's respective city's climate affect their performance on the field?

We wanted to know how, if at all, climate affected teams in Major League Baseball. We hypothesized that, generally, the warmer the climate, the more wins the team has. We gathered data from the internet concerning all 30 teams in Major League Baseball, including each team's mean batting average in 2007, wins in 2007, wins in 2006, and the mean temperature of each team's home city over the past 30 years, shown below.

Team                    Bat. Avg. 2007   Wins 2007   Wins 2006   Mean Temp
Los Angeles Angels      0.297            94          89          63.0
Boston Red Sox          0.285            96          86          51.3
Toronto Blue Jays       0.279            83          87          45.7
St Louis Cardinals      0.277            78          83          56.6
Los Angeles Dodgers     0.275            82          88          63.0
Arizona Diamondbacks    0.273            90          76          72.6
New York Yankees        0.272            94          97          47.4
Kansas City Royals      0.272            69          62          53.6
New York Mets           0.269            88          97          47.4
Chicago Cubs            0.267            85          66          49.0
Atlanta Braves          0.266            84          79          61.3
Florida Marlins         0.265            71          78          75.9
Seattle Mariners        0.264            88          78          52.8
San Diego Padres        0.264            89          88          64.2
Minnesota Twins         0.262            79          96          44.9
Detroit Tigers          0.262            88          95          48.6
Texas Rangers           0.258            75          80          65.5
Tampa Bay Rays          0.256            66          61          72.3
Cincinnati Reds         0.252            72          80          51.7
Baltimore Orioles       0.252            69          70          55.1
Philadelphia Phillies   0.249            89          85          54.3
Pittsburgh Pirates      0.248            68          67          50.3
Colorado Rockies        0.247            90          76          50.3
Chicago White Sox       0.246            72          90          49.0
Milwaukee Brewers       0.243            83          75          46.1
Oakland Athletics       0.241            76          93          59.5
San Francisco Giants    0.235            71          76          57.1
Houston Astros          0.233            73          82          67.9
Cleveland Indians       0.232            96          78          49.6
Washington Nationals    0.227            73          71          57.5

Using SAS, we produced scatter plots of the data, using temperature as the explanatory variable and wins as the response variable. We started with 2007, shown below with a best-fitting line. It seems like the points are all over the place and that there is absolutely no relationship between
climate and wins. In fact, the best-fitting line would suggest that a colder climate results in more wins. However, the plot alone does not provide enough evidence to determine whether there is a relationship, so this is merely an inference.

[Scatter plot of wins07 against temp with fitted line wins07 = 96.988 - 0.2843 temp; N = 30, Rsq = 0.0701, Adj Rsq = 0.0369, RMSE = 9.121]

Below is the CORR procedure output in SAS. Displayed at the bottom is the correlation coefficient, or r value, which measures how strong the linear relationship between our two variables is. To be considered a strong relationship, the r value would have to be close to either -1 or 1. Here our r value is -0.26476, much closer to 0 than to either -1 or 1, meaning a very weak linear relationship. The p-value of 0.1574 is also well above 0.05, so the correlation is not statistically significant. It is enough to say that the two variables don't seem to affect each other at all. Our earlier inference is backed up.

The CORR Procedure

2 Variables: wins07 temp

Simple Statistics

Variable   N    Mean       Std Dev    Sum    Minimum    Maximum
wins07     30   81.03333   9.29399    2431   66.00000   96.00000
temp       30   56.11667   8.65492    1684   44.90000   75.90000

Pearson Correlation Coefficients, N = 30
Prob > |r| under H0: Rho=0

           wins07     temp
wins07     1.00000    -0.26476
                      0.1574
temp       -0.26476   1.00000
           0.1574

Further, we looked at the residuals to see just how far each actual value strayed from its predicted value; this again backed up our inference.

The REG Procedure
Model: MODEL1
Dependent Variable: wins07

Output Statistics

Obs   Dependent Variable   Predicted Value   Residual
1     94.0000              79.0763           14.9237
2     96.0000              82.4028           13.5972
3     83.0000              83.9949           -0.9949
4     78.0000              80.8959           -2.8959
5     82.0000              79.0763           2.9237
6     90.0000              76.3469           13.6531
7     94.0000              83.5116           10.4884
8     69.0000              81.7489           -12.7489
9     88.0000              83.5116           4.4884
10    85.0000              83.0567           1.9433
11    84.0000              79.5597           4.4403
12    71.0000              75.4087           -4.4087
13    88.0000              81.9763           6.0237
14    89.0000              78.7351           10.2649
15    79.0000              84.2224           -5.2224
16    88.0000              83.1704           4.8296
17    75.0000              78.3655           -3.3655
18    66.0000              76.4322           -10.4322
19    72.0000              82.2890           -10.2890
20    69.0000              81.3224           -12.3224
21    89.0000              81.5498           7.4502
22    68.0000              82.6871           -14.6871
23    90.0000              82.6871           7.3129
24    72.0000              83.0567           -11.0567
25    83.0000              83.8812           -0.8812
26    76.0000              80.0714           -4.0714
27    71.0000              80.7538           -9.7538
28    73.0000              77.6832           -4.6832
29    96.0000              82.8861           13.1139
30    73.0000              80.6400           -7.6400

Sum of Residuals                 0
Sum of Squared Residuals         2329.37097
Predicted Residual SS (PRESS)    2675.48373

Even though the evidence looks conclusive, more years are needed to add strength to any conclusion. So we used SAS to produce scatter plots for 2006 and 2005, shown next.

[Plot of wins06*temp. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: wins06, 61 to 97; horizontal axis: temp, 45 to 80]

[Plot of wins05*temp. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: wins05, 50 to 100; horizontal axis: temp, 45 to 80]

The CORR procedure output for wins in 2006 and 2005 versus temperature also shows weak r values, shown below.

The CORR Procedure

2 Variables: wins06 temp

Simple Statistics

Variable   N    Mean       Std Dev    Sum    Minimum    Maximum
wins06     30   80.96667   10.08407   2429   61.00000   97.00000
temp       30   56.11667   8.65492    1684   44.90000   75.90000

Pearson Correlation Coefficients, N = 30
Prob > |r| under H0: Rho=0

           wins06     temp
wins06     1.00000    -0.24922
                      0.1841
temp       -0.24922   1.00000
           0.1841

The CORR Procedure

2 Variables: wins05 temp

Simple Statistics

Variable N Mean Std Dev
wins05 …
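
The 2007 and 2006 correlation and regression figures reported above can be cross-checked outside SAS. The following is a minimal Python sketch, not the SAS code used for this report; it assumes the temperature and win values are transcribed correctly from the data table, and the function names (pearson_r, least_squares, residual_ss) are illustrative choices rather than anything from the original analysis.

```python
import math

# Values copied from the report's data table, in the same team order.
# Assumption: the table was transcribed correctly from the original source.
TEMP = [63.0, 51.3, 45.7, 56.6, 63.0, 72.6, 47.4, 53.6, 47.4, 49.0,
        61.3, 75.9, 52.8, 64.2, 44.9, 48.6, 65.5, 72.3, 51.7, 55.1,
        54.3, 50.3, 50.3, 49.0, 46.1, 59.5, 57.1, 67.9, 49.6, 57.5]
WINS07 = [94, 96, 83, 78, 82, 90, 94, 69, 88, 85,
          84, 71, 88, 89, 79, 88, 75, 66, 72, 69,
          89, 68, 90, 72, 83, 76, 71, 73, 96, 73]
WINS06 = [89, 86, 87, 83, 88, 76, 97, 62, 97, 66,
          79, 78, 78, 88, 96, 95, 80, 61, 80, 70,
          85, 67, 76, 90, 75, 93, 76, 82, 78, 71]


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy, mean_x, mean_y) for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy, mx, my


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxy, sxx, syy, _, _ = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    sxy, sxx, _, mx, my = _centered_sums(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_ss(x, y):
    """Sum of squared residuals around the least-squares line."""
    a, b = least_squares(x, y)
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


if __name__ == "__main__":
    for label, wins in (("2007", WINS07), ("2006", WINS06)):
        a, b = least_squares(TEMP, wins)
        print(f"{label}: r = {pearson_r(TEMP, wins):.5f}, "
              f"wins = {a:.3f} + ({b:.4f}) * temp, "
              f"SSE = {residual_ss(TEMP, wins):.3f}")
```

Running the script prints r, the fitted line, and the sum of squared residuals for each season, which can be compared directly against the CORR and REG output above. A negative slope and a negative r for both seasons would be consistent with the observation that the fitted line points, weakly, toward colder cities winning more.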