STATISTICAL ANALYSIS

- Descriptive statistics
- Scatter diagrams
- Excel
- Linear relations
- Least squares line
- $R^2$ statistic
- Statistical significance

NBA PERFORMANCE 2011

DESCRIPTIVE STATISTICS

- Mean: the average, $\bar{w} = \frac{1}{n}\sum_i w_i$
- Median: the 50th percentile
- Standard deviation: measure of variation, $\sigma_w = \sqrt{\frac{1}{n}\sum_i (w_i - \bar{w})^2}$
- Maximum, Minimum

All calculated by Excel in Data Analysis.

DESCRIPTIVE STATISTICS

Mean                 0.500
Median               0.506
Standard Deviation   0.161
Minimum              0.207
Maximum              0.756

LORENZ CURVES

[Figure: two Lorenz curves, each drawn against the line of equality. First panel: NBA, revenues by teams. Second panel: quarterbacks, salaries by player.]

SCATTERPLOT SHOWS CORRELATION

[Figure: scatterplot of revenue (millions) against winning percentage, with the New York Knicks labeled.]

MANY EXAMPLES

- team wins and team revenue
- weather forecasts and rainy days
- AIDS and circumcision
- strike outs and home runs
- baseball salaries and batting averages
- competitive balance and salary caps
- quarterback salaries and touchdown passes

Which is cause? Which effect?

FANS DEMAND SUCCESS

This is my underlying hypothesis.

[Figure: scatterplot of revenue (millions) against winning percentage.]

But correlation is not causation.

CURVE FITTING

A simple way to summarize a relation between two variables is a fitted curve.

Three questions:
1. What kind of equation? (linear, parabolic, etc.)
2. What particular equation is best? (parameter values)
3. How good is our best fit? (prediction reliability)

LINEAR EQUATIONS

$y = b_0 + b_1 x$

- $b_1$ = slope = rise/run
- $b_0$ = intercept

[Figure: a straight line in the (x, y) plane with intercept $b_0$ and slope $b_1$.]

A slope gives the ratio of a change in y to a change in x.

PROFITS OR WINS?

[Figure: scatterplot of revenue (millions) against winning percentage.]

Leeds and von Allmen (3.1) hypothesize the opposite.

TRENDLINES IN EXCEL

Plot an XY (Scatter) chart, add a trendline, display the equation.

[Figure: scatterplot of revenue (millions) against winning percentage with fitted trendline; intercept at 83 million.]

$\text{Rev} = 82.9 + 98.2 \cdot \text{Win percentage}$

Homework Question 2. Demonstrate in Excel.

THROUGH THE POINT OF MEANS

[Figure: scatterplot of revenue (millions) against winning percentage; the trendline passes through the point of means (0.500, 132).]

WHAT DOES THE SLOPE MEAN?

Example: a health survey of young men estimates the regression of height h on weight w:

$h = 62\ \text{inches} + 0.04\ \text{(inches/lb)} \cdot w$

If someone gains 20 lb, will they get taller by 0.8 inches?

A problem of interpretation.

COEFFICIENT OF DETERMINATION (QUESTION 3)

$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$

- Ratio of regression sum of squares to total sum of squares
- Ratio of predicted variance to observed variance
- Fraction of total variation accounted for by the line

WEAK GOODNESS OF FIT

[Figure: scatterplot of revenue (millions) against winning percentage with fitted trendline.]

$\text{Rev} = 82.9 + 98.2 \cdot \text{Win}$, $R^2 = 0.20$

STATISTICAL SIGNIFICANCE

How confident are we about the estimates?

Use the t-statistic test: if the t-statistic is greater than 2 in absolute value, then we are confident; otherwise the estimate is statistically insignificant.

STATISTICAL INFERENCE

[Figure: scatterplot of revenue (millions) against winning percentage with fitted trendline.]

$\text{Rev} = 82.9 + 98.2 \cdot \text{Win}$, $R^2 = 0.20$
t-statistics: 4.31 (intercept), 2.67 (win)

EXCEL OUTPUT

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.451
R Square             0.203
Adjusted R Square    0.175
Standard Error       31.767
Observations         30

ANOVA
              df    SS         MS        F
Regression     1    7206.6     7206.6    7.1
Residual      28    28255.4    1009.1
Total         29    35462.0

              Coefficients   Standard Error   t Stat   P-value
intercept     82.940         19.252           4.308    0.000
win           98.119         36.716           2.672    0.012

MULTIPLE REGRESSION

More than one cause? What about market size?

$\text{Rev} = 64.1 + 100.5 \cdot \text{win} + 3.23 \cdot \text{population}$, $R^2 = 0.44$
t-statistics: 3.7 (intercept), 3.2 (win), 3.3 (population)

Much better fit. Different interpretation.

EXCEL OUTPUT

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.661
R Square             0.437
Adjusted R Square    0.395
Standard Error       27.198
Observations         30

ANOVA
              df    SS         MS        F
Regression     2    15489.2    7744.6    10.5
Residual      27    19972.8    739.7
Total         29    35462.0

              Coefficients   Standard Error   t Stat   P-value
intercept     64.087         17.420           3.679    0.001
win           100.479        31.444           3.196    0.004
population    3.231          0.966            3.346    0.002

DUMMY VARIABLES

More than two causes? What about market power?

Dummy variables have only two values:
- 1 if the team has a monopoly
- 0 if two teams in the same city

$\text{Rev} = 14.2 + 96.0 \cdot \text{win} + 5.87 \cdot \text{population} + 43.5 \cdot \text{monopoly}$ (monopoly = 1), $R^2 = 0.47$
t-statistics: 0.31 (intercept), 3.05 (win), 2.42 (population), 1.18 (monopoly)

The monopoly estimate is positive but statistically insignificant.

EXCEL OUTPUT

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.682
R Square             0.466
Adjusted R Square    0.404
Standard Error       26.998
Observations         30

ANOVA
              df    SS         MS        F
Regression     3    16511.2    5503.7    7.6
Residual      26    18950.8    728.9
Total         29    35462.0

              Coefficients   Standard Error   t Stat   P-value
intercept     14.184         45.554           0.311    0.758
win           96.009         31.440           3.054    0.005
population    5.868          2.424            2.421    0.023
monopoly=1    43.520         36.755           1.184    0.247
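The fit statistics in the three Excel outputs can be recomputed from the reported sums of squares, coefficients, and standard errors. The following is a minimal sketch of that check, not Excel's own procedure: the function names `summarize_anova` and `t_stats` are made up for illustration, and the only inputs are numbers copied from the outputs above. Because the inputs are already rounded, the recomputed values should agree with Excel's only up to the last printed digit.

```python
from math import sqrt


def summarize_anova(ss_reg, ss_res, df_reg, df_res):
    """Recompute fit statistics from ANOVA sums of squares and degrees of freedom."""
    ss_tot = ss_reg + ss_res              # total sum of squares
    df_tot = df_reg + df_res              # n - 1
    r2 = ss_reg / ss_tot                  # regression SS / total SS
    adj_r2 = 1 - (1 - r2) * df_tot / df_res
    ms_res = ss_res / df_res              # residual mean square
    f_stat = (ss_reg / df_reg) / ms_res   # regression MS / residual MS
    return {"r2": r2, "adj_r2": adj_r2, "std_error": sqrt(ms_res), "f": f_stat}


def t_stats(coefficients, std_errors):
    """t-statistic = coefficient / standard error, one per regressor."""
    return [b / se for b, se in zip(coefficients, std_errors)]


# Sums of squares and degrees of freedom copied from the three ANOVA tables.
simple = summarize_anova(7206.6, 28255.4, 1, 28)
multiple = summarize_anova(15489.2, 19972.8, 2, 27)
dummy = summarize_anova(16511.2, 18950.8, 3, 26)

# Coefficients and standard errors copied from the coefficient tables.
simple_t = t_stats([82.940, 98.119], [19.252, 36.716])
multiple_t = t_stats([64.087, 100.479, 3.231], [17.420, 31.444, 0.966])
dummy_t = t_stats([14.184, 96.009, 5.868, 43.520],
                  [45.554, 31.440, 2.424, 36.755])

# The simple regression line evaluated at the mean winning percentage (0.500)
# should land on the point of means.
rev_at_mean = 82.940 + 98.119 * 0.500

for name, stats in [("simple", simple), ("multiple", multiple), ("dummy", dummy)]:
    print(name, {key: round(value, 3) for key, value in stats.items()})
print("t (simple):", [round(t, 3) for t in simple_t])
print("t (multiple):", [round(t, 3) for t in multiple_t])
print("t (dummy):", [round(t, 3) for t in dummy_t])
print("revenue at mean win percentage:", round(rev_at_mean, 1))
```

Applying the rule of thumb from the statistical significance slide to `dummy_t`, only the win and population t-statistics exceed 2 in absolute value, which is why the monopoly estimate is described as statistically insignificant.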