Describing Bivariate Relationships
17.871

Testing associations
  Continuous data
    Scatter plot (always use first!)
    (Pearson) correlation coefficient (should be rare)
    (Spearman) rank-order correlation coefficient (rare)
    Regression coefficient (common)
  Discrete data
    Cross tabulations
    Differences in means, box plots
    χ²
    Gamma, Beta, etc.

Continuous DV, continuous EV
  Example: What is the relationship between Bush’s vote (by county) in 2000 and in 2004?

2004 Pres. Vote vs. 2000 Pres. Vote
[Figure: scatter plot of bushpct2004 (vertical axis, 0 to 1) against bushpct2000 (horizontal axis, 0 to 1)]

Subtract each observation from its mean
  x' = x - 0.588
  y' = y - 0.609
[Figure: scatter plot of new2004 against new2000, both axes from -.6 to .4]

Covariance formula
\[ \mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \]
Cov(BushPct00, BushPct04) = 0.014858

Correlation formula
\[ \mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = r \]
(compare with Tufte p. 102)
\[ \mathrm{Corr}(\mathrm{BushPct00}, \mathrm{BushPct04}) = \frac{0.014858}{\sqrt{0.01499}\,\sqrt{0.0160596}} = 0.96 \]

Warning: Don’t correlate often!
  Correlation only measures linear relationship
  Correlation is sensitive to variance
  Correlation usually doesn’t measure a theoretically interesting quantity

Regression quantifies how one variable can be described in terms of another

The Linear Relationship between Two Variables
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

The Linear Relationship between African American Population & Black Legislators
[Figure: scatter plot of beo (vertical axis, 0 to 10) against bpop (horizontal axis, 0 to 30), with fitted values]
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i, \qquad \hat{\beta}_0 = -1.31, \quad \hat{\beta}_1 = 0.359 \]

How did we get that line?
  1. Pick a value of Y_i
  2. Decompose Y_i into two parts
  3. Label the points: Y_i (observed), \hat{Y}_i (fitted), and \varepsilon_i = Y_i - \hat{Y}_i (the “residual”)
[Figure: the same beo vs. bpop plot, with Y_i, \hat{Y}_i, and \varepsilon_i marked for one observation]
\[ \varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i) \]

What is ε_i?
  Vagueness of theory
  Poor proxies (i.e., measurement error)
  Wrong functional form

The Method of Least Squares
Pick \beta_0 and \beta_1 to minimize
\[ \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \quad \text{or} \quad \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \]
[Figure: beo vs. bpop with fitted values, with Y_i, \hat{Y}_i, and \varepsilon_i = Y_i - \hat{Y}_i marked]

Solve for \hat{\beta}_1
\[ \frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 = 0 \]
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \quad \text{or} \quad \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)} \]
(Tufte, p. 68)

Regression Commands in STATA
  reg depvar expvars
  predict newvar
  predict newvar, resid
    newvar will now equal ε_i

Black Elected Officials Example
[Figure: beo vs. bpop with fitted values; \hat{\beta}_0 = -1.31, \hat{\beta}_1 = 0.359]

. reg beo bpop

      Source |       SS       df       MS              Number of obs =      41
-------------+------------------------------           F(  1,    39) =  202.56
       Model |   351.26542     1   351.26542           Prob > F      =  0.0000
    Residual |  67.6326195    39  1.73416973           R-squared     =  0.8385
-------------+------------------------------           Adj R-squared =  0.8344
       Total |  418.898039    40   10.472451           Root MSE      =  1.3169

------------------------------------------------------------------------------
         beo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bpop |   .3584751   .0251876    14.23   0.000     .3075284    .4094219
       _cons |  -1.314892   .3277508    -4.01   0.000    -1.977831   -.6519535
------------------------------------------------------------------------------

More regression examples

Temperature and Latitude
[Figure: scatter plot of JanTemp (vertical axis, 0 to 80) against latitude (horizontal axis, 25 to 45), each point labeled with its city]
scatter JanTemp latitude, mlabel(city)
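The slope formula on the "Solve for" slide, cov(X, Y) / var(X), can be checked directly against the output of reg. The sketch below is only an illustration under stated assumptions: it uses Stata's bundled auto dataset (loaded with sysuse) because the course data files are not reproduced here, and the names b1_byhand, r_byhand, r_stata, yhat, and ehat are made up for this example. Note that correlate divides by n - 1 rather than n, but that factor cancels in both ratios.

```stata
* Assumption: Stata's bundled auto dataset stands in for the course data.
sysuse auto, clear

* Covariance and variances of mpg (outcome) and weight (explanatory variable)
correlate mpg weight, covariance
scalar b1_byhand = r(cov_12) / r(Var_2)
scalar r_byhand  = r(cov_12) / sqrt(r(Var_1) * r(Var_2))

* Correlation coefficient as reported by Stata
correlate mpg weight
scalar r_stata = r(rho)
display "r by hand = " r_byhand "   r from correlate = " r_stata

* Least-squares fit; the slope should match cov(X,Y)/var(X)
regress mpg weight
display "slope by hand = " b1_byhand "   slope from regress = " _b[weight]

* Fitted values and residuals, using the predict commands from the slides
predict yhat
predict ehat, resid
summarize mpg yhat ehat
```

Here yhat plays the role of the fitted values and ehat the role of the residuals, so mpg should equal yhat + ehat for every observation, up to storage precision.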
. reg jantemp latitude

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =   49.34
       Model |  3250.72219     1  3250.72219           Prob > F      =  0.0000
    Residual |  1185.82781    18  65.8793228           R-squared     =  0.7327
-------------+------------------------------           Adj R-squared =  0.7179
       Total |     4436.55    19  233.502632           Root MSE      =  8.1166

------------------------------------------------------------------------------
     jantemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    latitude |  -2.341428   .3333232    -7.02   0.000    -3.041714   -1.641142
       _cons |   125.5072   12.77915     9.82   0.000     98.65921    152.3552
------------------------------------------------------------------------------

. predict py
(option xb assumed; fitted values)

. predict ry, resid

. gsort -ry

. list city jantemp py ry

     +-------------------------------------------------+
     |           city   jantemp         py          ry |
     |-------------------------------------------------|
  1. |     PortlandOR        40    17.8015     22.1985 |
  2. | SanFranciscoCA        49   36.53293    12.46707 |
  3. |   LosAngelesCA        58   45.89864    12.10136 |
  4. |      PhoenixAZ        54   48.24007    5.759929 |
  5. |      NewYorkNY        32   29.50864    2.491357 |
     |-------------------------------------------------|
  6. |        MiamiFL        67   64.63007     2.36993 |
  7. |       BostonMA        29   27.16722    1.832785 |
  8. |      NorfolkVA        39   38.87436     .125643 |
  9. |    BaltimoreMD        32    34.1915     -2.1915 |
 10. |     SyracuseNY        22   24.82579   -2.825786 |
     |-------------------------------------------------|
 11. |       MobileAL        50   52.92293   -2.922928 |
 12. |   WashingtonDC        31    34.1915     -3.1915 |
 13. |      MemphisTN        40   43.55721   -3.557214 |
 14. |    ClevelandOH        25   29.50864   -4.508643 |
 15. |       DallasTX        43   48.24007   -5.240071 |
     |-------------------------------------------------|
 16. |      HoustonTX        50   55.26435   -5.264356 |
 17. |   KansasCityMO        28    34.1915     -6.1915 |
 18. |   PittsburghPA        25   31.85007   -6.850072 |
 19. |  MinneapolisMN        12   20.14293   -8.142929 |
 20. |       DuluthMN         7   15.46007   -8.460073 |
     +-------------------------------------------------+

Residuals
\[ e_i = Y_i - B_0 - B_1 X_i \]
One important numerical property of residuals
The sum of
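
The residual definition e_i = Y_i - B_0 - B_1 X_i can be checked against the temperature listing itself, since B_0 + B_1 X_i is the fitted value py. The sketch below re-enters jantemp and py exactly as printed in the listing and recomputes the residuals as observed minus fitted. A standard property of least-squares residuals from a regression that includes a constant is that they add up to zero, so the total should be close to zero here, with any small remainder coming from rounding in the printed values. Re-entering the data with input is an assumption made for illustration; the original data file is not part of these slides.

```stata
* Re-enter observed January temperature and fitted values from the listing
clear
input str14 city jantemp py
"PortlandOR"     40 17.8015
"SanFranciscoCA" 49 36.53293
"LosAngelesCA"   58 45.89864
"PhoenixAZ"      54 48.24007
"NewYorkNY"      32 29.50864
"MiamiFL"        67 64.63007
"BostonMA"       29 27.16722
"NorfolkVA"      39 38.87436
"BaltimoreMD"    32 34.1915
"SyracuseNY"     22 24.82579
"MobileAL"       50 52.92293
"WashingtonDC"   31 34.1915
"MemphisTN"      40 43.55721
"ClevelandOH"    25 29.50864
"DallasTX"       43 48.24007
"HoustonTX"      50 55.26435
"KansasCityMO"   28 34.1915
"PittsburghPA"   25 31.85007
"MinneapolisMN"  12 20.14293
"DuluthMN"        7 15.46007
end

* Residual = observed value minus fitted value
generate ry = jantemp - py

* summarize stores the total of ry in r(sum)
summarize ry
display "sum of residuals = " r(sum)
```

The recomputed ry column should also match the ry column in the listing row by row, which confirms that predict ry, resid returns observed minus fitted.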