UW-Madison SOC 357 - Introduction to Regression - D2387276

School name University of Wisconsin, Madison

Course Soc 357- Methods of Sociological Inquiry

Pages 7

This preview shows page 1-2 out of 7 pages.

Class 25
Introduction to Regression

Regression: Concepts
• Regression expresses the dependent variable Y as a function of the independent variable X: Y = f(X) + ε
• In essence, regression is a conditional mean (i.e., the expected value of Y for a given value of X).
• f(X) can be linear, e.g., f(X) = α + βX
• f(X) can be nonlinear, e.g., f(X) = α + βX + γX^2

Regression: Concepts
• Consider the simplest form of regression: Y = α + βX + ε, or with subscripts, Y_i = α + βX_i + ε_i
• ε_i, the error term, is the difference between the observed Y_i and the expected value of Y given X_i. On average, ε_i = 0.

Regression: Interpretation
• Y = α + βX + ε
• α: intercept. The mean value of Y when X is 0. Note that α may not be a meaningful value (e.g., X is height, Y is hat size).
• β: slope, the "effect" of X on Y.
• Interpretation: A one-unit change in X is accompanied by β units of change in Y on average.

MPG and Vehicle Weight
Y = a + bX + e
mpg = 39.4 - 0.006 * weight + e
intercept = 39.4
slope = -0.006

Regression as the Best-Fitting Line
• e: the difference between the predicted and the observed value.
• Find the best-fitting line by the criterion of Ordinary Least Squares, that is, minimize the sum of the squared e.

Uses of Regression
• Descriptive: to summarize the correlation between variables.
• Explanatory: to describe a causal relationship between variables.
• Predictive: to predict the value of Y based on the value of X.

Common Errors in Regression
• Don't fit a straight line to a nonlinear relationship.
• Beware of outliers.
• Don't extrapolate beyond the data. (Interpolation is usually okay.)
• Don't infer that X causes Y just because there is a good linear model for their relationship.

Are Women Catching up with Men in Sports?

Sex and Sports
• Why is it incorrect to extrapolate from historical data on world records and conclude that women will eventually surpass men in running and swimming?
• What factors affect "best times" in races?
• Why is it not a good idea to study performance trends by examining winning times (or world records)?

Multiple Regression Analysis
• Regression with 2+ independent variables
• Y = α + βX_1 + γX_2 + ε
• β is interpreted as the effect of X_1 on Y, controlling for X_2. Likewise, γ is the effect of X_2 on Y, controlling for X_1.

Non-linear Regression Analysis
• The simplest form of regression fits a line to a continuous dependent variable.
• Regression can also fit a curve, e.g.
  - Y = α + βX + γX^2 + ε
  - lnY = α + βX + ε
• Regression can also predict categorical dependent variables (e.g., logistic regression).

Examples Using Stata

Predicting Verbal Score
What is the relationship between education and vocabulary?
Dependent variable: wordsum (verbal score)
Independent variables: education (years of schooling), sex

. graph bar wordsum, over(educ)

[Bar chart: mean of wordsum (vertical axis, 0 to 8) by years of education (horizontal axis, 0 to 20)]

. regress wordsum educ

      Source |       SS       df       MS              Number of obs =   20200
-------------+------------------------------           F(  1, 20198) = 6832.24
       Model |  24085.9215     1  24085.9215           Prob > F      =  0.0000
    Residual |  71204.7025 20198  3.52533431           R-squared     =  0.2528
-------------+------------------------------           Adj R-squared =  0.2527
       Total |   95290.624 20199  4.71759117           Root MSE      =  1.8776

------------------------------------------------------------------------------
     wordsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .3589895   .0043431    82.66   0.000     .3504766    .3675023
       _cons |   1.411986    .056832    24.84   0.000     1.300591    1.523381
------------------------------------------------------------------------------

• Regression equation: wordsum = 1.412 + 0.359 * educ + e
• Interpretation: One additional year of education is associated with a vocabulary score that is 0.359 higher on average.
• Prediction: The predicted average score for respondents with no schooling is 1.412, that for respondents with 1 year of education is 1.412 + 0.359 = 1.771, and so on.

t Statistic and p-Value
• The S.E. of the coefficient 0.359 is 0.004. The S.E. indicates how much the estimate of a coefficient would vary from sample to sample.
• t = coefficient / S.E. The larger the absolute value of t, the stronger the evidence that the coefficient differs from zero.
• The p-value is the probability of observing a relationship between the IV and the DV at least as strong as the one in the sample, given that there is no such association in the population. A p-value smaller than 0.05 is conventionally taken to indicate a statistically significant relationship.

Statistical and Substantive Significance
• Substantive significance: the estimated coefficient is strong, important and meaningful.
• Statistical significance: the p-value is smaller than 0.05, which means that the observed association between the IV and the DV is unlikely to be due to sampling error alone. The larger the sample is, the more likely we are to find statistical significance.

. regress wordsum educ sex

      Source |       SS       df       MS              Number of obs =   20200
-------------+------------------------------           F(  2, 20197) = 3459.08
       Model |  24312.4252     2  12156.2126           Prob > F      =  0.0000
    Residual |  70978.1987 20197  3.51429414           R-squared     =  0.2551
-------------+------------------------------           Adj R-squared =  0.2551
       Total |   95290.624 20199  4.71759117           Root MSE      =  1.8746

------------------------------------------------------------------------------
     wordsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .3610817   .0043441    83.12   0.000     .3525669    .3695965
         sex |   .2142668   .0266892     8.03   0.000     .1619537    .2665799
       _cons |   1.048991   .0725545    14.46   0.000     .9067779    1.191203
------------------------------------------------------------------------------

• Regression equation: wordsum = 1.049 + 0.361 * educ + 0.214 * sex + e (sex is coded 1 = M, 2 = F)
• Interpretation: Holding sex constant, one additional year of education is associated with a vocabulary score that is 0.361 higher on average. Holding education constant, women on average score 0.214 higher on the vocabulary test than men do.
• Prediction: The average score for women with 6 years of schooling is 1.049 + 0.361 * 6 + 0.214 * 2,
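
The prediction and t-statistic rules used in the Stata examples above (intercept plus coefficient times value, and coefficient divided by standard error) can be sketched in a few lines. This is only an illustration in Python rather than Stata; the names SIMPLE, MULTIPLE, predict and t_statistic are hypothetical and do not come from the lecture, while the numbers are copied from the regression output shown above.

```python
# Illustrative sketch only: coefficients are copied from the two Stata
# outputs in the notes; all names below are hypothetical helpers.
SIMPLE = {"_cons": 1.411986, "educ": 0.3589895}
MULTIPLE = {"_cons": 1.048991, "educ": 0.3610817, "sex": 0.2142668}


def predict(coefs, **values):
    """Predicted mean of wordsum: intercept plus coefficient times value."""
    return coefs["_cons"] + sum(coefs[name] * x for name, x in values.items())


def t_statistic(coef, std_err):
    """t = coefficient / standard error, as defined in the notes."""
    return coef / std_err


# Simple regression: respondents with 0 and 1 years of schooling.
print(round(predict(SIMPLE, educ=0), 3))  # 1.412
print(round(predict(SIMPLE, educ=1), 3))  # 1.771

# Multiple regression: men (sex=1) and women (sex=2) with 12 years of schooling.
men = predict(MULTIPLE, educ=12, sex=1)
women = predict(MULTIPLE, educ=12, sex=2)
print(round(women - men, 3))  # 0.214, the sex coefficient

# t statistic for educ in the simple regression.
print(round(t_statistic(0.3589895, 0.0043431), 2))  # 82.66
```

Because the two groups in the multiple-regression comparison share the same years of schooling, the difference between their predicted scores equals the sex coefficient, which is what "holding education constant" means in practice.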
