Duke STA 101 - Multiple Regression

Multiple regression

Typically, we want to use more than a single predictor (independent variable) to make predictions.
Regression with more than one predictor is called “multiple regression”:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i$$

Motivating example: Sex discrimination in wages

In the 1970s, Harris Trust and Savings Bank was sued for discrimination on the basis of sex.
Analysis of salaries of employees of one type (skilled, entry-level clerical) was presented as evidence by the defense.
Did female employees tend to receive lower starting salaries than similarly qualified and experienced male employees?

Variables collected

93 employees on data file (61 female, 32 male).
bsal: annual salary at time of hire.
sal77: annual salary in 1977.
educ: years of education.
exper: months of previous work prior to hire at bank.
fsex: 1 if female, 0 if male.
senior: months worked at bank since hired.
age: age in months.
So we have six x's and one y (bsal). However, in what follows we won't use sal77.

Comparison for males and females

This comparison shows that men started at higher salaries than women (t = 6.3, p < .0001), but it doesn't control for other characteristics.

[Figure: Oneway Analysis of bsal By fsex. bsal (4000 to 8000) for Female and Male.]

Relationships of bsal with other variables

Seniority and education predict bsal well. We want to control for them when judging the gender effect.

[Figures: Bivariate Fit of bsal By senior, By age, By educ, and By exper, each with a linear fit.]

Multiple regression model

For any combination of values of the predictor variables, the average value of the response (bsal) lies on a straight line:

$$\mathrm{bsal}_i = \alpha + \beta_1 \mathrm{fsex}_i + \beta_2 \mathrm{senior}_i + \beta_3 \mathrm{age}_i + \beta_4 \mathrm{educ}_i + \beta_5 \mathrm{exper}_i + \varepsilon_i$$

Just like in simple regression, assume that ε follows a normal curve within any combination of predictors.

Output from regression (fsex = 1 for females, = 0 for males)

Term      Estimate   Std Error   t Ratio   Prob>|t|
Int.      6277.9     652         9.62      <.0001
Fsex      -767.9     128.9       -5.95     <.0001
Senior    -22.6      5.3         -4.26     <.0001
Age       0.63       .72         .88       .3837
Educ      92.3       24.8        3.71      .0004
Exper     0.50       1.05        .47       .6364

[Figure: Actual by Predicted Plot. bsal Actual versus bsal Predicted; P<.0001, RSq=0.52, RMSE=508.09.]

Summary of Fit

RSquare                       0.515156
RSquare Adj                   0.487291
Root Mean Square Error        508.0906
Mean of Response              5420.323
Observations (or Sum Wgts)    93

Analysis of Variance

Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model      5    23863715         4772743       18.4878   <.0001
Error      87   22459575         258156
C. Total   92   46323290

Parameter Estimates

Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   6277.8934    652.2713    9.62      <.0001
fsex        -767.9127    128.97      -5.95     <.0001
senior      -22.5823     5.295732    -4.26     <.0001
age         0.6309603    0.720654    0.88      0.3837
educ        92.306023    24.86354    3.71      0.0004
exper       0.5006397    1.055262    0.47      0.6364

Effect Tests

Source   Nparm   DF   Sum of Squares   F Ratio   Prob > F
fsex     1       1    9152264.3        35.4525   <.0001
senior   1       1    4694256.3        18.1838   <.0001
age      1       1    197894.0         0.7666    0.3837
educ     1       1    3558085.8        13.7827   0.0004
exper    1       1    58104.8          0.2251    0.6364

[Figure: Residual by Predicted Plot. bsal Residual (-1000 to 1500) versus bsal Predicted (4000 to 8000).]

Predictions

Example: Prediction of beginning wages for a woman with 10 months of seniority who is 25 years old (300 months), with 12 years of education and two years (24 months) of experience:

Pred. bsal = 6277.9 - 767.9*1 - 22.6*10 + .63*300 + 92.3*12 + .50*24 = 6592.6

Interpretation of coefficients in multiple regression

Each estimated coefficient is the amount Y is expected to increase when the value of its corresponding predictor is increased by one, holding constant the values of the other predictors.
Example: the estimated coefficient of education equals 92.3. For each additional year of education of an employee, we expect salary to increase by about 92 dollars, holding all other variables constant.
The estimated coefficient of fsex equals -767.9. For employees who started at the same time, had the same education and experience, and were the same age, women earned about $768 less on average than men.

Which variable is the strongest predictor of the outcome?

The coefficient that has the strongest linear association with the outcome variable is the one with the largest absolute value of T, which equals the coefficient divided by its SE.
It is not the size of the coefficient, which is sensitive to the scales of the predictors. The T statistic is not, since it is a standardized measure.
Example: In the wages regression, seniority is a better predictor than education because it has a larger absolute T (4.26 versus 3.71).

Hypothesis tests for coefficients

The reported t-stats (coef. / SE) and p-values are used to test whether a particular coefficient equals 0, given that all other predictors are in the model.
Examples:
1) The test of whether the coefficient of education equals zero has p-value = .0004. Hence, we reject the null hypothesis; it appears that education is a useful predictor of bsal when all the other predictors are in the model.
2) The test of whether the coefficient of experience equals zero has p-value = .6364. Hence, we cannot reject the null hypothesis; it appears that experience is not a particularly useful predictor of bsal when all other predictors are in the model.
The test statistics have the usual form (observed - expected)/SE.
For p-value, use
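
The t ratios in the Parameter Estimates output can be reproduced from the reported estimates and standard errors. The sketch below is only an illustration: the dictionary `estimates` and the helper `t_ratio` are names made up here, and the numbers are copied from the output rather than refit from the data. It computes (observed - expected)/SE with an expected value of 0 and then ranks the predictors by absolute t, the criterion the slides use for "strongest predictor".

```python
# Estimates and standard errors copied from the Parameter Estimates output.
# The variable and function names are illustrative, not from the slides.
estimates = {
    "fsex": (-767.9127, 128.97),
    "senior": (-22.5823, 5.295732),
    "age": (0.6309603, 0.720654),
    "educ": (92.306023, 24.86354),
    "exper": (0.5006397, 1.055262),
}


def t_ratio(estimate, std_error, expected=0.0):
    """Test statistic of the form (observed - expected) / SE."""
    return (estimate - expected) / std_error


t_ratios = {name: t_ratio(est, se) for name, (est, se) in estimates.items()}

# Rank predictors by |t|, largest first.
ranking = sorted(t_ratios, key=lambda name: abs(t_ratios[name]), reverse=True)

for name in ranking:
    print(f"{name:>6}: t = {t_ratios[name]:.2f}")
```

Under this ranking fsex comes first, followed by senior and educ, while age and exper have small t ratios, which agrees with their large p-values in the output.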

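
The worked prediction for the woman with 10 months of seniority can be checked with a few lines of arithmetic. This is a minimal sketch using the rounded coefficients from the regression output; the function name `predict_bsal` and its argument layout are assumptions made for illustration. Comparing the same profile with fsex set to 0 also shows what "holding the other predictors constant" means for the fsex coefficient.

```python
# Rounded coefficient estimates from the regression output.
# predict_bsal is a hypothetical helper, not part of the original slides.
INTERCEPT = 6277.9
COEFFICIENTS = {
    "fsex": -767.9,   # 1 if female, 0 if male
    "senior": -22.6,  # months worked at bank since hired
    "age": 0.63,      # age in months
    "educ": 92.3,     # years of education
    "exper": 0.50,    # months of previous work
}


def predict_bsal(**predictors):
    """Intercept plus the sum of coefficient * predictor value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * predictors[name] for name in COEFFICIENTS
    )


profile = {"senior": 10, "age": 25 * 12, "educ": 12, "exper": 2 * 12}

woman = predict_bsal(fsex=1, **profile)
man = predict_bsal(fsex=0, **profile)

print(round(woman, 1))        # 6592.6, matching the worked example
print(round(woman - man, 1))  # -767.9, the fsex coefficient
```

Age and experience are converted to months before multiplying, since both variables are recorded in months in the data file.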

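
The Summary of Fit values follow from the Analysis of Variance table. The sketch below recomputes them from the reported sums of squares and degrees of freedom, using the standard definitions of the mean squares, F ratio, R-square, adjusted R-square, and root mean square error; the variable names are chosen here for illustration, and nothing is refit from the raw data.

```python
import math

# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_model, df_model = 23863715, 5
ss_error, df_error = 22459575, 87
ss_total, df_total = 46323290, 92

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F ratio for the whole model.
f_ratio = ms_model / ms_error

# R-square: share of the total variation explained by the model.
r_square = ss_model / ss_total

# Adjusted R-square penalizes for the number of predictors.
r_square_adj = 1 - ms_error / (ss_total / df_total)

# Root mean square error: estimated standard deviation of the residuals.
rmse = math.sqrt(ms_error)

print(f"F Ratio     = {f_ratio:.4f}")
print(f"RSquare     = {r_square:.6f}")
print(f"RSquare Adj = {r_square_adj:.6f}")
print(f"RMSE        = {rmse:.4f}")
```

The results agree with the reported output to the precision shown there (F Ratio 18.4878, RSquare 0.515156, RSquare Adj 0.487291, Root Mean Square Error 508.0906).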