252mreg.doc 1/22/07 (Open this document in 'Outline' view!) Roger Even Bove

J. MULTIPLE REGRESSION
1. Two explanatory variables
a. Model
b. Solution
c. Example
2. Interpretation
3. Standard errors
4. Stepwise regression
Appendix to J1 – Derivation of the regression equations.

J. MULTIPLE REGRESSION

1. Two explanatory variables

a. Model

Let us assume that we have two independent variables. $Y_j$ represents the $j$th observation on the dependent variable, and $X_{ij}$ is the $j$th observation on independent variable $i$. For example, $X_{15}$ is the 5th observation on independent variable 1 and $X_{29}$ is the 9th observation on independent variable 2. We wish to estimate the coefficients $\beta_0$, $\beta_1$ and $\beta_2$ of the presumably 'true' regression line

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 .$$

Any actual point $Y_j$ may not lie precisely on the regression line, so we write

$$Y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \varepsilon_j ,$$

where $\varepsilon_j$ is a random variable usually assumed to be $N(0, \sigma)$, and $\sigma$ is unknown but constant.

The line that we estimate will have the equation $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$. Our prediction of $Y$ for any specific $X_{1j}$ and $X_{2j}$ will be $\hat{Y}_j$. Since $\hat{Y}_j$ is unlikely to equal $Y_j$ exactly, we call our error (in estimating $Y_j$) $e_j = Y_j - \hat{Y}_j$, so that $Y_j = b_0 + b_1 X_{1j} + b_2 X_{2j} + e_j$.

b. Solution

After computing a set of six "spare parts," put them together in a set of Simplified Normal Equations

$$b_1\left(\sum X_1^2 - n\bar{X}_1^2\right) + b_2\left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right) = \sum X_1Y - n\bar{X}_1\bar{Y}$$

$$b_1\left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right) + b_2\left(\sum X_2^2 - n\bar{X}_2^2\right) = \sum X_2Y - n\bar{X}_2\bar{Y}$$

and solve them as two equations in two unknowns for $b_1$ and $b_2$. Then get $b_0$ by solving $b_0 = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2$.

c. Example

Recall our original example. $Y$ is the number of children actually born and $X$ is the number of children wanted. Add a new independent variable $W$, a dummy variable indicating the education of the wife ($W = 1$ if she has finished college, $W = 0$ if she has not). In the above equations, $X_1 = X$ and $X_2 = W$.
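The procedure in b. Solution can be sketched in Python as a function of the spare parts and the means. The spare parts are written as $S_{X_1Y} = \sum X_1Y - n\bar{X}_1\bar{Y}$, $SS_{X_1} = \sum X_1^2 - n\bar{X}_1^2$, $S_{X_1X_2} = \sum X_1X_2 - n\bar{X}_1\bar{X}_2$, and so on. This is a minimal sketch, not the author's method. The function name `solve_normal_equations` and its parameter names are hypothetical. The sketch solves the two equations by Cramer's rule instead of by the hand elimination used in the example. Both approaches give the same $b_1$ and $b_2$ whenever $SS_{X_1}SS_{X_2} - S_{X_1X_2}^2 \neq 0$.

```python
def solve_normal_equations(s_x1y, s_x2y, ss_x1, ss_x2, s_x1x2, y_bar, x1_bar, x2_bar):
    """Hypothetical helper: solve the Simplified Normal Equations

        S_X1Y = SS_X1 * b1 + S_X1X2 * b2
        S_X2Y = S_X1X2 * b1 + SS_X2 * b2

    for b1 and b2 by Cramer's rule, then recover b0 from the means.
    Returns (b0, b1, b2).
    """
    # Determinant of the 2x2 coefficient matrix. Zero means X1 and X2 are
    # perfectly collinear and the equations have no unique solution.
    det = ss_x1 * ss_x2 - s_x1x2 ** 2
    if det == 0:
        raise ValueError("singular normal equations: X1 and X2 are collinear")
    b1 = (s_x1y * ss_x2 - s_x1x2 * s_x2y) / det
    b2 = (ss_x1 * s_x2y - s_x1x2 * s_x1y) / det
    # b0 = Ybar - b1 * X1bar - b2 * X2bar
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


if __name__ == "__main__":
    # Spare parts and means from the children-born example.
    b0, b1, b2 = solve_normal_equations(9.60, -3.60, 14.40, 2.40, -0.40, 1.90, 1.60, 0.40)
    print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
    # prints: b0 = 1.4535, b1 = 0.6279, b2 = -1.3953
```

Cramer's rule is used here only because it gives closed-form expressions for two unknowns. With more explanatory variables, a general linear solver would be the usual choice.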
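        i   Y   X   W  X²  W²  XW  XY  WY  Y²
        1   0   0   1   0   1   0   0   0   0
        2   2   1   0   1   0   0   2   0   4
        3   1   2   1   4   1   2   2   1   1
        4   3   1   0   1   0   0   3   0   9
        5   1   0   0   0   0   0   0   0   1
        6   3   3   0   9   0   0   9   0   9
        7   4   4   0  16   0   0  16   0  16
        8   2   2   1   4   1   2   4   2   4
        9   1   2   1   4   1   2   2   1   1
       10   2   1   0   1   0   0   2   0   4
      sum  19  16   4  40   4   6  40   4  49

Copy sums: $n = 10$, $\sum Y = 19$, $\sum X_1 = \sum X = 16$, $\sum X_2 = \sum W = 4$, $\sum X_1^2 = \sum X^2 = 40$, $\sum X_2^2 = \sum W^2 = 4$, $\sum X_1X_2 = \sum XW = 6$, $\sum X_1Y = \sum XY = 40$, $\sum X_2Y = \sum WY = 4$ and $\sum Y^2 = 49$.

Then compute means: $\bar{Y} = \frac{\sum Y}{n} = \frac{19}{10} = 1.90$, $\bar{X}_1 = \bar{X} = \frac{\sum X}{n} = \frac{16}{10} = 1.60$ and $\bar{X}_2 = \bar{W} = \frac{\sum W}{n} = \frac{4}{10} = 0.40$.

Spare Parts:

$$SST = SS_Y = \sum Y^2 - n\bar{Y}^2 = 49 - 10(1.90)^2 = 12.90$$

$$S_{X_1Y} = \sum X_1Y - n\bar{X}_1\bar{Y} = 40 - 10(1.60)(1.90) = 9.60$$

$$S_{X_2Y} = \sum X_2Y - n\bar{X}_2\bar{Y} = 4 - 10(0.40)(1.90) = -3.60$$

$$SS_{X_1} = \sum X_1^2 - n\bar{X}_1^2 = 40 - 10(1.60)^2 = 14.40$$

$$SS_{X_2} = \sum X_2^2 - n\bar{X}_2^2 = 4 - 10(0.40)^2 = 2.40$$

$$S_{X_1X_2} = \sum X_1X_2 - n\bar{X}_1\bar{X}_2 = 6 - 10(1.60)(0.40) = -0.40$$

Note that $SS_Y$, $SS_{X_1}$ and $SS_{X_2}$ must be positive, while the other spare parts can be either positive or negative. Also note that $df = n - k - 1 = 10 - 2 - 1 = 7$, where $k$ is the number of independent variables. SST is used later.

Rewrite the Normal Equations to move the unknowns to the right.

$$\sum X_1Y - n\bar{X}_1\bar{Y} = \left(\sum X_1^2 - n\bar{X}_1^2\right)b_1 + \left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right)b_2 \quad \text{(Eqn. 1)}$$

$$\sum X_2Y - n\bar{X}_2\bar{Y} = \left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right)b_1 + \left(\sum X_2^2 - n\bar{X}_2^2\right)b_2 \quad \text{(Eqn. 2)}$$

$$\bar{Y} = b_0 + \bar{X}_1 b_1 + \bar{X}_2 b_2 \quad \text{(Eqn. 3)}$$

Or:

$$S_{X_1Y} = SS_{X_1} b_1 + S_{X_1X_2} b_2 \quad \text{(Eqn. 1)}$$

$$S_{X_2Y} = S_{X_1X_2} b_1 + SS_{X_2} b_2 \quad \text{(Eqn. 2)}$$

$$\bar{Y} = b_0 + \bar{X}_1 b_1 + \bar{X}_2 b_2 \quad \text{(Eqn. 3)}$$

If we fill in the above spare parts, we get:

$$9.60 = 14.40 b_1 - 0.40 b_2 \quad \text{(Eqn. 1)}$$

$$-3.60 = -0.40 b_1 + 2.40 b_2 \quad \text{(Eqn. 2)}$$

$$1.90 = b_0 + 1.60 b_1 + 0.40 b_2 \quad \text{(Eqn. 3)}$$

We solve the first two equations on their own. First, multiply one of them so that the coefficients of $b_1$ or $b_2$ are equal in size. Then add or subtract the two equations to eliminate one of the unknowns. In this case, if we multiply Equation 1 by 6, the coefficients of $b_2$ in Equations 1 and 2 become equal and opposite. Adding the two equations then eliminates $b_2$:

$$57.60 = 86.40 b_1 - 2.40 b_2 \quad (6 \times \text{Eqn. 1})$$

$$-3.60 = -0.40 b_1 + 2.40 b_2 \quad \text{(Eqn. 2)}$$

$$54.00 = 86.00 b_1$$

Since $54.00 = 86.00 b_1$, we have $b_1 = \frac{54.00}{86.00} = 0.62791$.

Now, solve either Equation 1 or 2 for $b_2$.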
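The copied sums, the means, the six spare parts and the elimination that yields $b_1$ can be checked with a short Python sketch. The data lists are transcribed from the table above. The helper names `spare_parts` and `eliminate_b2` are hypothetical. To eliminate $b_2$, the sketch multiplies Equation 1 by $-SS_{X_2}/S_{X_1X_2}$. This reproduces the factor 6 used in the text, but it assumes $S_{X_1X_2} \neq 0$.

```python
# Observations transcribed from the table: Y = children born,
# X = children wanted, W = 1 if the wife finished college, else 0.
Y = [0, 2, 1, 3, 1, 3, 4, 2, 1, 2]
X = [0, 1, 2, 1, 0, 3, 4, 2, 2, 1]
W = [1, 0, 1, 0, 0, 0, 0, 1, 1, 0]


def spare_parts(y, x1, x2):
    """Hypothetical helper: return the means and the six spare parts."""
    n = len(y)
    y_bar, x1_bar, x2_bar = sum(y) / n, sum(x1) / n, sum(x2) / n

    def s(u, v, u_bar, v_bar):
        # Each spare part is a raw sum of products minus n times the product of means.
        return sum(a * b for a, b in zip(u, v)) - n * u_bar * v_bar

    return {
        "n": n, "y_bar": y_bar, "x1_bar": x1_bar, "x2_bar": x2_bar,
        "ss_y": s(y, y, y_bar, y_bar),          # SST
        "s_x1y": s(x1, y, x1_bar, y_bar),
        "s_x2y": s(x2, y, x2_bar, y_bar),
        "ss_x1": s(x1, x1, x1_bar, x1_bar),
        "ss_x2": s(x2, x2, x2_bar, x2_bar),
        "s_x1x2": s(x1, x2, x1_bar, x2_bar),
    }


def eliminate_b2(p):
    """Hypothetical helper: scale Eqn. 1 so its b2 coefficient cancels Eqn. 2's, add, solve for b1."""
    if p["s_x1x2"] == 0:
        raise ValueError("S_X1X2 is 0: b2 is already absent from Eqn. 1")
    factor = -p["ss_x2"] / p["s_x1x2"]          # 6 in the example
    lhs = factor * p["s_x1y"] + p["s_x2y"]        # 57.60 - 3.60 = 54.00
    coef_b1 = factor * p["ss_x1"] + p["s_x1x2"]   # 86.40 - 0.40 = 86.00
    return factor, lhs, coef_b1, lhs / coef_b1


if __name__ == "__main__":
    p = spare_parts(Y, X, W)
    for name in ("ss_y", "s_x1y", "s_x2y", "ss_x1", "ss_x2", "s_x1x2"):
        print(f"{name} = {p[name]:.2f}")
    factor, lhs, coef_b1, b1 = eliminate_b2(p)
    print(f"factor = {factor:.2f}: {lhs:.2f} = {coef_b1:.2f} b1, so b1 = {b1:.5f}")
```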
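If we pick Equation 1, we can write it as $0.40 b_2 = 14.40 b_1 - 9.60$. Dividing through by 0.40 gives $b_2 = 36.0 b_1 - 24.0$. Now substitute $b_1 = 54.00/86.00 = 0.627907$. We carry six decimals here because multiplying by 36.0 magnifies any rounding in $b_1$. This gives $b_2 = 36.0(0.627907) - 24.0 = -1.39535$. Finally, rearrange Equation 3 to read

$$b_0 = \bar{Y} - \bar{X}_1 b_1 - \bar{X}_2 b_2 = 1.90 - 1.60 b_1 - 0.40 b_2 = 1.90 - 1.60(0.627907) - 0.40(-1.39535) = 1.4535.$$

Now that we have values of all the coefficients, our regression equation, $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$, becomes

$$\hat{Y} = 1.4535 + 0.6279X - 1.3953W$$

or $Y = 1.4535 + 0.6279X - 1.3953W + e$.

2. Interpretation

3. Standard errors

Recall that in the example in J1, $SST = \sum Y^2 - n\bar{Y}^2 = SS_Y = 12.90$. We had also computed the spare parts $S_{X_1Y} = 9.60$, $S_{X_2Y} = -3.60$, $SS_{X_1} = 14.40$, $SS_{X_2} = 2.40$ and $S_{X_1X_2} = -0.40$.

The explained or regression sum of squares is

$$SSR = b_1\left(\sum X_1Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2Y - n\bar{X}_2\bar{Y}\right) = b_1 S_{X_1Y} + b_2 S_{X_2Y}.$$

The error or residual sum of squares is $SSE = SST - SSR$. Here $k = 2$ is the number of independent variables.

The coefficient of determination is

$$R^2 = \frac{SSR}{SST} = \frac{b_1\left(\sum X_1Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2Y - n\bar{X}_2\bar{Y}\right)}{\sum Y^2 - n\bar{Y}^2} = \frac{b_1 S_{X_1Y} + b_2 S_{X_2Y}}{SS_Y}.$$

An alternate formula, if spare parts are not available, is

$$R^2 = \frac{SSR}{SST} = \frac{b_0\sum Y + b_1\sum X_1Y + b_2\sum X_2Y - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}.$$

The standard error is $s_e$, where

$$s_e^2 = \frac{SSE}{n - k - 1} = \frac{SS_Y\left(1 - R^2\right)}{n - 3} = \frac{SS_Y - b_1 S_{X_1Y} - b_2 S_{X_2Y}}{n - 3}$$

or

$$s_e^2 = \frac{\left(\sum Y^2 - n\bar{Y}^2\right) - b_1\left(\sum X_1Y - n\bar{X}_1\bar{Y}\right) - b_2\left(\sum X_2Y - n\bar{X}_2\bar{Y}\right)}{n - 3}.$$

An alternate formula, if spare parts are not available, is

$$s_e^2 = \frac{\sum Y^2 - b_0\sum Y - b_1\sum X_1Y - b_2\sum X_2Y}{n - 3}.$$

If we wish to find the coefficient of determination in the example in J1, recall that

$$SSR = b_1 S_{X_1Y} + b_2 S_{X_2Y} = 0.627907(9.60) + (-1.39535)(-3.60) = 6.02791 + 5.02326 = 11.0512,$$

so

$$R^2 = \frac{SSR}{SST} = \frac{b_1 S_{X_1Y} + b_2 S_{X_2Y}}{SS_Y} = \frac{11.0512}{12.90} = .8567.$$

This is a considerable improvement over the simple regression of $Y$ on $X$ alone. For that regression, $R^2 = \frac{S_{XY}^2}{SS_X\,SS_Y} = \frac{(9.60)^2}{(14.40)(12.90)} = .4961$. Finally,

$$SSE = SST - SSR = SS_Y - b_1 S_{X_1Y} - b_2 S_{X_2Y} = 12.90 - 11.0512 = 1.8488.$$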
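The goodness-of-fit formulas can be checked numerically. This Python sketch computes SSR, SSE, $R^2$ and $s_e$ twice for the example: once from the spare parts, and once from the alternate formulas that use only raw sums. The function names are hypothetical, and $k = 2$ is assumed. The coefficients are carried unrounded: $b_1 = 54.00/86.00$, then $b_2 = 36.0 b_1 - 24.0$ from Equation 1 and $b_0 = 1.90 - 1.60 b_1 - 0.40 b_2$ from Equation 3. With unrounded coefficients, the two routes agree up to floating-point error.

```python
import math

# Example values: n, raw sums and spare parts from the children-born data.
N = 10
SUM_Y, SUM_Y2, SUM_X1Y, SUM_X2Y = 19, 49, 40, 4
SS_Y, S_X1Y, S_X2Y = 12.90, 9.60, -3.60

# Coefficients carried unrounded: b1 from 54.00 = 86.00 b1, b2 from Eqn. 1, b0 from Eqn. 3.
B1 = 54.00 / 86.00
B2 = 36.0 * B1 - 24.0
B0 = 1.90 - 1.60 * B1 - 0.40 * B2


def fit_from_spare_parts(ss_y, s_x1y, s_x2y, b1, b2, n, k=2):
    """SSR = b1*S_X1Y + b2*S_X2Y, SSE = SST - SSR, R^2 = SSR/SST, s_e = sqrt(SSE/(n-k-1))."""
    ssr = b1 * s_x1y + b2 * s_x2y
    sse = ss_y - ssr
    return ssr, sse, ssr / ss_y, math.sqrt(sse / (n - k - 1))


def fit_from_raw_sums(sum_y, sum_y2, sum_x1y, sum_x2y, b0, b1, b2, n, k=2):
    """Alternate formulas for use when spare parts are not available."""
    n_ybar2 = n * (sum_y / n) ** 2                     # n * Ybar^2
    ssr = b0 * sum_y + b1 * sum_x1y + b2 * sum_x2y - n_ybar2
    sst = sum_y2 - n_ybar2
    sse = sum_y2 - b0 * sum_y - b1 * sum_x1y - b2 * sum_x2y
    return ssr, sse, ssr / sst, math.sqrt(sse / (n - k - 1))


if __name__ == "__main__":
    via_spare = fit_from_spare_parts(SS_Y, S_X1Y, S_X2Y, B1, B2, N)
    via_raw = fit_from_raw_sums(SUM_Y, SUM_Y2, SUM_X1Y, SUM_X2Y, B0, B1, B2, N)
    for label, vals in (("spare parts", via_spare), ("raw sums", via_raw)):
        print(label, "SSR=%.4f SSE=%.4f R^2=%.4f s_e=%.4f" % vals)
```

The raw-sum route matches the spare-parts route only because $b_0$ satisfies Equation 3 exactly. If $b_0$, $b_1$ and $b_2$ are rounded first, the two routes drift apart slightly.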