FSU CIS 5930r - Lecture 8 Multiregression - D632332

School name Florida State University

Course Cis 5930r- Selected Topics in Computer Science (13).

Pages 31

This preview shows page 1-2-14-15-30-31 out of 31 pages.

Unformatted text preview:

Slide titles:
Multiple Linear Regression
Slide 2
Basic Multiple Linear Regression Formula
A Multiple Linear Regression Model
Looks Like It’s Matrix Arithmetic Time
Analysis of Multiple Linear Regression
Example of Multiple Linear Regression
Some Sample Data
Now for Some Tedious Matrix Arithmetic
X Matrix for Example
Transpose to Get X^T
Multiply To Get X^T X
Invert to Get C = (X^T X)^-1
Multiply to Get X^T y
Multiply (X^T X)^-1 (X^T y) to Get b
How Good Is This Regression Model?
Calculating the Errors
Calculating the Errors, Continued
Why Does It Stink?
Calculating STDEV of Regression Parameters
Calculating Confidence Intervals of STDEVs
Analysis of Variance
Running an F-Test
F-Test for Our Example
Multicollinearity
Finding Multicollinearity
Is Multicollinearity a Problem in Our Example?
Why Didn’t Regression Work Well Here?
Rating vs. Length
Rating vs. Age
White Slide

Slide 1: Multiple Linear Regression
Andy Wang
CIS 5930-03
Computer Systems Performance Analysis

Slide 2: Multiple Linear Regression
• Models with more than one predictor variable
• But each predictor variable has a linear relationship to the response variable
• Conceptually, plotting a regression line in n-dimensional space, instead of 2-dimensional

Slide 3: Basic Multiple Linear Regression Formula
• Response y is a function of k predictor variables x_1, x_2, ..., x_k

\[ y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e \]

Slide 4: A Multiple Linear Regression Model
Given sample of n observations

\[ \{(x_{11}, x_{21}, \ldots, x_{k1}, y_1), \ldots, (x_{1n}, x_{2n}, \ldots, x_{kn}, y_n)\} \]

model consists of n equations (note + vs. - typo in book):

\[
\begin{aligned}
y_1 &= b_0 + b_1 x_{11} + b_2 x_{21} + \cdots + b_k x_{k1} + e_1 \\
y_2 &= b_0 + b_1 x_{12} + b_2 x_{22} + \cdots + b_k x_{k2} + e_2 \\
    &\;\;\vdots \\
y_n &= b_0 + b_1 x_{1n} + b_2 x_{2n} + \cdots + b_k x_{kn} + e_n
\end{aligned}
\]

Slide 5: Looks Like It’s Matrix Arithmetic Time

\[ y = Xb + e \]

\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{21} & \cdots & x_{k1} \\
1 & x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{kn}
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
\]

Slide 6: Analysis of Multiple Linear Regression
• Listed in box 15.1 of Jain
• Not terribly important (for our purposes) how they were derived
  – This isn’t a class on statistics
• But you need to know how to use them
• Mostly matrix analogs to simple linear regression results

Slide 7: Example of Multiple Linear Regression
• IMDB keeps numerical popularity ratings of movies
• Postulate popularity of Academy Award winning films is based on two factors:
  – Year made
  – Running time
• Produce a regression
  rating = b0 + b1(year) + b2(length)

Slide 8: Some Sample Data

Title                   Year  Length  Rating
Silence of the Lambs    1991  118     8.1
Terms of Endearment     1983  132     6.8
Rocky                   1976  119     7.0
Oliver!                 1968  153     7.4
Marty                   1955   91     7.7
Gentleman’s Agreement   1947  118     7.5
Mutiny on the Bounty    1935  132     7.6
It Happened One Night   1934  105     8.0

Slide 9: Now for Some Tedious Matrix Arithmetic
• We need to calculate X, X^T, X^T X, (X^T X)^-1, and X^T y
• Because \( b = (X^T X)^{-1} X^T y \)
• We will see that b = (18.5430, -0.0051, -0.0086)
• Meaning the regression predicts:
  rating = 18.5430 - 0.0051*year - 0.0086*length

Slide 10: X Matrix for Example

\[
X = \begin{bmatrix}
1 & 1991 & 118 \\
1 & 1983 & 132 \\
1 & 1976 & 119 \\
1 & 1968 & 153 \\
1 & 1955 & 91 \\
1 & 1947 & 118 \\
1 & 1935 & 132 \\
1 & 1934 & 105
\end{bmatrix}
\]

Slide 11: Transpose to Get X^T

\[
X^T = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1991 & 1983 & 1976 & 1968 & 1955 & 1947 & 1935 & 1934 \\
118 & 132 & 119 & 153 & 91 & 118 & 132 & 105
\end{bmatrix}
\]

Slide 12: Multiply To Get X^T X

\[
X^T X = \begin{bmatrix}
8 & 15689 & 968 \\
15689 & 30771385 & 1899083 \\
968 & 1899083 & 119572
\end{bmatrix}
\]

Slide 13: Invert to Get C = (X^T X)^-1

\[
C = (X^T X)^{-1} = \begin{bmatrix}
1207.7585 & -0.6240 & 0.1328 \\
-0.6240 & 0.0003 & -0.0001 \\
0.1328 & -0.0001 & 0.0004
\end{bmatrix}
\]

Slide 14: Multiply to Get X^T y

\[
X^T y = \begin{bmatrix} 60.1 \\ 117840.7 \\ 7247.5 \end{bmatrix}
\]

Slide 15: Multiply (X^T X)^-1 (X^T y) to Get b

\[
b = \begin{bmatrix} 18.5430 \\ -0.0051 \\ -0.0086 \end{bmatrix}
\]

Slide 16: How Good Is This Regression Model?
• How accurately does the model predict the rating of a film based on its age and running time?
• Best way to determine this analytically is to calculate the errors

\[ SSE = y^T y - b^T X^T y \quad \text{or} \quad SSE = \sum e_i^2 \]

Slide 17: Calculating the Errors

Rating  Year  Length  Estimated Rating  e_i    e_i^2
8.1     1991  118     7.4               -0.71  0.51
6.8     1983  132     7.3                0.51  0.26
7.0     1976  119     7.5                0.45  0.21
7.4     1968  153     7.2               -0.20  0.04
7.7     1955   91     7.8                0.10  0.01
7.5     1947  118     7.6                0.11  0.01
7.6     1935  132     7.6               -0.05  0.00
8.0     1934  105     7.8               -0.21  0.04

Slide 18: Calculating the Errors, Continued
• So SSE = 1.08
• SSY = \( \sum y_i^2 = 452.91 \)
• SS0 = \( n \bar{y}^2 = 451.5 \)
• SST = SSY - SS0 = 452.9 - 451.5 = 1.4
• SSR = SST - SSE = .33
• \( R^2 = SSR / SST = .33 / 1.41 = .23 \)
• In other words, this regression stinks

Slide 19: Why Does It Stink?
• Let’s look at properties of the regression parameters

\[ s_e = \sqrt{\frac{SSE}{n-3}} = \sqrt{\frac{1.08}{5}} = .46 \]

• Now calculate standard deviations of the regression parameters

Slide 20: Calculating STDEV of Regression Parameters
• Estimations only, since we’re working with a sample
• Estimated stdev of

\[
\begin{aligned}
b_0 &: \; s_{b_0} = s_e \sqrt{c_{00}} = .46 \sqrt{1207.76} = 16.16 \\
b_1 &: \; s_{b_1} = s_e \sqrt{c_{11}} = .46 \sqrt{.0003} = .0084 \\
b_2 &: \; s_{b_2} = s_e \sqrt{c_{22}} = .46 \sqrt{.0004} = .0097
\end{aligned}
\]

Slide 21: Calculating Confidence Intervals of STDEVs
• We will use 90% level
• Confidence intervals for

\[
\begin{aligned}
b_0 &= 18.54 \pm 2.015(16.16) = (-14.02, 51.10) \\
b_1 &= -.005 \pm 2.015(.0084) = (-.022, .012) \\
b_2 &= -.009 \pm 2.015(.0097) = (-.028, .011)
\end{aligned}
\]

• None is significant, at this level

Slide 22: Analysis of Variance
• So, can we really say that none of the predictor variables are significant?
  – Not yet; predictors may be correlated
• F-tests can be used for this purpose
  – E.g., to determine if the SSR is significantly higher than the SSE
  – Equivalent to testing that y does not depend on any of the predictor variables
• Alternatively, that no b_i is significantly nonzero

Slide 23: Running an F-Test
• Need to calculate SSR and SSE
• From those, calculate mean squares of regression (MSR) and errors (MSE)
• MSR/MSE has an F distribution
• If MSR/MSE > F-table, predictors explain a significant fraction of response variation
• Note typos in book’s table 15.3
  – SSR has k degrees of freedom
  – SST matches \( y - \bar{y} \), not \( y - \hat{y} \)

Slide 24: F-Test for Our Example
• SSR = .33
• SSE = 1.08
• MSR = SSR/k = .33/2 = .16
• MSE = SSE/(n-k-1) = 1.08/(8 - 2 - 1) = .22
• F-computed = MSR/MSE = .76
• F[90; 2,5] = 3.78 (at 90%)
• So it fails the F-test at 90% (miserably)

Slide 25: Multicollinearity
• If two predictor variables are linearly dependent, they are collinear
  – Meaning they are related
  – And thus second variable does not improve the regression
  – In fact, it can make it worse
• Typical symptom is inconsistent results from various
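
The arithmetic in the movie-rating example can be checked with a short script. What follows is a minimal sketch in plain Python, not code from the lecture: it builds X from the eight sample films, forms C = (X^T X)^-1, computes b = C X^T y, and then derives SSE, SST, SSR, R^2, the parameter standard deviations, the 90% confidence intervals, and the F ratio. The helper names (`transpose`, `matmul`, `invert`, `fit`) are illustrative assumptions; the t-value 2.015 and the critical value F[90; 2,5] = 3.78 are the table values quoted in the slides.

```python
# Sketch only: reproduces the slide example with plain Python lists.
# Helper names are hypothetical; the data comes from the "Some Sample Data" slide.

YEAR = [1991, 1983, 1976, 1968, 1955, 1947, 1935, 1934]
LENGTH = [118, 132, 119, 153, 91, 118, 132, 105]
RATING = [8.1, 6.8, 7.0, 7.4, 7.7, 7.5, 7.6, 8.0]

T_90_5DF = 2.015   # t-value used in the slides for 90% intervals, 5 d.o.f.
F_90_2_5 = 3.78    # F[90; 2,5] as quoted in the slides


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(year, length, rating):
    """Fit rating = b0 + b1*year + b2*length and return the slide statistics."""
    n, k = len(rating), 2
    X = [[1.0, yr, ln] for yr, ln in zip(year, length)]
    y = [[r] for r in rating]
    Xt = transpose(X)
    C = invert(matmul(Xt, X))                          # C = (X^T X)^-1
    b = [row[0] for row in matmul(C, matmul(Xt, y))]   # b = C X^T y
    estimated = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    sse = sum((est - r) ** 2 for est, r in zip(estimated, rating))
    mean = sum(rating) / n
    sst = sum((r - mean) ** 2 for r in rating)         # equals SSY - SS0
    ssr = sst - sse
    s_e = (sse / (n - k - 1)) ** 0.5
    s_b = [s_e * C[j][j] ** 0.5 for j in range(k + 1)]
    ci = [(bj - T_90_5DF * sj, bj + T_90_5DF * sj) for bj, sj in zip(b, s_b)]
    f_ratio = (ssr / k) / (sse / (n - k - 1))
    return {"b": b, "sse": sse, "sst": sst, "ssr": ssr, "r2": ssr / sst,
            "s_e": s_e, "s_b": s_b, "ci": ci, "f": f_ratio}


result = fit(YEAR, LENGTH, RATING)

if __name__ == "__main__":
    print("b   =", [round(v, 4) for v in result["b"]])
    print("SSE =", round(result["sse"], 2), " R^2 =", round(result["r2"], 2))
    print("F   =", round(result["f"], 2), " passes F-test:", result["f"] > F_90_2_5)
```

Explicitly inverting X^T X mirrors the slides step by step, but the matrix is poorly conditioned because the year values are large and nearly constant; for real work a least-squares solver, or centering the predictors first, is usually a safer choice.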
