CMU CS 10701 - Matrix MLE for Linear Regression


Joseph E. Gonzalez

Some people have had some trouble with the linear algebra form of the MLE for multiple regression. I tried to find a nice online derivation but could not find anything helpful, so I have decided to derive the matrix form of the MLE weights for linear regression under the assumption of Gaussian noise.

The Model

Let's say we are given some set of data $X$ and $y$. The matrix $X$ has $n$ rows, one for each example, and $d$ columns, one for each of the $d$ features. The column vector $y$ has $n$ rows, one for each example, and a single column. We want to "learn" the relationship between an individual feature vector $x$ and an outcome $y$. In some sense we want to learn the function $f : \mathbb{R}^d \to \mathbb{R}$ which satisfies:

(1) $y = f(x)$

Linear Models

There are many functions $f$ we could choose from (I am sure you have some favorites). To simplify our computation and to impose some assumptions (which often aids generalization), we restrict $f$ to the class of linear functions. That is, for a choice of weights $w$ we can express $f$ as:

(2) $f_w(x) = \sum_{j=1}^{d} w_j x_j$

Nonlinear Features

Often people find this assumption too restrictive. We can permit a more complex class of functions by creating new (nonlinear) features from the original features $x_j$. For example:

(3) $f_w(x) = \sum_{j=1}^{d} w_j x_j + \sum_{j=d+1}^{2d} w_j \sin\!\left(x_{j-d}^2\right)$

To formalize this notion we can rewrite equation 3 as:

(4) $f_w(x) = \sum_{j=1}^{m} w_j \phi_j(x)$

Returning to the example in equation 3, we can use the notation of equation 4 by defining:

$\phi_j(x) = \begin{cases} x_j & \text{if } 1 \le j \le d \\ \sin\!\left(x_{j-d}^2\right) & \text{if } d+1 \le j \le 2d \\ 0 & \text{otherwise} \end{cases}$

This technique allows us to lift our simple linear function $f_w$ into a more complex space, permitting a richer class of functions in our original space $\mathbb{R}^d$. With this transformation we can define a matrix $\Phi$ which is like $X$ but consists of the transformed features. If we do not want to transform our features, then we simply define:

(5) $\phi_j(x) = \begin{cases} x_j & \text{if } 1 \le j \le d \\ 0 & \text{otherwise} \end{cases}$

The matrix $\Phi$ is constructed by:

(6) $\Phi = \begin{pmatrix} \phi_1(X_{11}, \dots, X_{1d}) & \cdots & \phi_m(X_{11}, \dots, X_{1d}) \\ \vdots & \ddots & \vdots \\ \phi_1(X_{n1}, \dots, X_{nd}) & \cdots & \phi_m(X_{n1}, \dots, X_{nd}) \end{pmatrix}$

If we use the trivial transform in equation 5, equation 6 becomes:

(7) $\Phi = \begin{pmatrix} X_{11} & \cdots & X_{1d} \\ \vdots & \ddots & \vdots \\ X_{n1} & \cdots & X_{nd} \end{pmatrix} = X$

For the rest of these notes I will use the trivial feature space $X$. However, feel free to substitute $\Phi$ wherever $X$ is used if a nonlinear feature space is desired.

Noise

Sadly, we live in the real world, where there is random noise $\epsilon$ that gets mixed into our observations. So a more natural model would be of the form:

(8) $y = f_w(x) + \epsilon$

We have to pick what type of noise we expect to observe. A common choice is zero-mean independent Gaussian noise of the form:

$\epsilon \sim \mathcal{N}(0, \sigma^2)$

Which f_w?

Having selected the feature transformation $\phi$ and having decided to use a linear model, we have reduced our hypothesis space (the space of functions we are willing to consider for $f$) from all functions (and then some) to linear functions in the feature space determined by $\phi$. The functions in this space are indexed by the weight vector $w$. How do we pick $f$ from this reduced hypothesis space? We simply choose the "best" $w$. The remainder of these notes describes how to choose the $w$ that maximizes the likelihood of our data $X$ and $y$.

Matrix Notation

Let's begin with some linear algebra. We can apply our model to all the data at once:

(9) $y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} f_w(X_{11}, \dots, X_{1d}) + \epsilon_1 \\ \vdots \\ f_w(X_{n1}, \dots, X_{nd}) + \epsilon_n \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{d} w_j X_{1j} + \epsilon_1 \\ \vdots \\ \sum_{j=1}^{d} w_j X_{nj} + \epsilon_n \end{pmatrix} = Xw + \epsilon$

where $w$ is a $d \times 1$ column vector of weights and $\epsilon$ is an $n \times 1$ column vector of iid $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ Gaussian noise. Notice how we can compactly compute all the $y$ at once by simply multiplying $Xw$.
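As a concrete illustration of equations (6) and (9), here is a minimal NumPy sketch, not part of the original notes, that builds the feature matrix $\Phi$ using the sine features of equation 3 and simulates noisy observations $y = Xw + \epsilon$. The function name make_features and all parameter values are hypothetical choices for this example.

```python
import numpy as np

def make_features(X, nonlinear=True):
    """Feature matrix Phi of equation (6).

    With nonlinear=True, uses the example map of equation (3):
    phi_j(x) = x_j for 1 <= j <= d and sin(x_{j-d}^2) for d+1 <= j <= 2d,
    so Phi is n x 2d. With nonlinear=False, this is the trivial
    transform of equation (5), giving Phi = X.
    """
    if not nonlinear:
        return X
    return np.hstack([X, np.sin(X ** 2)])

# Simulate the model of equation (9), y = Xw + eps (all values hypothetical):
rng = np.random.default_rng(0)
n, d, sigma = 100, 3, 0.5
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
eps = rng.normal(scale=sigma, size=n)  # iid N(0, sigma^2) noise
y = X @ w + eps
Phi = make_features(X)                 # substitute Phi for X if desired
```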
If we solve for the noise in equation 9, we obtain:

(10) $y - Xw = \epsilon \sim \mathcal{N}(0, \sigma^2 I)$

We see that the residual of our regression model follows a multivariate Gaussian with covariance $\sigma^2 I$, where $I$ is the identity matrix. The density of the multivariate Gaussian takes the form:

(11) $p(V) = \frac{1}{(2\pi)^{N/2} \, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2} (V - \mu)^\top \Sigma^{-1} (V - \mu)\right)$

where $V \sim \mathcal{N}(\mu, \Sigma)$ and $V \in \mathbb{R}^{N \times 1}$ is a column vector of size $N$.

Likelihood

Using equations 10 and 11 we can express the likelihood of our data given our weights $w$ as:

$P(X, y \mid w) \propto L(w) \propto \exp\!\left(-\frac{1}{2} (y - Xw)^\top \left(\sigma^2 I\right)^{-1} (y - Xw)\right)$

We now want to maximize the likelihood of our data given the weights. First we take the log to make things easier; since the log is monotone and additive and multiplicative constants (including the factor $1/\sigma^2$) do not move the maximizer, maximizing the likelihood is equivalent to minimizing the quadratic:

$l(w) \propto (y - Xw)^\top I \, (y - Xw)$

written out with its shapes,

$l(w) \propto \underbrace{(y - Xw)^\top}_{\text{row vector}} \; \underbrace{I}_{\text{identity matrix}} \; \underbrace{(y - Xw)}_{\text{column vector}}$

You should be able to convince yourself that this is equivalent to:

$l(w) \propto (y - Xw)^\top (y - Xw)$

Now let's take the gradient (a row vector) with respect to $w$:

$\frac{\partial}{\partial w} l(w) \propto \frac{\partial}{\partial w} \left[ (y - Xw)^\top (y - Xw) \right]$

To compute this we use the gradient of a quadratic matrix expression (the identity is spelled out right after this derivation). For more details see:
http://en.wikipedia.org/wiki/Matrix_calculus
http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html#deriv_quad

Applying it (each minus sign comes from differentiating $-Xw$):

$\frac{\partial}{\partial w} l(w) \propto -(y - Xw)^\top X - (y - Xw)^\top X$

Simplifying a little:

$\frac{\partial}{\partial w} l(w) \propto -2 (y - Xw)^\top X$

Removing the extraneous constant:

$\frac{\partial}{\partial w} l(w) \propto -(y - Xw)^\top X$

Applying the transpose:

$\frac{\partial}{\partial w} l(w) \propto -\left(y^\top - w^\top X^\top\right) X$

Multiplying through by $X$:

$\frac{\partial}{\partial w} l(w) \propto -y^\top X + w^\top X^\top X$

Finally we set the derivative equal to zero and solve for $w$:

$-y^\top X + w^\top X^\top X = 0$
$w^\top X^\top X = y^\top X$
$w^\top = y^\top X \left(X^\top X\right)^{-1}$

Transposing both sides (and using the symmetry of $X^\top X$, and hence of its inverse), we have:

$w = \left(X^\top X\right)^{-1} X^\top y$

Thus you have the matrix form of the MLE.
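For reference, the quadratic-gradient identity used above can be spelled out as follows. This is a standard matrix-calculus fact, added here for completeness and written in the same row-vector gradient convention as the derivation. Expanding the quadratic:

$(y - Xw)^\top (y - Xw) = y^\top y - 2\, w^\top X^\top y + w^\top X^\top X w$

Using the row-vector gradient rules $\frac{\partial}{\partial w}\left(w^\top a\right) = a^\top$ and, for symmetric $A$, $\frac{\partial}{\partial w}\left(w^\top A w\right) = 2\, w^\top A$ (here $A = X^\top X$ is symmetric):

$\frac{\partial}{\partial w} (y - Xw)^\top (y - Xw) = -2\, y^\top X + 2\, w^\top X^\top X = -2\, (y - Xw)^\top X$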

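As a concrete sanity check of the final formula, here is a minimal NumPy sketch, not from the original notes, that computes $w = (X^\top X)^{-1} X^\top y$ on simulated data; all names and parameter values are illustrative. It solves the normal equations $X^\top X w = X^\top y$ with np.linalg.solve rather than forming the inverse explicitly, which is cheaper and more numerically stable.

```python
import numpy as np

def mle_weights(X, y):
    """Maximum-likelihood weights w = (X^T X)^{-1} X^T y.

    Solves the normal equations X^T X w = X^T y directly instead of
    forming the inverse of X^T X, which is cheaper and more stable.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Illustrative check on simulated data (all values hypothetical):
rng = np.random.default_rng(1)
n, d, sigma = 200, 3, 0.5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=sigma, size=n)  # y = Xw + eps, eq. (9)

w_hat = mle_weights(X, y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution
assert np.allclose(w_hat, w_lstsq)                # MLE == least squares here
print(w_true, w_hat)                              # w_hat -> w_true as n grows
```

Since the noise is Gaussian, the MLE coincides with ordinary least squares, which is why np.linalg.lstsq recovers the same weights.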
