# ILLINOIS CS 446 - 091917.2 (8 pages)



- Pages: 8
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS446 Machine Learning, Fall 2017 — Lecture 8: Variable Selection
Lecturer: Sanmi Koyejo · Scribe: Sarah Christensen · Sept 19th, 2017

### Ridge Regression

#### Recap of MAP Estimation

Maximum a posteriori (MAP) estimation takes in a set of observations $D_n = \{(x_i, y_i)\}_{i=1}^n$ and seeks a parameter $\theta_{MAP}$ that maximizes the posterior distribution. An important assumption to recognize is that the model parameters here are treated as random variables drawn from a distribution:

$$\theta_{MAP} = \arg\max_\theta P(\theta \mid D_n)$$

Next, we can rewrite the above equation using Bayes' theorem:

$$\theta_{MAP} = \arg\max_\theta \frac{P(D_n \mid \theta)\, P(\theta)}{P(D_n)}$$

Since the denominator does not depend on $\theta$, we can ignore this term and take the log to get the log-likelihood form:

$$\theta_{MAP} = \arg\max_\theta \left[\log P(D_n \mid \theta) + \log P(\theta)\right]$$

Notice that this is similar to the maximum likelihood estimate for $\theta$, but has an additional term that incorporates a prior distribution over $\theta$.

#### Ridge Regression

Now we introduce a regularized least-squares regression method called ridge regression, where MAP estimation is used with a Gaussian prior to estimate the weight vector. More specifically, it is a linear regression where $y_i \sim \mathcal{N}(w^T x_i, \sigma^2)$ and $w \sim \mathcal{N}(0, \tau^2 I)$. We have shown previously that

$$w_{MAP} = \arg\min_w \|y - Xw\|_2^2 + \lambda \|w\|_2^2, \qquad \text{where } \lambda = \frac{\sigma^2}{\tau^2} \tag{1}$$

We have also previously shown that a closed-form solution to this minimization problem exists. To aid with visualization, Equation 1 can be rewritten with Lagrange multipliers:

$$w_{MAP} = \arg\min_w \|y - Xw\|_2^2 \quad \text{subject to } \|w\|_2^2 \le \mu, \text{ for some } \mu \ge 0$$

*Figure 1: A graphical interpretation of ridge regression in two dimensions. The MAP estimator $w_{MAP}$ can be found at the intersection of the contour plot and the $\ell_2$ ball. This figure was adapted from Singh and Poczos (2014).*

Notice that Equation 1 looks similar to that of ordinary least squares (OLS), but there is an extra term that shifts the correlation matrix. OLS can suffer from a problem of overfitting, and small changes in the observed data can sometimes lead to big changes in the estimated parameters. Ridge regression is an instance of shrinkage, or regularization, which tries to address this.
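The closed-form solution mentioned above, $w_{MAP} = (X^T X + \lambda I)^{-1} X^T y$, is short enough to sketch directly. The following is a minimal NumPy illustration (the synthetic data and the function name `ridge_map` are our own, not from the lecture); setting $\lambda = 0$ recovers the OLS estimate, and increasing $\lambda$ shrinks the weights toward zero:

```python
import numpy as np

def ridge_map(X, y, lam):
    """MAP / ridge estimate: argmin_w ||y - Xw||_2^2 + lam * ||w||_2^2.

    Closed form: w = (X^T X + lam * I)^{-1} X^T y.
    Solved with np.linalg.solve rather than an explicit inverse.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Tiny synthetic example (hypothetical data, chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_ols = ridge_map(X, y, lam=0.0)     # lam = 0 recovers ordinary least squares
w_ridge = ridge_map(X, y, lam=10.0)  # larger lam shrinks w toward 0
```

Note that the regularized system $X^T X + \lambda I$ is always invertible for $\lambda > 0$, which is one practical reason ridge regression is better behaved than OLS when columns of $X$ are nearly collinear.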
