# ILLINOIS CS 446 - 091917.1 (9 pages)


- Pages: 9
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS 446 Machine Learning, Fall 2017. Lecture: Forward/Backward Selection and Variable Selection. Lecturer: Sanmi Koyejo. Scribe: Dipali Ranjan. Sept 19th, 2017.

## 1 Recap

### 1.1 MAP

The key idea behind Maximum A Posteriori (MAP) estimation is to find the parameter estimate $\theta$ that maximizes the posterior for the given data $D_n = \{(x_i, y_i)\}_{i=1}^n$, i.e., we choose the value that is most probable given the observed data and our prior beliefs. We assume that $\theta$ is a random variable:

$$\theta_{MAP} = \underset{\theta}{\operatorname{argmax}} \; P(\theta \mid D_n)$$

Since we are maximizing (as with Naive Bayes), we do not need the full posterior itself, only its peak. By Bayes' rule,

$$\underset{\theta}{\operatorname{argmax}} \; P(\theta \mid D_n) = \underset{\theta}{\operatorname{argmax}} \; \frac{P(D_n \mid \theta)\, P(\theta)}{P(D_n)} \qquad (1)$$

Since $\theta$ does not appear in the denominator, we can rewrite eqn (1) as

$$\theta_{MAP} = \underset{\theta}{\operatorname{argmax}} \; P(D_n \mid \theta)\, P(\theta) \qquad (2)$$

Taking the log-likelihood of eqn (2):

$$\theta_{MAP} = \underset{\theta}{\operatorname{argmax}} \; \log P(D_n \mid \theta) + \log P(\theta) \qquad (3)$$

### 1.2 Example: Linear Regression / Ridge Regression

One problem with maximum likelihood estimation is that it can result in overfitting. We use ridge regression to ameliorate this problem, performing MAP estimation with a Gaussian prior (Murphy, 2012). Ridge regression uses L2 regularization to minimize the sum of squares of the entries $w_i$.¹

Take a linear regression model such that $y_i \sim \mathcal{N}(w^\top x_i, \sigma^2)$ with prior $w_i \sim \mathcal{N}(0, \tau^2)$. Then

$$w_{MAP} = \underset{w}{\operatorname{argmin}} \; \|y - Xw\|_2^2 + \lambda \|w\|_2^2, \quad \text{where } \lambda = \frac{\sigma^2}{\tau^2} \qquad (4)$$

In the previous lecture we saw how to solve eqn (4) and obtain a closed-form solution.

**Why do we need regularization?** The regularization term $\lambda \|w\|_2^2$ penalizes all the $w_i$'s, i.e., it shrinks them. This helps reduce the effect of any single large feature.

**Bias-variance tradeoff in regression.**

$$\text{Mean squared error} = \text{variance} + \text{bias}^2$$

This is called the bias-variance tradeoff. It shows that it might be wise to use a biased estimator so long as it reduces our variance, assuming our goal is to minimize squared error. Regularization helps by reducing variance while increasing bias, and in practice it improves performance.

## 2 L2 Regularization

Equation (4),

$$w_{MAP} = \underset{w}{\operatorname{argmin}} \; \|y - Xw\|_2^2 + \lambda \|w\|_2^2,$$

can be rewritten as a constrained problem: there exists a $\mu$ such that

$$\min_w \; \|y - Xw\|_2^2 \quad \text{such that} \quad \|w\|_2^2 \le \mu \qquad (5)$$

has the same solution. In Figure 1, "Minimizing cost" refers to the problem without regularization; the solution to the problem in (5) is given by the intersection of the cost contours and the circle of radius $\sqrt{\mu}$, shown as "Minimize cost + penalty" in the figure.

Figure 1: L2 regularization (Raschka, 2017)

## 3 Variable Selection: Introduction

Variable selection means selecting which variables to include in our model, rather than some sort of selection which is itself variable. As such, it is a special case of model selection. People tend to use the phrase "variable selection" when the competing models differ on which variables should be included but agree …

¹ We can also write this as an inductive bias: linear predictions with small weights.
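The closed-form solution to eqn (4) can be sketched in a few lines of NumPy: $w_{MAP} = (X^\top X + \lambda I)^{-1} X^\top y$. This is a minimal illustration with synthetic data; the function name and the test setup are my own, not from the notes:

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """MAP / ridge estimate: w = (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic example: y generated from known weights plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

w_hat = ridge_closed_form(X, y, lam=1.0)
# As lam -> 0 this approaches the ordinary least-squares solution;
# larger lam shrinks the entries of w_hat toward zero.
```

Solving the linear system with `np.linalg.solve` is preferred over explicitly inverting $X^\top X + \lambda I$, for both speed and numerical stability.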
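The decomposition MSE = variance + bias² can also be checked empirically: refit the ridge estimator of eqn (4) on many datasets resampled with fresh noise, then measure how far the average estimate sits from the truth (bias²) and how much individual estimates scatter around that average (variance). A minimal simulation sketch; the design, seed, and λ values are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(30, 2))  # fixed design matrix

def ridge(X, y, lam):
    # MAP / ridge estimate: (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def bias_sq_and_variance(lam, trials=2000):
    # Refit on freshly sampled noise each trial, then decompose the error.
    ests = np.array([ridge(X, X @ w_true + rng.normal(size=30), lam)
                     for _ in range(trials)])
    mean_est = ests.mean(axis=0)
    bias_sq = np.sum((mean_est - w_true) ** 2)
    variance = np.mean(np.sum((ests - mean_est) ** 2, axis=1))
    return bias_sq, variance

b_ols, v_ols = bias_sq_and_variance(lam=0.0)
b_ridge, v_ridge = bias_sq_and_variance(lam=50.0)
# The regularized estimator trades variance for bias:
# b_ridge > b_ols while v_ridge < v_ols.
```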
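The preview cuts off before the forward/backward selection discussion promised in the lecture title, but greedy forward selection is a standard instance of variable selection: starting from the empty set, repeatedly add the variable that most reduces the residual sum of squares. A minimal sketch under that standard formulation; the function name, stopping rule (a fixed budget of k features), and synthetic data are my own, not from the notes:

```python
import numpy as np

def forward_selection(X, y, k):
    """Greedily add the feature whose inclusion most reduces the
    residual sum of squares, until k features are selected."""
    n, d = X.shape
    selected = []
    remaining = set(range(d))
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in remaining:
            cols = selected + [j]
            # Refit least squares on the candidate feature subset.
            w, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ w) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Synthetic data where only columns 2 and 4 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 2] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)
picked = sorted(forward_selection(X, y, k=2))
```

Backward selection is the mirror image: start from all d features and greedily drop the one whose removal increases the residual sum of squares the least.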
