## 090517.2



- Pages: 4
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS446 Machine Learning, Fall 2017. Lecture 3: Bias-Variance Tradeoff, Overfitting, Bayes Optimal, Probabilistic Models, Naive Bayes. Lecturer: Sanmi Koyejo. Scribe: Xinchen Pan. Sep 5th, 2017.

## 1 Bias-Variance Trade-off

Suppose we have a data matrix $x$ and a set of values $y$ associated with it, modeled as $y = f(x) + \epsilon$, where $E[\epsilon] = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$. Because of the noise term, the predictions we get will not be exactly the same as the true values. Denote the prediction by $\hat{f}(x)$ and define the mean squared error to be $E[(y - \hat{f}(x))^2]$. We decompose it (using $E[y\hat{f}(x)] = E[y]\,E[\hat{f}(x)]$, since the noise in $y$ is independent of $\hat{f}(x)$):

$$
\begin{aligned}
E[(y - \hat{f}(x))^2] &= E[y^2 + \hat{f}(x)^2 - 2y\hat{f}(x)] \\
&= E[y^2] + E[\hat{f}(x)^2] - 2E[y]\,E[\hat{f}(x)] \\
&= \mathrm{Var}(y) + E[y]^2 + \mathrm{Var}(\hat{f}(x)) + E[\hat{f}(x)]^2 - 2E[y]\,E[\hat{f}(x)] \\
&= \mathrm{Var}(y) + \mathrm{Var}(\hat{f}(x)) + \big(E[y] - E[\hat{f}(x)]\big)^2 \\
&= \sigma^2 + \text{Variance} + \text{Bias}^2.
\end{aligned}
$$

Thus

$$
\text{Variance} = E[\hat{f}(x)^2] - E[\hat{f}(x)]^2, \qquad \text{Bias} = E[y] - E[\hat{f}(x)]. \tag{1}
$$

**Example.** For a binary classification problem, let all predictions be $1$, i.e. $\hat{f}(x) = 1$. Then we will have a variance of $0$, but a large bias, because such predictions are no better than a random guess and cannot be accurate.

## 2 Overfitting and Underfitting

### 2.1 Overfitting

Overfitting occurs when we have low training error but high testing error; in other words, the learning algorithm does not generalize. Denote by $h_n$ the learned hypothesis, $D_{\text{train}}$ the training dataset, $D_{\text{test}}$ the testing dataset, and $R$ a risk function. When overfitting, we have

$$R(h_n, D_{\text{train}}) \ll R(h_n, D_{\text{test}}),$$

i.e. the training error is much smaller than the testing error.

#### 2.1.1 Example

Consider $k$-nearest-neighbour with $k = 1$. We will have a training error of $0$, because the nearest neighbour of any training point is the point itself. On the testing set, however, the nearest neighbour of a test point will not be itself, since test points are not among the points used for computing distances.

### 2.2 Underfitting

Underfitting occurs when we have both high training error and high testing error. It means that the learning algorithm cannot represent the
data well. Using the same notation, we have

$$R(h_n, D_{\text{train}}) \approx R(h_n, D_{\text{test}}),$$

with both errors large. It is very rare that $R(h_n, D_{\text{train}}) > R(h_n, D_{\text{test}})$.

#### 2.2.1 Example

A simple example: using a constant number to predict data that has a clear curvature.

(Figure 1: Underfitting example.)

Clearly, we expect poor performance.

## 3 Bayes Optimal

Given the true model $p$, we seek a function $f$ that minimizes the risk between the predicted values and the true values. We call this function the Bayes optimal predictor:

$$f^* = \arg\min_{f \in \mathcal{F}} R(f; p).$$

## 4 Probabilistic Models: Naive Bayes

Naive Bayes is a classifier that can be used for both binary and multiclass classification. The Naive Bayes algorithm assumes that the features are conditionally independent of each other given the class.

### 4.1 Bayes Formula

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$

- $p(y \mid x)$ is the posterior probability, the quantity we want to compute.
- $p(x \mid y)$ is the likelihood, which we estimate from the training data.
- $p(y)$ is the prior probability of each class $y$.
- $p(x)$ is the marginal probability (evidence) of $x$.

Because of the independence assumption, we can calculate $p(y \mid x)$ via

$$p(y \mid x) \propto p(x_1 \mid y)\, p(x_2 \mid y) \cdots p(x_n \mid y)\, p(y).$$

Furthermore, we assume that the values of $x$ associated with each class follow some distribution; a typical choice is the normal distribution:

$$p(x \mid y = c) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x - \mu_y)^2}{2\sigma_y^2}\right).$$

We then compare the resulting probabilities across all the classes and return the class with the largest probability.

### 4.2 Pros and Cons

**Pros:** It is fast and can usually give a good prediction.

**Cons:** The independence assumption does not usually hold in real life. We also need a method (such as Laplace smoothing) to deal with the zero-probability situation, i.e. when a feature value appears in the testing data but never appears in the training data.
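The Gaussian Naive Bayes procedure of Section 4 can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation; the class name, the toy dataset, and the small variance floor are my own choices:

```python
import math
from collections import defaultdict

class GaussianNB:
    """Minimal Gaussian Naive Bayes: one Gaussian per (class, feature)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        n = len(X)
        self.params = {}
        for c, rows in groups.items():
            prior = len(rows) / n
            stats = []
            for feature in zip(*rows):  # iterate over feature columns
                mu = sum(feature) / len(feature)
                var = sum((v - mu) ** 2 for v in feature) / len(feature)
                stats.append((mu, max(var, 1e-9)))  # floor avoids zero variance
            self.params[c] = (prior, stats)
        return self

    def predict(self, x):
        def log_gauss(v, mu, var):
            return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        best, best_score = None, -math.inf
        for c, (prior, stats) in self.params.items():
            # log p(y) + sum_j log p(x_j | y), by the independence assumption
            score = math.log(prior) + sum(
                log_gauss(v, mu, var) for v, (mu, var) in zip(x, stats))
            if score > best_score:
                best, best_score = c, score
        return best

# Two well-separated 2-D classes
X = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (5.0, 5.1), (4.8, 5.2), (5.2, 4.9)]
y = [0, 0, 0, 1, 1, 1]
model = GaussianNB().fit(X, y)
print(model.predict((0.1, 0.0)))  # point near the origin -> class 0
print(model.predict((5.0, 5.0)))  # point near (5, 5)    -> class 1
```

Scoring in log space avoids numerical underflow when multiplying many small per-feature likelihoods, which is the standard trick for Naive Bayes in practice.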
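The 1-nearest-neighbour overfitting argument from Section 2.1.1 can also be checked numerically. A minimal sketch, assuming a toy dataset with one deliberately mislabeled training point (all names and data here are invented for illustration):

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """1-NN: return the label of the closest training point."""
    dists = [math.dist(x, xi) for xi in train_X]
    return train_y[dists.index(min(dists))]

# Label is 1 iff the coordinate is positive, except one mislabeled noise point
train_X = [(-2.0,), (-1.0,), (-0.5,), (0.5,), (1.0,), (0.6,)]
train_y = [0, 0, 0, 1, 1, 0]          # (0.6,) is mislabeled noise
test_X  = [(-1.4,), (0.7,), (0.52,)]
test_y  = [0, 1, 1]

train_err = sum(nearest_neighbor_predict(train_X, train_y, x) != y
                for x, y in zip(train_X, train_y)) / len(train_X)
test_err  = sum(nearest_neighbor_predict(train_X, train_y, x) != y
                for x, y in zip(test_X, test_y)) / len(test_X)
print(train_err)  # 0.0: each training point is its own nearest neighbour
print(test_err)   # nonzero: the mislabeled point misclassifies nearby test points
```

As the notes predict, the training error is exactly zero regardless of the noise, while the test error is not, so the gap $R(h_n, D_{\text{train}}) \ll R(h_n, D_{\text{test}})$ appears even on this tiny dataset.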
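Finally, the decomposition in Section 1 can be verified by Monte Carlo simulation. In this sketch, the true value $f(x) = 2.0$, the noise level, the shrinkage factor $0.8$ (which makes the estimator deliberately biased), and the trial counts are all my own choices, not from the lecture:

```python
import random

random.seed(0)

f_x = 2.0        # true value f(x) at a fixed input x
sigma = 0.5      # noise std dev: y = f(x) + eps, eps ~ N(0, sigma^2)
n_train = 5      # observations used to build each estimate
trials = 200_000

preds, sq_errs = [], []
for _ in range(trials):
    # One estimate f_hat(x): a shrunk sample mean (deliberately biased)
    sample = [f_x + random.gauss(0, sigma) for _ in range(n_train)]
    f_hat = 0.8 * (sum(sample) / n_train)
    # A fresh noisy observation y to score against
    y = f_x + random.gauss(0, sigma)
    preds.append(f_hat)
    sq_errs.append((y - f_hat) ** 2)

mse = sum(sq_errs) / trials
mean_pred = sum(preds) / trials
variance = sum(p * p for p in preds) / trials - mean_pred ** 2
bias = f_x - mean_pred   # E[y] = f(x), so Bias = E[y] - E[f_hat(x)]
print(mse)
print(sigma ** 2 + variance + bias ** 2)  # should be close to mse
```

Up to Monte Carlo error, the measured mean squared error matches $\sigma^2 + \text{Variance} + \text{Bias}^2$, with the bias here close to $(1 - 0.8) \cdot f(x) = 0.4$ by construction.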
