## 090517.2

Previewing page *1*
of
actual document.

**View the full content.**View Full Document

## 090517.2

0 0 37 views

- Pages:
- 4
- School:
- University of Illinois - urbana
- Course:
- Cs 446 - Machine Learning

**Unformatted text preview: **

CS446 Machine Learning Fall 2017 Lecture 3 Bias Variance Tradeoff Overfitting Bayes optimal Probabilistic Models Naive Bayes Lecturer Sanmi Koyejo 1 Scribe Xinchen Pan Sep 5th 2017 Bias Variance Trade off Suppose that we have a data matrix x and a set of values y associated to it We try to use a function f to get a prediction y f x where E 0 and V ar 2 Since we have the noise term the prediction values we get will not be exactly same as the true values We denote the prediction to be f x and define a term mean square error which is E y f x 2 We then decompose it E y f x 2 E y 2 f x 2 2y f x E y 2 E f x 2 2E y f x V ar y E y 2 V ar f x E f x 2 2E y f x V ar y V ar f x E y f x 2 2 Variance bias2 Thus 1 1 Variance E f x 2 E f x 2 Bias E y f x Example For a binary classification problem we let all the predictions to be 1 f x 1 1 2 3 Bias Variance Tradeoff Overfitting Bayes optimal Probabilistic Models Naive Bayes Then we will have a variance of 0 However we will have a large bias because the predictions are just like random guess which cannot be accurate 2 Overfitting and Underfitting 2 1 Overfitting Overfitting problem occurs when we have a low training error but high testing error In another word the learning algorithm does not generalize Denote hn as a learning algorithm Dtrain as the training dataset Dtest as the testing dataset R to be a risk function When overfitting we have R hn Dtrain R hn Dtest The training error is much smaller than the testing error 2 1 1 Exmple Consider a k nearest neighbour example for k 1 We will have a training error of 0 because the nearest neighbour in the training set will always be the point itself However when we have a testing set the nearest neighbour for the point in the testing set will not be itself since the point in the testing set will not be used for calculating distance 2 2 Underfitting Underfitting occurs when when have both low training error and low testing error It means that the learning algorithm cannot represent the data well Using the same notation we have R hn Dtrain R hn Dtest It is very rare that R hn Dtrain R hn Dtest 3 Bias Variance Tradeoff Overfitting Bayes optimal Probabilistic Models Naive Bayes 2 2 1 3 Example A simple example is that we use a constant number to predict the data which has a clear curvature Figure 1 Underfitting Example Clearly we will expect the poor performance 3 Bayes Optimal Given the true model p we are trying to use a function

View Full Document