# ILLINOIS CS 446 - 090717.2 (6 pages)

- Pages: 6
- School: University of Illinois Urbana-Champaign
- Course: CS 446 - Machine Learning

**CS 446 Machine Learning, Fall 2017. Lecture 4: Overfitting, Naive Bayes, Maximum Likelihood.**
Lecturer: Sanmi Koyejo. Scribe: Liqun Lu. September 7th, 2017.

## Review of generalization and Bayes optimal

### Generalization

The goal of generalization is to find a function (algorithm) that has good prediction accuracy (performance) on new data, i.e. the risk $R(h_n, D_{\text{test}})$, where $D_{\text{test}} \neq D_{\text{train}}$, satisfies

$$R(h_n, D_{\text{test}}) \approx R(h_n, P).$$

### Bayes optimal

The Bayes optimal classifier is defined as

$$f^* = \operatorname*{argmin}_{f \in \mathcal{F}} R(f, P).$$

The accuracy (error) is determined by two parts (see Figure 1): (a) the representation error, and (b) the statistical error plus the optimization error.

*Figure 1: Representation error and statistical error.*

## Overfitting / Underfitting

### Overfitting

Overfitting means a function $h_n$ has good training performance but bad test performance, i.e. $h_n$ does not generalize:

$$R(h_n, D_{\text{train}}) \ll R(h_n, D_{\text{test}}).$$

Generally, overfitting implies that the hypothesis class $\mathcal{H}$ is too big. This means the functional form one can choose is too flexible (e.g. it has excessive parameters), so that it fits the training data very well but predicts poorly on the test data. An example is the 1-NN classifier: in many cases it has perfect training performance but can have bad test performance. To avoid overfitting, one generally makes the hypothesis class smaller.

### Underfitting

Underfitting is the opposite of overfitting and is usually hard to detect. It implies that the size of $\mathcal{H}$ is too small. In contrast with how we detect overfitting, it is very rare that the test performance is better than the training performance, i.e. that $R(h_n, D_{\text{train}}) > R(h_n, D_{\text{test}})$. However, one way to detect potential underfitting is that

$$R(h_n, D_{\text{train}}) \approx R(h_n, D_{\text{test}})$$

may imply underfitting. An example is $h_n \equiv 1$ (a constant classifier), whose training performance is nearly the same as its test performance.

## Bias / Variance

Bias and variance have the same meaning for classifiers as for ordinary estimators. Suppose $\hat{x}$ is an estimator of $x \sim P$. The bias is defined as

$$\mathrm{Bias}(\hat{x}) = \mathbb{E}[\hat{x}] - \mu,$$

where $\mu$ is the true value, which is generally unknown. The variance of $\hat{x}$ is

$$\mathrm{Var}(\hat{x}) = \frac{1}{n} \sum_i \big(\hat{x}_i - \mathbb{E}[\hat{x}]\big)^2 \approx \mathbb{E}\big[(\hat{x} - \mathbb{E}[\hat{x}])^2\big].$$

For classifiers (predictors) $h$, the bias is generally written as

$$\mathrm{Bias}(h) = R(\mathbb{E}[h_n], P) - R(f^*, P),$$

where $f^*$ is the Bayes optimal classifier. Alternatively, in some texts it is written as

$$\mathrm{Bias}(h) = R(h, P) - R(f^*, P).$$

In either definition, the bias captures the influence of the choice of $\mathcal{H}$, i.e. the representation error. The two definitions tend to be close as the data set size becomes large.

The variance of the estimator $h_n$ is defined as

$$\mathrm{Var}(h_n) = \mathbb{E}\Big[\big(R(\mathbb{E}[h_n], P) - R(h_n, P)\big)^2\Big] = \mathbb{E}\Big[\big(R(\bar{h}, P) - R(h_n, P)\big)^2\Big],$$

where $\bar{h} = \mathbb{E}[h_n]$. It gives the expectation of how much the classifier deviates from its expected value over different draws of the training data.

**Question:** Is $R(f^*, P)$ …
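Returning to the overfitting/underfitting discussion above, the train-vs-test comparison can be illustrated with a small sketch (not part of the lecture notes). It uses the notes' own examples, a 1-NN classifier and a constant classifier, on an assumed synthetic scikit-learn dataset; the dataset parameters and library calls are illustrative choices, not from the source.

```python
# Illustrative sketch (not from the lecture notes): comparing train vs. test
# accuracy to detect overfitting (1-NN) and underfitting (constant classifier).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.dummy import DummyClassifier

# Assumed setup: a noisy synthetic binary classification problem.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

# 1-NN: very flexible hypothesis class, effectively memorizes the training set.
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("1-NN   train acc: %.2f  test acc: %.2f"
      % (knn.score(X_train, y_train), knn.score(X_test, y_test)))
# Typically: perfect (1.00) training accuracy, clearly lower test accuracy -> overfitting.

# Constant classifier: tiny hypothesis class, cannot fit the data at all.
const = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("const  train acc: %.2f  test acc: %.2f"
      % (const.score(X_train, y_train), const.score(X_test, y_test)))
# Train and test accuracy are nearly identical (and poor) -> underfitting.
```

The gap between training and test risk signals overfitting, while nearly equal (and poor) training and test risk is the underfitting pattern described above.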

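The estimator-level definitions of bias and variance can also be checked numerically. The sketch below is my own illustration (not from the notes), assuming a Gaussian data model and the sample mean as the estimator $\hat{x}$; it approximates $\mathrm{Bias}(\hat{x}) = \mathbb{E}[\hat{x}] - \mu$ and $\mathrm{Var}(\hat{x}) = \mathbb{E}[(\hat{x} - \mathbb{E}[\hat{x}])^2]$ by averaging over many independent draws of the data.

```python
# Illustrative sketch (not from the notes): empirical bias and variance of the
# sample-mean estimator under an assumed Gaussian model x ~ N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.0   # true value mu (usually unknown in practice)
n = 25                 # samples per draw of the training set
repeats = 10_000       # number of independent draws

# Each row is one draw of the data; each entry of x_hat is one realization
# of the estimator (the sample mean of that draw).
x_hat = rng.normal(mu, sigma, size=(repeats, n)).mean(axis=1)

bias = x_hat.mean() - mu                          # Bias(x_hat) = E[x_hat] - mu
variance = ((x_hat - x_hat.mean()) ** 2).mean()   # Var(x_hat) = E[(x_hat - E[x_hat])^2]

print("empirical bias     : %+.4f (should be close to 0)" % bias)
print("empirical variance : %.4f (theory: sigma^2/n = %.4f)"
      % (variance, sigma**2 / n))
```

The sample mean is unbiased, so the empirical bias is near zero, while the variance term captures how much the estimator fluctuates across draws of the data, mirroring the classifier-level definition of $\mathrm{Var}(h_n)$ above.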