# ILLINOIS CS 446 - 090517.3 (11 pages)


- Pages: 11
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS446 Machine Learning, Fall 2017. Lecture 3: Generalization and Bayes Optimal. Lecturer: Sanmi Koyejo. Scribe: Vinitha Ravichandran. Sep 5th, 2017.

## Generalization

The ability of a learned model to make accurate predictions on novel inputs is known as *generalization*. Let $\mathcal{X}$ denote the instance space and $\mathcal{Y}$ denote the target space. The data-generating model is a distribution $P \in \mathcal{P}$ such that $(x_i, y_i) \overset{\text{iid}}{\sim} P$, where $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$. The training data is a sample $D_n = \{(x_i, y_i)\}_{i=0}^{n}$. Let $h_n$ denote a function that maps the instance space to the target space; $h_n$ depends on the training data.

**Generalization error.** The generalization error $G_n$ is the expected misclassification rate averaged over future data. It is defined as the difference between the risk of the classifier at the distribution level and the risk of the classifier on a particular dataset:

$$G_n = R(h_n; P) - R(h_n; D_n)$$

We say that a classifier generalizes when $G_n \to 0$.

## Evaluation of generalization error

**Approximation.** Since we don't get to see $P$, we evaluate the generalization error using an approximate generalization error $\hat{G}_n$, defined as the difference between the risk of the classifier on the test set and its risk on the training set:

$$\hat{G}_n = R(h_n; D_{Test}) - R(h_n; D_{Train})$$

**Cross-validation.** An alternative strategy for evaluating the generalization of a classifier is cross-validation. In cross-validation we split the dataset into $k$ folds, i.e. $D_n = D_1 \cup D_2 \cup \dots \cup D_k$. We train on all the folds but the $k$-th and test on the $k$-th, in round-robin fashion, then compute the generalization error averaged over all the folds. For a dataset draw $D$, the generalization error can be defined by

$$\hat{G}_n = \mathbb{E}_D\left[R(h_n; D_{Test}) - R(h_n; D_{Train})\right]$$

**Why do we average?** We average to reduce variance. The dataset $D_n$ that you see is random, which implies that the trained classifier $h_n$ is also random, since it depends on the dataset. Therefore the generalization error estimate $\hat{G}_n$ is a random variable: we obtain a different generalization error each time the classifier is run over a different draw of the data.
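The split-sample estimate $\hat{G}_n$ above can be sketched in code. This is not from the lecture; it is a minimal illustration using a hypothetical toy learner (`fit_threshold`, a one-dimensional threshold rule) and synthetic noisy data, with the 0-1 loss as the risk.

```python
import random

def zero_one_risk(h, data):
    """Empirical 0-1 risk R(h; D): fraction of points h misclassifies."""
    return sum(h(x) != y for x, y in data) / len(data)

def fit_threshold(train):
    """Toy learner (hypothetical, for illustration only): pick the
    threshold t minimizing training risk for the rule h(x) = 1[x >= t]."""
    candidates = sorted({x for x, _ in train})
    best_t = min(candidates,
                 key=lambda t: zero_one_risk(lambda x: int(x >= t), train))
    return lambda x: int(x >= best_t)

def generalization_gap(train, test):
    """Approximate generalization error:
    G_hat = R(h_n; D_test) - R(h_n; D_train)."""
    h = fit_threshold(train)
    return zero_one_risk(h, test) - zero_one_risk(h, train)

# Synthetic data: true label is 1[x >= 0.5], flipped with probability 0.1.
rng = random.Random(0)
def draw(n):
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

gap = generalization_gap(draw(200), draw(200))
```

Note that `gap` can be negative: the test risk of a particular trained classifier may, by chance, be lower than its training risk, even though on average the training risk is optimistically biased.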
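The $k$-fold procedure above can likewise be sketched as follows. Again this is an illustrative assumption, not the course's reference code: `majority_learner` is a hypothetical stand-in learner that predicts the most common training label, and folds are formed by simple round-robin slicing.

```python
import random
from collections import Counter

def zero_one_risk(h, data):
    """Empirical 0-1 risk R(h; D)."""
    return sum(h(x) != y for x, y in data) / len(data)

def majority_learner(train):
    """Hypothetical toy learner: always predict the majority training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def kfold_gap(data, k, learner=majority_learner):
    """Average the train/test risk gap over k round-robin folds:
    fold j is held out for testing; the rest form the training set."""
    folds = [data[j::k] for j in range(k)]
    gaps = []
    for j in range(k):
        test = folds[j]
        train = [pt for i, f in enumerate(folds) if i != j for pt in f]
        h = learner(train)
        gaps.append(zero_one_risk(h, test) - zero_one_risk(h, train))
    return sum(gaps) / k

rng = random.Random(1)
data = [(rng.random(), rng.randint(0, 1)) for _ in range(100)]
est = kfold_gap(data, k=5)
```

Averaging over the $k$ folds is exactly the variance-reduction step discussed above: each fold yields one random draw of the gap, and their mean is a lower-variance estimate than any single split.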
