# ILLINOIS CS 446 - 090717.3 (6 pages)


## 090717.3


- Pages: 6
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning

**CS446: Machine Learning, Fall 2017**

**Lecture 1: Overfitting, Naive Bayes, Logistic Regression, MLE**

Lecturer: Sanmi Koyejo. Scribe: Zhenbang Wang. Sept. 7th, 2017

### Review

#### Generalization

Generalization refers to how accurately a model can predict the result on unseen data. For a given data-generating distribution $P$, a good model should satisfy

$$R(h_n, D_{\text{test}}) \approx R(h_n, P).$$

#### Overfitting

Overfitting means that a model $h_n$ performs well on the training data but badly on unseen data. In this case $h_n$ does not generalize. In terms of risk:

$$R(h_n, D_{\text{train}}) \ll R(h_n, D_{\text{test}}).$$

#### Underfitting

Underfitting is the opposite of overfitting: the model does not fit the data well enough. Typically a small hypothesis space $\mathcal{H}$ leads to underfitting, and underfitting is hard to detect. However, similar performance on the training data and the test data can be a clue for underfitting. In other words,

$$R(h_n, D_{\text{train}}) \approx R(h_n, D_{\text{test}}).$$

In rare underfitting cases, models perform better on the test data than on the training data:

$$R(h_n, D_{\text{test}}) < R(h_n, D_{\text{train}}).$$

Generally, underfitting can be fixed by enlarging $\mathcal{H}$.

### Bayes Optimal

The Bayes optimal classifier is the classifier that minimizes the risk:

$$f^{*} = \arg\min_{f \in \mathcal{F}} R(f, P),$$

where $\mathcal{F}$ is the space of all possible classifiers.

### Bias and Variance

Bias and variance are two measurements used to describe errors in learning algorithms.

#### Bias

Bias comes from representation error. The bias of an estimator is the difference between its expected value and the true value. Assume that $\hat{x}$ is an estimator of a quantity of the data distribution $P$. Then

$$\text{Bias}(\hat{x}) = \mathbb{E}[\hat{x}] - \theta,$$

where $\theta$ is the true value. For classifiers, bias is defined as

$$\text{Bias}(h_n) = R(\mathbb{E}[h_n], P) - R(f^{*}, P),$$

or, writing $\bar{h}$ as shorthand for the expected classifier,

$$\text{Bias}(h_n) = R(\bar{h}, P) - R(f^{*}, P),$$

where $f^{*}$ represents the optimal classifier.

#### Variance

Variance captures how much the estimator changes in response to small fluctuations in the training set. Assume that $\hat{x}$ is an estimator of a quantity of the data distribution $P$. Then

$$\text{Var}(\hat{x}) = \mathbb{E}\big[(\hat{x} - \mathbb{E}[\hat{x}])^{2}\big].$$

For classifiers, variance is defined as

$$\text{Var}(h_n) = \mathbb{E}\big[(R(\mathbb{E}[h_n], P) - R(h_n, P))^{2}\big],$$

or

$$\text{Var}(h_n) = \mathbb{E}\big[(R(\bar{h}, P) - R(h_n, P))^{2}\big].$$

### Bias-Variance Tradeoff

The bias-variance tradeoff is a common problem: we want to minimize bias and variance simultaneously, so we look for a good tradeoff point under a given risk function $R$. For example, see Figure 1.

Figure 1: Bias-Variance Tradeoff

#### Special Case

When the risk $R$ is the squared loss, the total error decomposes nicely. Formally,

$$R(h_n, P) = \mathbb{E}\big[(y - h_n(x))^{2}\big],$$

$$\text{Error}(h_n) = \text{noise} + \text{Bias}^{2} + \text{Var},$$

where noise is the irreducible error, that is, the error of the Bayes optimal classifier.

### Picking good classifiers

- Try a random algorithm.
- Empirical risk minimization (ERM): $h_n = \arg\min_{h \in \mathcal{H}} R(h, D_n)$.
- Probabilistic approach: find a good approximation $\hat{P}$ of the data distribution such that $\hat{P} \approx P$ and ...
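To make the train/test risk comparisons from the review concrete, here is a minimal sketch of empirical risk minimization under squared loss with polynomial hypothesis classes of increasing size. Everything specific in it is an assumption made for illustration and is not part of the lecture: the helper names `empirical_risk` and `erm_polynomial`, the sine-shaped target, the noise level, the sample sizes, and the chosen degrees.

```python
import numpy as np

def empirical_risk(coeffs, x, y):
    """Mean squared error of the polynomial `coeffs` on the sample (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

def erm_polynomial(x, y, degree):
    """ERM under squared loss over polynomials of a given degree (least squares)."""
    return np.polyfit(x, y, degree)

# Hypothetical data-generating distribution P: y = sin(3x) + Gaussian noise.
rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = sample(20)    # small training set D_train
x_test, y_test = sample(2000)    # large test set D_test, a proxy for P

risks = {}
for degree in (0, 3, 12):        # small, medium, and large hypothesis class
    h_n = erm_polynomial(x_train, y_train, degree)
    risks[degree] = (
        empirical_risk(h_n, x_train, y_train),
        empirical_risk(h_n, x_test, y_test),
    )
    print(f"degree={degree:2d}  train risk={risks[degree][0]:.4f}  "
          f"test risk={risks[degree][1]:.4f}")
```

Because the polynomial classes are nested, the training risk cannot increase as the degree grows, so a low training risk by itself says little about generalization. Comparing it with the test risk is what exposes the gap described under overfitting, while a very small class (degree 0 here) tends to show the underfitting pattern of similar training and test risk.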
