# ILLINOIS CS 446 - 091217.2 (8 pages)


- Pages: 8
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


**CS 446 Machine Learning, Fall 2017**
**Lecture 5: Optimization, Convexity, Gradient Descent**
Lecturer: Sanmi Koyejo. Scribe: Ally Kaminsky. September 12, 2017.

## Recap of Previous Lecture

### Overfitting and Underfitting

**Overfitting** of a model $\hat{h}_n$ occurs when it is too complex. This means that the hypothesis class $\mathcal{H}$ is too big or flexible, i.e., it has too many parameters. Overfitting causes a model to perform well on training data but poorly on testing data.

**Underfitting** of a model means that its hypothesis class $\mathcal{H}$ is too small, i.e., the functional form has too few parameters. This case is more difficult to identify than overfitting, because the performance of a model on its training data is almost always better than its performance on testing data.

### Bias-Variance Tradeoff

The bias of an estimator is
$$\text{Bias}(\hat{h}_n) = \mathbb{E}[\hat{h}_n] - \theta,$$
where $\hat{h}_n$ is the estimator on some observed data and $\theta$ is an unknown fixed constant. The variance of an estimator $\hat{h}_n$ is defined as
$$\text{Variance}(\hat{h}_n) = \mathbb{E}\big[(\mathbb{E}[\hat{h}_n] - \hat{h}_n)^2\big],$$
where $\hat{h}_n$ is an estimator on some observed data. Bias and variance help us understand error tradeoffs in a model, which applies to all supervised learning tasks. The expected risk of a model can be expressed as
$$\mathbb{E}[R(\hat{h}_n, D_n)] = \text{Bias}^2 + \text{Variance}, \tag{1}$$
where $\hat{h}_n$ is an estimator on some observed data $D_n$.

### Selecting and Training Models

There are several ways to select and train models:

- **Engineer an algorithm** that works well. This is not a reasonable systematic approach, but it sometimes works well in practice.
- **Empirical Risk Minimization (ERM).** Find a model $\hat{h}_n$ such that $\hat{h}_n = \operatorname{argmin}_{h \in \mathcal{H}} R(h, D_n)$, where $\mathcal{H}$ is the hypothesis class of the model and $R(h, D_n)$ is the risk of a model $h$ given dataset $D_n$. The resulting model from ERM is very tied to the risk function.
- **Probabilistic model approach.** Find a good approximation $\hat{P} \approx P$, then find a model $\hat{h}_n$ such that $\hat{h}_n = \operatorname{argmin}_{h \in \mathcal{F}} R(h, \hat{P})$, where $\mathcal{F}$ is the class of all possible prediction functions and $R(h, \hat{P})$ is the risk of a model $h$ given $\hat{P}$. Benefits of the
probabilistic model approach include:

- You do not need to start from scratch for each risk function, because $\hat{P}$ has already been found. For this reason, probabilistic models can be reused.
- Probabilistic models can incorporate domain knowledge.

### Fitting a Model: Motivating Example

The goal of the following is to find a model that explains some data well. Suppose $X \in \{0, 1\}$ and that the dataset is $D_n = \{x_1, x_2, \ldots, x_n\}$. Assume $D_n$ is sampled independent and identically distributed (iid) from the Bernoulli distribution $B(\theta)$ for some parameter $\theta$. This means that we will sample $X \sim P(x \mid \theta)$ as 1 with probability $\theta$ and 0 with probability $1 - \theta$. That is,
$$P(X = x \mid \theta) = \theta^x (1 - \theta)^{1 - x}.$$
The conditional probabilities that $x = 0$ and $x = 1$ given $\theta$ are
$$P(x = 0 \mid \theta) = 1 - \theta, \qquad P(x = 1 \mid \theta) = \theta.$$
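To make the Bernoulli setup concrete, here is a minimal Python sketch (the function names are illustrative, not from the lecture) that evaluates the pmf $P(X = x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$ and draws an iid dataset $D_n$:

```python
import random

def bernoulli_pmf(x, theta):
    """P(X = x | theta) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}."""
    return theta ** x * (1 - theta) ** (1 - x)

def sample_dataset(n, theta, seed=0):
    """Draw D_n = {x_1, ..., x_n} iid from Bernoulli(theta)."""
    rng = random.Random(seed)
    return [1 if rng.random() < theta else 0 for _ in range(n)]

# As in the notes: P(x = 1 | theta) = theta, P(x = 0 | theta) = 1 - theta.
print(bernoulli_pmf(1, 0.3))  # 0.3
print(bernoulli_pmf(0, 0.3))  # 1 - 0.3
print(sample_dataset(10, 0.3))  # a list of ten 0/1 draws
```

Each $x_i$ is generated by comparing a uniform draw on $[0, 1)$ against $\theta$, which matches sampling $X$ as 1 with probability $\theta$ and 0 with probability $1 - \theta$.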
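The Bernoulli example also gives a concrete way to see the bias and variance definitions from earlier in the lecture. The sketch below is my illustration, not part of the notes: it takes the sample mean $\hat{\theta}_n = \frac{1}{n}\sum_i x_i$ as the estimator and approximates $\text{Bias}(\hat{\theta}_n)$ and $\text{Variance}(\hat{\theta}_n)$ by simulating many datasets.

```python
import random

def estimate_bias_variance(theta=0.4, n=50, trials=20000, seed=0):
    """Monte Carlo estimate of the bias and variance of the sample mean
    as an estimator of the Bernoulli parameter theta."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        data = [1 if rng.random() < theta else 0 for _ in range(n)]
        estimates.append(sum(data) / n)  # theta_hat for this dataset
    mean_est = sum(estimates) / trials           # approximates E[theta_hat]
    bias = mean_est - theta                      # Bias = E[theta_hat] - theta
    variance = sum((mean_est - e) ** 2 for e in estimates) / trials
    return bias, variance

bias, variance = estimate_bias_variance()
print(bias)      # close to 0: the sample mean is unbiased
print(variance)  # close to theta * (1 - theta) / n = 0.0048
```

The simulated bias hovers near zero and the simulated variance near $\theta(1 - \theta)/n$, illustrating that for this estimator the expected-risk decomposition in equation (1) is dominated by the variance term, which shrinks as $n$ grows.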
