ILLINOIS CS 446 - 091217.2 (8 pages)

Pages: 8
School: University of Illinois at Urbana-Champaign
Course: CS 446 - Machine Learning

CS 446: Machine Learning, Fall 2017
Lecture 5: Optimization, Convexity, Gradient Descent
Lecturer: Sanmi Koyejo
Scribe: Ally Kaminsky
September 12, 2017

Recap of previous lecture

Overfitting and Underfitting

Overfitting: overfitting of a model h_n occurs when it is too complex. This means that the hypothesis class H is too big or too flexible, i.e., it has too many parameters. Overfitting causes a model to perform well on training data but poorly on testing data.

Underfitting: underfitting of a model means that its hypothesis class H is too small, i.e., the functional form has too few parameters. This case is more difficult to identify than overfitting, because a model's performance on its training data is almost always better than its performance on testing data.

Bias-Variance Tradeoff

The bias of an estimator is Bias(h_n) = E[h_n] - θ, where h_n is the estimator on some observed data and θ is an unknown fixed constant. The variance of an estimator h_n is defined as Variance(h_n) = E[(E[h_n] - h_n)^2]. Bias and variance help us understand error tradeoffs in a model, which applies to all supervised learning tasks. The expected risk of a model can be expressed as

    E[R(h_n, D_n)] = Bias^2 + Variance,    (1)

where h_n is an estimator on some observed data D_n.

Selecting and Training Models

There are several ways to select and train models:

- Engineer an algorithm that works well. This is not a reasonable systematic approach, but it sometimes works well in practice.

- Empirical Risk Minimization (ERM): find a model h_n such that h_n = argmin_{h ∈ H} R(h, D_n), where H is the hypothesis class of the model and R(h, D_n) is the risk of a model h given dataset D_n. The resulting model from ERM is very tied to the risk function.

- Probabilistic model approach: find a good approximation P̂ ≈ P, then find a model h_n such that h_n = argmin_{h ∈ F} R(h, P̂), where F is the class of all possible prediction functions and R(h, P̂) is the risk of a model h given P̂. Benefits of the probabilistic model approach: you do not need to start from scratch for each risk function, because P̂ has already been found, so probabilistic models can be reused; they can also incorporate domain knowledge.

Fitting a Model: Motivating Example

The goal of the following is to find a model that explains some data well. Suppose X ∈ {0, 1} and the dataset is D_n = {x_1, x_2, ..., x_n}. Assume D_n is sampled independent and identically distributed (iid) from the Bernoulli distribution B(θ) for some parameter θ: we sample X ~ P(x) as 1 with probability θ and 0 with probability 1 - θ. That is,

    P(X = x) = θ^x (1 - θ)^(1 - x).

The conditional probabilities that x = 0 and x = 1 given θ are

    P(x = 0 | θ) = 1 - θ,
    P(x = 1 | θ) = θ.
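As an illustrative aside (not from the lecture notes): for this Bernoulli model, the maximum-likelihood estimate of θ from an iid sample turns out to be the sample mean, i.e., the fraction of ones. A minimal Python sketch, with hypothetical helper names:

```python
import random

def bernoulli_sample(theta, n, rng):
    # Draw n iid samples x in {0, 1} with P(x = 1) = theta,
    # matching P(X = x) = theta^x * (1 - theta)^(1 - x).
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def mle_theta(data):
    # The Bernoulli likelihood theta^(#ones) * (1 - theta)^(#zeros)
    # is maximized at the sample mean: the fraction of ones.
    return sum(data) / len(data)

rng = random.Random(0)          # fixed seed for reproducibility
data = bernoulli_sample(0.3, 10_000, rng)
theta_hat = mle_theta(data)     # close to the true theta = 0.3
```

With 10,000 samples the estimate concentrates near the true parameter; the deviation shrinks at rate O(1/sqrt(n)).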
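The decomposition of expected risk into squared bias plus variance described in the notes can be checked numerically under squared loss. The sketch below is not from the lecture: it uses a deliberately biased shrinkage estimator sum(x_i) / (n + 1) of a Bernoulli parameter, so that both terms are nonzero, and compares the empirical mean squared error against squared bias plus variance.

```python
import random

def simulate(theta=0.5, n=20, trials=20_000, seed=1):
    # Repeatedly draw datasets of size n and compute the (biased)
    # shrinkage estimator h_n = sum(x_i) / (n + 1) on each one.
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        xs = [1 if rng.random() < theta else 0 for _ in range(n)]
        estimates.append(sum(xs) / (n + 1))
    mean_est = sum(estimates) / trials
    bias = mean_est - theta                                  # E[h_n] - theta
    variance = sum((e - mean_est) ** 2 for e in estimates) / trials
    mse = sum((e - theta) ** 2 for e in estimates) / trials  # squared-loss risk
    return bias, variance, mse
```

Because MSE = Bias^2 + Variance is an algebraic identity for these empirical moments, the check holds exactly up to floating-point error; the division by n + 1 makes the bias roughly -theta/(n + 1), i.e., negative.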