# ILLINOIS CS 446 - scribe_1 (10 pages)

## scribe_1

- Pages: 10
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS446: Machine Learning, Fall 2017

Lecture 5: Probabilistic Model, ML, Grad Descent, Regression

Lecturer: Sanmi Koyejo. Scribe: Shruti Bhargava. Sep 12th, 2017

## Recap

### Overfitting

Overfitting refers to the condition where our model, or the function class $H$ we are working with, is too flexible or too big. The model then incorporates extremely minor variations of the input, catering to noise along with the true signal. It is detected when training performance is high and exceeds test performance.

### Underfitting

Underfitting implies that $H$ is too small or over-constrained. In other words, it lacks the capacity to capture the variations possessed by the true signal.

### Bias-Variance tradeoff

This helps us understand the error tradeoffs in a model. Bias tells us about the flexibility, or representation ability, of the model. It conveys how far one is from the best possible, i.e.

$$\text{Bias} = E[h_n] - f$$

where $h_n$ is our hypothesis function and $f$ is the Bayes optimal function. Variance, on the other hand, captures the noise or optimization error. We have seen that

$$\text{Variance} = E\big[(E[h_n] - h_n)^2\big]$$

Most risk functions $R$ can be expressed as some function of the bias and the variance of the model, i.e. for a given $R$ we can find a function $g$ such that

$$E[R(h_n, D_n)] = g(\text{Bias}, \text{Variance})$$

where $D_n$ is the dataset used for evaluation. One example is the risk under the squared loss (squared error), defined as $(y - h_n)^2$. It can be shown (proof in Section 6.4.4 of the textbook [1]) that

$$E\big[(y - h_n)^2\big] = \text{noise} + \text{Bias}^2 + \text{Variance}$$

This tells us the knobs that one can pull in order to minimize the error. Several other examples are given in the textbook. Bias and variance are tied together: increasing one typically decreases the other, which changes the total error, hence the tradeoff matters.

## Selecting and Training Models

We started out by trying to select and train machine learning models. The different tracks for doing so were discussed previously:

0. **Come up with an algorithm.** This seems vague, but in practice a lot of effective machine learning is done through engineering and playing around with algorithms. What matters is that the algorithm gives good predictions on new data.
1. **Empirical Risk Minimization (ERM).** A more principled approach to finding models: from a class of functions, pick the model that gives the least error on the training data, i.e.

   $$h_n = \arg\min_{h \in H} R(h, D_n)$$

   where $D_n$ is the training data and $H$ is the search space, a subset of $F$.
2. **Probabilistic Modelling.** This framework also provides a structured approach to obtaining a good model. It broadly involves the following 2 steps:

   (a) Find $\hat{p}$ that approximately mimics the data-generating distribution $p$
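
As a rough numerical illustration of ERM under squared loss and of the noise + Bias² + Variance decomposition from the recap, here is a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the lecture: the sine target `true_f`, the noise level, the fixed training grid (only the label noise is resampled), and the helper names `solve`, `erm_poly`, `predict`, and `bias_variance`. The hypothesis class $H$ is taken to be polynomials of a given degree, so the ERM solution is an ordinary least-squares fit.

```python
import math
import random


def true_f(x):
    # Hypothetical ground-truth signal f (an assumption for illustration).
    return math.sin(2 * math.pi * x)


def solve(a, b):
    # Solve the linear system a w = b by Gaussian elimination with pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def erm_poly(xs, ys, degree):
    # ERM under squared loss over H = polynomials of the given degree:
    # the normal equations yield the coefficients with least training error.
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def predict(w, x):
    # Evaluate the polynomial with coefficients w (lowest order first) at x.
    return sum(c * x ** i for i, c in enumerate(w))


def bias_variance(degree, n_train=30, n_datasets=200, noise_std=0.3, seed=0):
    rng = random.Random(seed)
    xs = [i / (n_train - 1) for i in range(n_train)]  # fixed inputs (assumption)
    x_test = [i / 49 for i in range(50)]
    f_test = [true_f(x) for x in x_test]
    preds, sq_err = [], 0.0
    for _ in range(n_datasets):
        # Draw a fresh training set D_n and fit h_n by ERM.
        ys = [true_f(x) + rng.gauss(0.0, noise_std) for x in xs]
        w = erm_poly(xs, ys, degree)
        p = [predict(w, x) for x in x_test]
        preds.append(p)
        # Squared loss of h_n against fresh noisy test labels y.
        sq_err += sum((f + rng.gauss(0.0, noise_std) - yhat) ** 2
                      for f, yhat in zip(f_test, p))
    n_test = len(x_test)
    # Average prediction over datasets: a Monte Carlo estimate of E[h_n].
    mean_pred = [sum(p[j] for p in preds) / n_datasets for j in range(n_test)]
    bias_sq = sum((mean_pred[j] - f_test[j]) ** 2 for j in range(n_test)) / n_test
    variance = sum((p[j] - mean_pred[j]) ** 2
                   for p in preds for j in range(n_test)) / (n_datasets * n_test)
    mse = sq_err / (n_datasets * n_test)
    return bias_sq, variance, noise_std ** 2, mse


if __name__ == "__main__":
    for degree in (1, 3):
        b, v, noise, mse = bias_variance(degree)
        print(f"degree={degree} bias^2={b:.4f} variance={v:.4f} "
              f"noise={noise:.4f} sum={b + v + noise:.4f} mse={mse:.4f}")
```

Running the script prints, for each degree, the estimated Bias², Variance, and noise terms next to the simulated squared error, so the sum can be compared against the measured error. In this particular setup the degree-1 class shows the larger bias and the degree-3 class the larger variance, which is one instance of the tradeoff described above.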
