CMU CS 10701 - What’s learning, revisited Overfitting

What's learning, revisited? Overfitting. Generative versus Discriminative. Logistic Regression.
Machine Learning 10-701/15-781, Carlos Guestrin, Carnegie Mellon University, September 19th, 2007

Bias-Variance Tradeoff
- The choice of hypothesis class introduces a learning bias:
- a more complex class means less bias;
- a more complex class means more variance.

Training set error
- Given a dataset (the training data), choose a loss function, e.g. squared error (L2) for regression.
- Training set error: for a particular set of parameters, the loss function evaluated on the training data.

[Figure: training set error as a function of model complexity.]

Prediction error
- Training set error can be a poor measure of the quality of the solution.
- Prediction error: we really care about the error over all possible input points, not just the training data.

[Figure: prediction error as a function of model complexity.]

Computing prediction error
- Computing the prediction error is hard: it is an integral, and we may not know t(x) for every x.
- Monte Carlo integration (sampling approximation): sample a set of i.i.d. points x1, ..., xM from p(x) and approximate the integral with the sample average.

Why training set error doesn't approximate prediction error
- The sampling approximation of the prediction error and the training error are very similar equations. So why is the training set a bad measure of prediction error?
- Because you cheated! The training error is a good estimate for a single w, but you optimized w with respect to the training error and found a w that is good for this particular set of samples.
- Training error is an optimistically biased estimate of prediction error.

Test set error
- Given a dataset, randomly split it into two parts: training data {x1, ..., xNtrain} and test data {x1, ..., xNtest}.
- Use the training data to optimize the parameters w.
- Test set error: for the final solution w*, evaluate the error on the test data.

[Figure: test set error as a function of model complexity.]

Overfitting
- A learning algorithm overfits the training data if it outputs a solution w when there exists another solution w' with higher training error but lower prediction (true) error.

How many points do I use for training and testing?
- A very hard question to answer: with too few training points the learned w is bad; with too few test points you never know whether you reached a good solution.
- Bounds such as Hoeffding's inequality can help (more on this later this semester), but the question remains hard to answer.
- Typically: if you have a reasonable amount of data, pick a test set large enough for a reasonable estimate of the error and use the rest for learning; if you have little data, you need to pull out the big guns, e.g. bootstrapping.

Error estimators

[Figure: error as a function of the number of training examples, for a fixed model complexity, from little data to infinite data.]

Error estimators: be careful!
- The test set error is only an unbiased estimate if you never, ever, do any learning on the test data.
- For example, if you use the test set to select the degree of the polynomial, the estimate is no longer unbiased. We will address this problem later in the semester.
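The optimistic bias described above is easy to see in a small experiment. The sketch below is not from the lecture: it assumes a synthetic one-dimensional regression problem with squared (L2) loss, fits polynomials of increasing degree on a training split, and compares training error against test error; all names and data are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem: t(x) = sin(x) plus Gaussian noise.
x = rng.uniform(-3, 3, size=60)
t = np.sin(x) + rng.normal(scale=0.3, size=x.shape)

# Randomly split the dataset into training and test parts.
idx = rng.permutation(len(x))
train, test = idx[:40], idx[40:]

def squared_error(w, xs, ts):
    # Mean squared (L2) error of the polynomial with coefficients w.
    return np.mean((np.polyval(w, xs) - ts) ** 2)

# Fit polynomials of increasing degree (model complexity) on the training data only.
for degree in (1, 3, 5, 9):
    w = np.polyfit(x[train], t[train], degree)
    print(f"degree {degree}: "
          f"train error {squared_error(w, x[train], t[train]):.3f}, "
          f"test error {squared_error(w, x[test], t[test]):.3f}")

# Training error keeps shrinking as the degree grows, because w was chosen to
# minimize it on exactly these samples; the test error, computed on points the
# optimizer never saw, typically stops improving and then rises.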
Announcements
- The first homework is out; it has a programming part and an analytic part.
- Remember the collaboration policy: you can discuss the questions, but you need to write your own solutions and code. You are not allowed to look at previous years' solutions, search the web for solutions, use someone else's solutions, etc.
- Due Oct 3rd at the beginning of class. Start early!
- Recitation this week: Bayes optimal classifiers, Naïve Bayes.

What's supervised learning, more formally?
- Given:
- a dataset of instances (x1, t(x1)), ..., (xN, t(xN)), e.g. ((GPA = 3.9, IQ = 120, MLscore = 99), 150K);
- a hypothesis space H, e.g. polynomials of degree 8;
- a loss function that measures the quality of a hypothesis h in H, e.g. squared error for regression.
- Obtain:
- a learning algorithm that finds the h in H minimizing the loss function, e.g. using matrix operations for regression.
- We want to minimize prediction error, but we can only minimize the error on the dataset.

Types of supervised learning problems, revisited
- Regression: e.g. a dataset of (position, temperature) pairs; a hypothesis space; a loss function.
- Density estimation: e.g. a dataset of grades; a hypothesis space; a loss function.
- Classification: e.g. a dataset of (brain image, verb vs. noun) pairs; a hypothesis space; a loss function.

Learning is simply function approximation
- The general supervised learning problem: given some data (including features), a hypothesis space, and a loss function.
- Learning is no magic: we are simply trying to find a function that fits the data.
- Regression, density estimation, classification: not surprisingly, these seemingly different problems have very similar solutions.

What is NB really optimizing?
- Naïve Bayes assumption: features are independent given the class; more generally, P(X1, ..., Xn | Y) = prod_i P(Xi | Y).
- NB classifier: predict the y that maximizes P(Y = y) prod_i P(Xi = xi | Y = y).

MLE for the parameters of NB
- Given a dataset, let Count(A = a, B = b) be the number of examples where A = a and B = b.
- The MLE for NB is simply:
- prior: P(Y = y) = Count(Y = y) / (number of examples);
- likelihood: P(Xi = xi | Y = y) = Count(Xi = xi, Y = y) / Count(Y = y).

What is NB really optimizing? Let's use an example.

Generative v. Discriminative classifiers: intuition
- We want to learn h: X -> Y, where X is the features and Y the target classes.
- Bayes optimal classifier: P(Y | X).
- Generative classifiers (e.g. Naïve Bayes): assume some functional form for P(X | Y) and P(Y); estimate the parameters of P(X | Y) and P(Y) directly from the training data; use Bayes rule to calculate P(Y | X = x). This is a generative model: it computes P(Y | X) only indirectly, through Bayes rule, but it can generate a sample of the data, since P(X) = sum_y P(y) P(X | y).
- Discriminative classifiers (e.g. Logistic Regression): assume some functional form for P(Y | X) and estimate its parameters directly from the training data. This is a discriminative model: it learns P(Y | X) directly, but it cannot generate a sample of the data, because P(X) is not available.

Logistic function (or sigmoid)
- Logistic Regression: learn P(Y | X) directly.
- Assume a particular functional form: a sigmoid, sigmoid(z) = 1 / (1 + exp(-z)), applied to a linear function z of the data.
- Features can be discrete or continuous.

Understanding the sigmoid
[Figure: the sigmoid plotted for three different settings of (w0, w1), showing how the bias w0 shifts the curve and the weight w1 changes its steepness.]
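To make the sigmoid functional form concrete, here is a minimal sketch, not from the lecture, of P(Y = 1 | X = x) = 1 / (1 + exp(-(w0 + w1*x))) for a single continuous feature; the (w0, w1) settings are illustrative, not the ones plotted in the slides.

import numpy as np

def sigmoid_prob(x, w0, w1):
    # P(Y = 1 | X = x) under a logistic model with bias w0 and weight w1.
    return 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

xs = np.linspace(-6, 6, 7)
for w0, w1 in [(0.0, 1.0), (2.0, 1.0), (0.0, 0.5)]:
    print(f"w0={w0}, w1={w1}:", np.round(sigmoid_prob(xs, w0, w1), 2))

# w0 shifts where the curve crosses 0.5 (at x = -w0/w1), and larger |w1| makes
# the transition from 0 to 1 sharper, which is what the "Understanding the
# sigmoid" slide illustrates with its three curves.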

