CMU CS 10701 - What’s learning, revisited Overfitting

What's learning, revisited: Overfitting; Generative versus Discriminative; Logistic Regression
Machine Learning - 10-701/15-781, Carlos Guestrin, Carnegie Mellon University, September 19th, 2007
© Carlos Guestrin 2005-2007

Bias-variance tradeoff
- The choice of hypothesis class introduces a learning bias.
- More complex class -> less bias.
- More complex class -> more variance.

Training set error
- Given a dataset (training data), choose a loss function, e.g., squared error (L2) for regression.
- Training set error: for a particular set of parameters w, the loss function evaluated on the training data (writing f_w(x) for the model's prediction):
  error_train(w) = (1/N_train) * sum_{i=1..N_train} (t(x_i) - f_w(x_i))^2

Training set error as a function of model complexity
(plot)

Prediction error
- Training set error can be a poor measure of the "quality" of a solution.
- Prediction error: we really care about the error over all possible input points, not just the training data:
  error_true(w) = E_{x ~ p(x)}[(t(x) - f_w(x))^2] = ∫ (t(x) - f_w(x))^2 p(x) dx

Prediction error as a function of model complexity
(plot)

Computing prediction error
- Computing the prediction error requires a hard integral, and we may not know t(x) for every x.
- Monte Carlo integration (sampling approximation):
  - Sample a set of i.i.d. points {x_1, ..., x_M} from p(x).
  - Approximate the integral with the sample average:
    error_true(w) ≈ (1/M) * sum_{i=1..M} (t(x_i) - f_w(x_i))^2

Why doesn't training set error approximate prediction error?
- The sampling approximation of the prediction error and the training error are very similar equations: both are sample averages of the loss. So why is the training set a bad measure of prediction error?
- Because you cheated! Training error is a good estimate for a single, fixed w, but you optimized w with respect to the training error and found a w that is good for this particular set of samples.
- Training error is an (optimistically) biased estimate of prediction error.

Test set error
- Given a dataset, randomly split it into two parts:
  - Training data: {x_1, ..., x_Ntrain}
  - Test data: {x_1, ..., x_Ntest}
- Use the training data to optimize the parameters w.
- Test set error: for the final solution w*, evaluate the error on the test data:
  error_test(w*) = (1/N_test) * sum_{i=1..N_test} (t(x_i) - f_{w*}(x_i))^2

Test set error as a function of model complexity
(plot)

Overfitting
- A learning algorithm overfits the training data if it outputs a solution w when there exists another solution w' such that:
  error_train(w) < error_train(w')   and   error_true(w') < error_true(w)

How many points do I use for training/testing?
- A very hard question to answer!
  - Too few training points: the learned w is bad.
  - Too few test points: you never know whether you reached a good solution.
- Bounds, such as Hoeffding's inequality, can help (more on this later this semester), but the question is still hard to answer.
- Typically:
  - If you have a reasonable amount of data, pick a test set "large enough" for a "reasonable" estimate of the error, and use the rest for learning.
  - If you have little data, you need to pull out the big guns, e.g., bootstrapping.

Error estimators
(plot)

Error as a function of the number of training examples, for a fixed model complexity
(plot: from little data to infinite data)
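To make the training/test picture concrete, here is a small, self-contained Python sketch (not from the lecture): it fits polynomials of increasing degree to a made-up noisy dataset, uses a random train/test split, and reports both error estimates. The true function t(x), the noise level, and the split sizes are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth function t(x); inputs sampled from p(x) = Uniform(0, 1).
def t(x):
    return np.sin(2 * np.pi * x)

N = 30                                   # small dataset, so overfitting shows up quickly
x = rng.uniform(0.0, 1.0, size=N)
y = t(x) + rng.normal(scale=0.2, size=N)

# Random train/test split: optimize w on the training part only.
idx = rng.permutation(N)
train, test = idx[:20], idx[20:]

def squared_error(w, xs, ys):
    """Average squared (L2) loss of the polynomial with coefficients w."""
    return np.mean((np.polyval(w, xs) - ys) ** 2)

for degree in [1, 3, 5, 9]:
    w = np.polyfit(x[train], y[train], degree)   # least-squares fit on training data only
    err_train = squared_error(w, x[train], y[train])
    err_test = squared_error(w, x[test], y[test])
    print(f"degree {degree}: train {err_train:.3f}  test {err_test:.3f}")
```

With these (hypothetical) settings, the training error keeps shrinking as the degree grows, while the test error eventually turns back up: the qualitative shape of the "error as a function of model complexity" plots above.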
Error estimators: be careful!
- The test set error is only unbiased if you never, never, never do any learning on the test data.
- For example, if you use the test set to select the degree of the polynomial, the estimate is no longer unbiased!
- (We will address this problem later in the semester.)

Announcements
- The first homework is out: a programming part and an analytic part.
- Remember the collaboration policy: you can discuss the questions, but you need to write your own solutions and code.
- Remember that you are not allowed to look at previous years' solutions, search the web for solutions, use someone else's solutions, etc.
- Due Oct. 3rd at the beginning of class. Start early!
- Recitation this week: Bayes optimal classifiers, Naïve Bayes.

What's (supervised) learning, more formally
- Given:
  - Dataset: instances {⟨x_1, t(x_1)⟩, ..., ⟨x_N, t(x_N)⟩},
    e.g., ⟨x_i, t(x_i)⟩ = ⟨(GPA=3.9, IQ=120, MLscore=99); 150K⟩.
  - Hypothesis space H, e.g., polynomials of degree 8.
  - Loss function: measures the quality of a hypothesis h ∈ H, e.g., squared error for regression.
- Obtain:
  - Learning algorithm: find the h ∈ H that minimizes the loss function, e.g., using matrix operations for regression.
  - We want to minimize the prediction error, but we can only minimize the error on the dataset.

Types of (supervised) learning problems, revisited
- Regression, e.g., dataset: ⟨position; temperature⟩; hypothesis space; loss function.
- Density estimation, e.g., dataset: ⟨grades⟩; hypothesis space; loss function.
- Classification, e.g., dataset: ⟨brain image; {verb v. noun}⟩; hypothesis space; loss function.

Learning is (simply) function approximation!
- The general (supervised) learning problem: given some data (including features), a hypothesis space, and a loss function.
- Learning is no magic! We are simply trying to find a function that fits the data.
- Regression, density estimation, classification: (not surprisingly) seemingly different problems with very similar solutions.

What is NB really optimizing?
- Naïve Bayes assumption: features are independent given the class:
  P(X_1, X_2 | Y) = P(X_1 | Y) * P(X_2 | Y)
- More generally:
  P(X_1, ..., X_n | Y) = prod_i P(X_i | Y)
- NB classifier:
  y* = argmax_y P(Y=y) * prod_i P(X_i=x_i | Y=y)

MLE for the parameters of NB
- Given a dataset, let Count(A=a, B=b) ← the number of examples where A=a and B=b.
- The MLE for NB is simply:
  - Prior: P(Y=y) = Count(Y=y) / N
  - Likelihood: P(X_i=x_i | Y=y) = Count(X_i=x_i, Y=y) / Count(Y=y)

What is NB really optimizing? Let's use an example (a toy version appears in the sketch after these notes).

Generative v. discriminative classifiers - intuition
- We want to learn h: X → Y, where X are the features and Y the target classes.
- Bayes optimal classifier: P(Y|X).
- Generative classifiers, e.g., Naïve Bayes:
  - Assume some functional form for P(X|Y) and P(Y).
  - Estimate the parameters of P(X|Y) and P(Y) directly from the training data.
  - Use Bayes rule to calculate P(Y | X = x).
  - This is a 'generative' model: an indirect computation of P(Y|X) through Bayes rule, but one that can generate a sample of the data, since P(X) = sum_y P(y) P(X|y).
- Discriminative classifiers, e.g., Logistic Regression:
  - Assume some functional form for P(Y|X) ...
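To ground the counting MLE and the generative recipe above, here is a minimal Python sketch; the toy weather-style dataset, the feature names, and the unsmoothed counts are all invented for illustration, not taken from the lecture.

```python
from collections import Counter

# Toy dataset of (features, label) pairs; features are discrete, label Y is one of two classes.
# Everything here is made up for illustration.
data = [
    ({"sky": "sunny", "wind": "weak"},   "play"),
    ({"sky": "sunny", "wind": "strong"}, "play"),
    ({"sky": "rainy", "wind": "strong"}, "stay"),
    ({"sky": "rainy", "wind": "weak"},   "play"),
    ({"sky": "rainy", "wind": "strong"}, "stay"),
]

# MLE by counting, as on the slide:
#   P(Y=y)           = Count(Y=y) / N
#   P(X_i=x_i | Y=y) = Count(X_i=x_i, Y=y) / Count(Y=y)
N = len(data)
count_y = Counter(y for _, y in data)
count_xy = Counter((i, x_i, y) for feats, y in data for i, x_i in feats.items())

def p_y(y):
    return count_y[y] / N

def p_x_given_y(i, x_i, y):
    return count_xy[(i, x_i, y)] / count_y[y]

def nb_classify(feats):
    """Generative classifier: pick argmax_y P(Y=y) * prod_i P(X_i=x_i | Y=y)."""
    def joint(y):
        p = p_y(y)
        for i, x_i in feats.items():
            p *= p_x_given_y(i, x_i, y)
        return p
    return max(count_y, key=joint)

print(nb_classify({"sky": "sunny", "wind": "strong"}))   # -> 'play' with these counts
```

Note that the estimates are pure relative frequencies: a feature value never seen with a given class in training gets probability zero for that class.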

