Machine Learning 10-701/15-781, Fall 2006
Overfitting and Model Selection
Eric Xing
Lecture 7, October 3, 2006
Reading: Chap. 1-2, CB; Chap. 5-6, TM

Outline
- Overfitting
- Instance-based learning
- Regression
- Bias-variance decomposition
- The battle against overfitting: each learning algorithm has some free knobs that one can tune (i.e., "hack") to make the algorithm generalize better to test data. But is there a more principled way?
  - Cross validation
  - Regularization
  - Model selection (Occam's razor)
  - Model averaging
  - The Bayesian-frequentist debate
  - Bayesian learning: weight models by their posterior probabilities

Recall: Vector Space Representation
- Each document is a vector, one component for each term (word), e.g.:

            Doc 1   Doc 2   Doc 3
  Word 1      3       0       0
  Word 2      0       8       1
  Word 3     12       1      10
  ...         0       1       3
  ...         0       0       0

- Normalize each vector to unit length.
- High-dimensional vector space:
  - Terms are axes: 10,000 dimensions, or even 100,000.
  - Docs are vectors in this space.

Classes in a Vector Space
[Figure: documents in the vector space, grouped into the classes Sports, Science, Arts]

Test Document
[Figure: a test document placed in the vector space among the classes Sports, Science, Arts]

K-Nearest-Neighbor (kNN) Classifier
[Figure: kNN classification of the test document among Sports, Science, Arts]

kNN Is an Instance of Instance-Based Learning
What makes an instance-based learner:
- A distance metric
- How many nearby neighbors to look at
- A weighting function (optional)
- How to relate to the local points

Euclidean Distance Metric
- $D(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_i (x_i - x'_i)^2}$
- Or equivalently, $D(\mathbf{x}, \mathbf{x}') = \sqrt{(\mathbf{x} - \mathbf{x}')^T (\mathbf{x} - \mathbf{x}')}$
- Other metrics:
  - L1 norm: $\sum_i |x_i - x'_i|$
  - L-infinity norm: $\max_i |x_i - x'_i|$ (elementwise)
  - Mahalanobis: $\sqrt{(\mathbf{x} - \mathbf{x}')^T \Sigma^{-1} (\mathbf{x} - \mathbf{x}')}$, where $\Sigma$ is full and symmetric
  - Correlation
  - Angle
  - Hamming distance, Manhattan distance
  - ...

1-, 2-, 3-, and 5-Nearest-Neighbor Classifiers
[Figures: the test document classified using 1, 2, 3, and 5 nearest neighbors among Sports, Science, Arts]

Nearest Neighbor Learning Algorithm
- Learning is just storing the representations of the training examples in D.
- Testing instance x:
  - Compute the similarity between x and all examples in D.
  - Assign x the category of the most similar example in D.
- Does not explicitly compute a generalization or category prototypes.
- Also called:
  - Case-based learning
  - Memory-based learning
  - Lazy learning
(A minimal code sketch of this procedure appears below.)

kNN Is Close to Optimal
- Cover and Hart (1967).
- Asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate (the error rate of a classifier that knows the model that generated the data).
- In particular, the asymptotic error rate is 0 if the Bayes rate is 0.
- Decision boundary [figure].
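To make the lazy-learning procedure concrete, here is a minimal sketch of a kNN classifier (my illustration, not code from the lecture): Euclidean distance plus an unweighted majority vote, in plain NumPy. The function name knn_predict and the toy class labels are invented for illustration.

```python
# Minimal k-nearest-neighbor classifier sketch (illustrative only, not from the
# lecture). Assumes documents are already fixed-length numeric vectors (e.g.,
# term-count vectors) and uses Euclidean distance with an unweighted majority vote.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of a single query vector x from its k nearest neighbors."""
    # Euclidean distances D(x, x') = sqrt((x - x')^T (x - x')) to every stored example
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k closest training examples ("lazy" learning: nothing was fit)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage: three tiny "documents" in a 3-term vector space
# (counts taken from the table above; the labels are made up for illustration)
X_train = np.array([[3.0, 0.0, 12.0],
                    [0.0, 8.0, 1.0],
                    [0.0, 1.0, 10.0]])
y_train = ["Sports", "Science", "Science"]
print(knn_predict(X_train, y_train, np.array([0.0, 2.0, 9.0]), k=1))
```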
Overfitting: Another Example
- Regression [figure]

Overfitting, cont'd
- The models [figure]
- Test errors [figure]

Bias-Variance Decomposition
- Now let's look more closely into the two sources of error in a function approximator.
- In the following we show the bias-variance decomposition using LR (linear regression) as an example.

Loss Functions for Regression
- Let t be the true (target) output and y(x) be our estimate. The expected squared loss is
  $E[L] = \iint L(t, y(\mathbf{x})) \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt = \iint (y(\mathbf{x}) - t)^2 \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt$
- Our goal is to choose the y(x) that minimizes E[L].
- Calculus of variations:
  $\frac{\delta E[L]}{\delta y(\mathbf{x})} = 2 \int (y(\mathbf{x}) - t) \, p(\mathbf{x}, t) \, dt = 0$
  $\Rightarrow\; y(\mathbf{x}) \int p(\mathbf{x}, t) \, dt = \int t \, p(\mathbf{x}, t) \, dt$
  $\Rightarrow\; y(\mathbf{x}) = \frac{\int t \, p(\mathbf{x}, t) \, dt}{p(\mathbf{x})} = \int t \, p(t \mid \mathbf{x}) \, dt = E_t[t \mid \mathbf{x}]$

Expected Loss
- Let $h(\mathbf{x}) = E[t \mid \mathbf{x}]$ be the optimal predictor and $y(\mathbf{x})$ our actual predictor, which will incur the following expected loss:
  $E[(y(\mathbf{x}) - t)^2] = \iint \{ y(\mathbf{x}) - h(\mathbf{x}) + h(\mathbf{x}) - t \}^2 \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt$
  $= \iint \{ (y(\mathbf{x}) - h(\mathbf{x}))^2 + 2 (y(\mathbf{x}) - h(\mathbf{x}))(h(\mathbf{x}) - t) + (h(\mathbf{x}) - t)^2 \} \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt$
  $= \int (y(\mathbf{x}) - h(\mathbf{x}))^2 \, p(\mathbf{x}) \, d\mathbf{x} + \iint (h(\mathbf{x}) - t)^2 \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt$
  since the cross term vanishes when $h(\mathbf{x}) = E[t \mid \mathbf{x}]$. (There is an error on p. 47.)
- $\iint (h(\mathbf{x}) - t)^2 \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt$ is a noise term, and we can do no better than this; it is a lower bound on the expected loss.
- The other part of the error comes from $\int (y(\mathbf{x}) - h(\mathbf{x}))^2 \, p(\mathbf{x}) \, d\mathbf{x}$, so let's take a closer look at it.
- We will assume $y(\mathbf{x}) = y(\mathbf{x}, \mathbf{w})$ is a parametric model whose parameters $\mathbf{w}$ are fit to a training set D; we therefore write $y(\mathbf{x}; D)$.

Bias-Variance Decomposition (cont'd)
- Consider one data set D and one test point x.
- Since the predictor y depends on the training data D, write $E_D[y(\mathbf{x}; D)]$ for the expected predictor over the ensemble of data sets. Then, using the same trick as above,
  $\{ y(\mathbf{x}; D) - h(\mathbf{x}) \}^2 = \{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] + E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}^2$
  $= \{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] \}^2 + \{ E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}^2 + 2 \{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] \} \{ E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}$
- This error term depends on the particular training data, so we take an expectation over training sets:
  $E_D[\{ y(\mathbf{x}; D) - h(\mathbf{x}) \}^2] = \{ E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}^2 + E_D[\{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] \}^2]$
- Putting things together:
  expected loss = (bias)^2 + variance + noise

Regularized Regression
[Slide: regularized (ridge) linear regression; details not recovered from the source]

Bias-Variance Tradeoff
- $\lambda$ is the regularization term in LR; the smaller $\lambda$ is, the more complex the model (why?).
- Simple (highly regularized) models have low variance but high bias.
- Complex models have low bias but high variance.
- You are inspecting an empirical average over 100 training sets; the actual $E_D$ cannot be computed. (A simulation sketch of such an empirical average appears at the very end of these notes.)

Bias^2 + Variance vs. Regularizer
- Bias^2 + variance predicts the shape of the test error quite well.
- However, bias and variance cannot be computed in practice, since they rely on knowing the true distribution of x and t (and hence $h(\mathbf{x}) = E[t \mid \mathbf{x}]$).

The Battle Against Overfitting: Model Selection
- Suppose we are trying to select among several different models for a learning problem. Examples:
  1. Polynomial regression, $h_\theta(x) = g(\theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_k x^k)$.
     Model selection: we wish to automatically and objectively decide whether k should be, say, 0, 1, or 10.
  2. Locally weighted regression.
     Model selection: we want to automatically choose the bandwidth parameter.
  3. Mixture models and hidden Markov models.
     Model selection: we want to decide the number of hidden states.

The Problem
- Given a model family $\mathcal{F} = \{ M_1, M_2, \ldots, M_I \}$, find $M_i \in \mathcal{F}$ such that
  $M_i = \arg\max_{M \in \mathcal{F}} J(D; M)$

Cross Validation
- We are given training data D and test data D_test, and we would like to fit the data with a model $p_i(x; \theta)$ from the family $\mathcal{F}$ (e.g., an LR), which is indexed by i and parameterized by $\theta$.
- K-fold cross-validation (CV):
  - Set ...
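As a concrete illustration of K-fold CV used for model selection, here is a minimal sketch (my illustration, not the lecture's procedure): it chooses a polynomial degree by average held-out squared error, with NumPy's polyfit/polyval standing in for the regression-fitting step. The function name kfold_cv_error and the synthetic data are invented for this example.

```python
# Minimal K-fold cross-validation sketch (illustrative, not from the lecture).
# Selects a polynomial degree for 1-D regression by average held-out squared error.
import numpy as np

def kfold_cv_error(x, t, degree, K=5):
    """Average held-out squared error of a degree-`degree` polynomial under K-fold CV."""
    idx = np.random.permutation(len(x))      # shuffle, then split into K folds
    folds = np.array_split(idx, K)
    errs = []
    for i in range(K):
        val = folds[i]                                            # held-out fold
        trn = np.concatenate([folds[j] for j in range(K) if j != i])
        w = np.polyfit(x[trn], t[trn], degree)                    # fit on the other K-1 folds
        errs.append(np.mean((np.polyval(w, x[val]) - t[val]) ** 2))
    return np.mean(errs)

# Toy usage: pick the degree with the smallest CV error on synthetic data
np.random.seed(0)
x = np.random.uniform(0, 1, 40)
t = np.sin(2 * np.pi * x) + np.random.normal(0, 0.3, 40)
best_degree = min(range(8), key=lambda d: kfold_cv_error(x, t, d))
print("selected polynomial degree:", best_degree)
```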
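Returning to the bias-variance tradeoff above: the "empirical average over 100 training sets" can be mimicked on synthetic data, where $h(\mathbf{x})$ is known. The sketch below (my illustration, assuming a sinusoidal generating process with Gaussian noise; not the lecture's experiment) fits a ridge-regularized polynomial to each of 100 sampled training sets and approximates $E_D[y(\mathbf{x}; D)]$, the squared bias, and the variance.

```python
# Bias-variance simulation sketch (illustrative; not the lecture's plots).
# Because the generating process is synthetic, h(x) = sin(2*pi*x) is known, so
# squared bias and variance can be approximated by averaging over training sets.
import numpy as np

def ridge_fit_predict(x_train, t_train, x_test, degree=9, lam=1e-3):
    """Fit ridge-regularized polynomial regression and predict at x_test."""
    Phi = np.vander(x_train, degree + 1)                     # polynomial features
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ t_train)
    return np.vander(x_test, degree + 1) @ w

np.random.seed(0)
x_test = np.linspace(0, 1, 25)
h = np.sin(2 * np.pi * x_test)                 # the optimal predictor h(x) = E[t|x]

preds = []                                     # y(x; D) for 100 sampled training sets D
for _ in range(100):
    x_d = np.random.uniform(0, 1, 25)
    t_d = np.sin(2 * np.pi * x_d) + np.random.normal(0, 0.3, 25)
    preds.append(ridge_fit_predict(x_d, t_d, x_test))
preds = np.array(preds)

y_bar = preds.mean(axis=0)                     # empirical E_D[y(x; D)]
bias2 = np.mean((y_bar - h) ** 2)              # (bias)^2, averaged over test points
variance = np.mean(preds.var(axis=0))          # variance, averaged over test points
print(f"bias^2 ~ {bias2:.4f}   variance ~ {variance:.4f}")
```

Varying the assumed penalty lam in this sketch reproduces the qualitative tradeoff described above: large lam (heavy regularization) raises the bias term, while small lam raises the variance term.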