Unformatted text preview:

Statistical Learning Theory CS4780 5780 Machine Learning Fall 2011 Thorsten Joachims Cornell University Reading Mitchell Chapter 7 not 7 4 4 and 7 5 Outline Questions in Statistical Learning Theory How good is the learned rule after n examples How many examples do I need before the learned rule is accurate What can be learned and what cannot Is there a universally best learning algorithm In particular we will address What is the true error of h if we only know the training error of h Finite hypothesis spaces and zero training error Finite hypothesis spaces and non zero training error Infinite hypothesis spaces and VC dimension Can you Convince me of your Psychic Abilities Game I think of n bits H players try to guess the bit sequence If somebody in the class guesses my bit sequence that person clearly has telepathic abilities right Question If at least one player guesses the bit sequence correctly is there any significant evidence that he she has telepathic abilities How large would n and H have to be Learning as Prediction Overview Real world Process P X Y drawn i i d drawn i i d Train Sample Strain x1 y1 xn yn Strain Learner h Test Sample Stest xn 1 yn 1 Goal Find h with small prediction error ErrP h over P X Y Strategy Find any h with small error ErrStrain h on training sample Strain Training Error Error ErrStrain h on training sample Test Error Error ErrStest h on test sample is an estimate of ErrP h Review of Definitions Definition The hypothesis space H is the set of all possible classification rules available to the learner Useful Formulas Binomial Distribution The probability of observing x heads in a sample of n independent coin tosses where in each toss the probability of heads is p is Union Bound Unnamed Generalization Error Bound Finite H Zero Training Error Setting Sample of n labeled instances Strain Learning Algorithm L with a finite hypothesis space H At least one h 2 H has zero training error ErrStrain h Learning Algorithm L returns zero training error hypothesis What is the probability that the prediction error of is larger than Training Sample Strain S train x1 y1 xn yn Learner Test Sample Stest xn 1 yn 1 Sample Complexity Finite H Zero Training Error Setting Sample of n labeled instances Strain Learning Algorithm L with a finite hypothesis space H At least one h 2 H has zero training error ErrStrain h Learning Algorithm L returns zero training error hypothesis How many training examples does L need so that with probability at least 1 it learns an with prediction error less than Training Sample Strain S train x1 y1 xn yn Learner Test Sample Stest xn 1 yn 1 Probably Approximately Correct Learning Example Smart Investing Task Pick stock analyst based on past performance Experiment Review analyst prediction next day up down for past 10 days Pick analyst that makes the fewest errors Situation 1 1 stock analyst A1 A1 makes 5 errors Situation 2 3 stock analysts A1 B1 B2 B2 best with 1 error Situation 3 1003 stock analysts A1 B1 B2 C1 C1000 C543 best with 0 errors Question Which analysts are you most confident in A1 B2 or C543 Useful Formula Hoeffding Chernoff Bound For any distribution P X where X can take the values 0 and 1 the probability that an average of an i i d sample deviates from its mean p by more than is bounded as Generalization Error Bound Finite H Non Zero Train Err Setting Sample of n labeled instances S Learning Algorithm L with a finite hypothesis space H L returns hypothesis L S with lowest training error What is the probability that the prediction error of exceeds the fraction of training errors by more than Training Sample Strain S train x1 y1 xn yn Learner Test Sample Stest xn 1 yn 1 Overfitting vs Underfitting Mitchell 1997 With probability at least 1 Generalization Error Bound Infinite H Non Zero Train Err Setting Sample of n labeled instances S Learning Algorithm L using a hyp space H with VCDim H d L returns hypothesis L S with lowest training error Given hypothesis space H with VCDim H equal to d and an i i d sample S of size n with probability 1 it holds that


View Full Document

CORNELL CS 4780 - Statistical Learning Theory

Loading Unlocking...
Login

Join to view Statistical Learning Theory and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Statistical Learning Theory and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?