Unformatted text preview:

Outline Statistical Learning Theory CS4780 Machine Learning Fall 2009 Thorsten Joachims Cornell University Questions in Statistical Learning Theory How good is the learned rule after n examples How many examples do I need before the learned rule is accurate What can be learned and what cannot Is there a universally best learning algorithm In particular we will address What is the true error of h if we only know the training error of h Finite hypothesis spaces and zero training error Finite hypothesis spaces and non zero training error Infinite hypothesis spaces and VC dimension Reading Mitchell Chapter 7 not 7 4 4 and 7 5 Can you Convince me of your Psychic Abilities Game I think of n bits H players try to guess the bit sequence 0101 Question If at least one player guesses the bit sequence correctly is there any significant evidence that he she has telepathic abilities How large would n and H have to be Learning as Prediction Task drawn i i d Real world Process P X Y Training Sample Strain S train x1 y1 xn yn Learner drawn i i d h Test Sample Stest xn 1 yn 1 Goal Find h with small prediction error ErrP h over P X Y Strategy Find any h with small error ErrStrain h on training sample Strain Training Error Error ErrStrain h on training sample Test Error Error ErrStest h on test sample is an estimate of ErrP h Review of Definitions Useful Formulas Binomial Distribution The probability of observing x heads in a sample of n independent coin tosses where in each toss the probability of heads is p is Union Bound Unnamed Definition The hypothesis space H is the set of all possible classification rules available to the learner 1 Sample Complexity Finite H Zero Training Error Generalization Error Bound Finite H Zero Training Error Setting Sample of n labeled instances Strain Learning Algorithm L with a finite hypothesis space H At least one h 2 H has zero training error ErrStrain h What is the probability that the prediction error of is larger than Learning Algorithm L returns zero training error hypothesis Training Sample Strain S train x1 y1 xn yn Learner h Test Sample Stest xn 1 yn 1 Setting Sample of n labeled instances Strain Learning Algorithm L with a finite hypothesis space H At least one h 2 H has zero training error ErrStrain h Learning Algorithm L returns zero training error hypothesis How many training examples does L need so that with probability 1 it learns an with prediction error less than Training Sample Strain S train x1 y1 xn yn Learner h Test Sample Stest xn 1 yn 1 Example Smart Investing Probably Approximately Correct Learning Task Pick stock analyst based on past performance Experiment Review analyst prediction next day up down for past 10 days Pick analyst that makes the fewest errors Situation 1 1 stock analyst A1 A1 makes 5 errors Situation 2 3 stock analysts A1 B1 B2 B2 best with 1 error Situation 3 1003 stock analysts A1 B1 B2 C1 C1000 C543 best with 0 errors Which analysts are you most confident in A1 B2 or C543 Generalization Error Bound Finite H Non Zero Training Error Useful Formula Hoeffding Chernoff Bound For any distribution P X where X can take the values 0 and 1 the probability that an average of an i i d sample deviates from its mean p by more than is bounded as Setting Sample of n labeled instances S Learning Algorithm L with a finite hypothesis space H What is the probability that the prediction error of exceeds the fraction of training errors by more than L returns hypothesis L S with lowest training error Training Sample Strain S train x1 y1 xn yn h Learner Test Sample Stest xn 1 yn 1 2 Example Smart Investing Overfitting vs Underfitting Task Pick stock analyst based on past performance Experiment Have analyst predict next day up down for 10 days Pick analyst that makes the fewest errors Situation 1 1 stock analyst A1 A1 makes 5 errors Situation 2 3 stock analysts A1 B1 B2 B2 best with 1 error Mitchell 1997 Situation 3 1003 stock analysts A1 B1 B2 C1 C1000 C543 best with 0 errors Which analysts are you most confident in A1 B2 or C543 With probability at least 1 Generalization Error Bound Infinite H Non Zero Training Error Setting Sample of n labeled instances S Learning Algorithm L with a hypothesis space H with VCDim H d L returns hypothesis L S with lowest training error Given hypothesis space H with VCDim H equal to d and an i i d sample S of size n with probability 1 it holds that 3


View Full Document

CORNELL CS 4780 - Statistical Learning Theory

Loading Unlocking...
Login

Join to view Statistical Learning Theory and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Statistical Learning Theory and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?