Foundations of Artificial Intelligence
Statistical Learning Theory
CS472 – Fall 2007
Thorsten Joachims

Outline
Questions in Statistical Learning Theory:
– How good is the learned rule after n examples?
– How many examples do I need before the learned rule is accurate?
– What can be learned and what cannot?
– Is there a universally best learning algorithm?
In particular, we will address: what is the true error of h if we only know the training error of h?
– Finite hypothesis spaces and zero training error
– (Finite hypothesis spaces and non-zero training error)

Game: Randomized 20-Questions
Game: 20-Questions
– I think of object f
– For i = 1 to 20:
  • You get to ask one yes/no question about f, and I have to answer truthfully
– You make a guess h
– You win if f = h
Game: Randomized 20-Questions
– I pick function f ∈ H, where f: X → {-1,+1}
– For i = 1 to 20:
  • The world delivers an instance x ∈ X with probability P(x), and I have to tell you f(x)
– You form hypothesis h ∈ H, trying to guess my f ∈ H
– You win if f(x) = h(x) with probability at least 1-ε for x drawn according to P(x)

Inductive Learning Model
• Probably Approximately Correct (PAC) Learning Model:
– Take any function f from H
– Draw n training examples Dtrain from P(x), labeled as y = f(x)
– Run the learning algorithm on Dtrain to produce h from H
– Gather test examples Dtest from P(x)
– Apply h to Dtest and measure the fraction (probability) of cases with h(x) ≠ f(x)
– How likely is it that the error probability is less than some threshold ε (for any f from H)?

Measuring Prediction Performance
[Diagram: a real-world process P(x) with labels y = f(x) generates the training data Dtrain = (x1,y1), …, (xn,yn), from which the Learner produces h; the same process generates the test data Dtest = (xn+1,yn+1), …]

Generalization Error Bound: Finite H, Zero Training Error
• Model and Learning Algorithm
– Learning algorithm A with a finite hypothesis space H
– Sample of n labeled examples Dtrain drawn according to P(x)
– Target function f ∈ H ⇒ at least one h ∈ H has zero training error Err_Dtrain(h)
– Learning algorithm A returns a zero-training-error hypothesis ĥ
• What is the probability δ that the prediction error of ĥ is larger than ε?

Useful Formulas
• Binomial Distribution: The probability of observing x heads in a sample of n independent coin tosses, where in each toss the probability of heads is p, is
  P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}
• Union Bound:
  P(A_1 ∪ A_2 ∪ … ∪ A_k) ≤ \sum_{i=1}^{k} P(A_i)
• Unnamed:
  (1 - ε) ≤ e^{-ε} for 0 ≤ ε ≤ 1

Sample Complexity: Finite H, Zero Training Error
• Model and Learning Algorithm
– Sample of n labeled examples Dtrain
– Learning algorithm A with a finite hypothesis space H
– At least one h ∈ H has zero training error Err_Dtrain(h)
– Learning algorithm A returns a zero-training-error hypothesis ĥ
• How many training examples does A need so that, with probability at least (1-δ), it learns an ĥ with prediction error less than ε?

Example: Smart Investing
Task: Pick a stock analyst based on past performance.
Experiment:
– Have each analyst predict "next day up/down" for 10 days.
– Pick the analyst who makes the fewest errors.
Situation 1:
– 1 stock analyst {A1}; A1 makes 5 errors
Situation 2:
– 3 stock analysts {A1, B1, B2}; B2 is best with 1 error
Situation 3:
– 1003 stock analysts {A1, B1, B2, C1, …, C1000}; C543 is best with 0 errors
Which analyst are you most confident in: A1, B2, or C543?
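Why the answer is "not C543" can be checked numerically. Below is a minimal Python sketch (not from the lecture; the function name and parameters are mine), assuming every analyst guesses each day by a fair coin flip: it estimates how often the best of 1000 pure-chance analysts compiles a perfect 10-day record.

import random

def prob_some_analyst_perfect(num_analysts=1000, num_days=10,
                              trials=2000, seed=0):
    """Estimate P(the best analyst makes zero errors) when all predictions are chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each bit of getrandbits(num_days) marks one wrong prediction (prob. 1/2),
        # so bin(...).count("1") is one analyst's error count over num_days days.
        best = min(bin(rng.getrandbits(num_days)).count("1")
                   for _ in range(num_analysts))
        if best == 0:
            hits += 1
    return hits / trials

print(prob_some_analyst_perfect())   # ≈ 0.62
print(1 - (1 - 0.5**10)**1000)       # exact value ≈ 0.623

A spotless record appears by luck alone about 62% of the time, which is exactly the effect the union bound over all hypotheses in H makes precise.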
Useful Formula
• Hoeffding/Chernoff Bound: For any distribution P(X) where X can take the values 0 and 1, the probability that the average p̂ of an i.i.d. sample of size n deviates from its mean p by more than ε is bounded as
  P(|p̂ - p| > ε) ≤ 2 e^{-2nε²}

Generalization Error Bound: Finite H, Non-Zero Training Error
• Model and Learning Algorithm
– Sample of n labeled examples Dtrain
– An unknown (random) fraction of the examples in Dtrain is mislabeled (noise)
– Learning algorithm A with a finite hypothesis space H
– A returns the hypothesis ĥ = A(Dtrain) with the lowest training error
• What is the probability δ that the prediction error of ĥ exceeds its fraction of training errors by more than ε?

Overfitting vs. Underfitting [Mitchell, 1997]
With probability at least (1-δ):
  Err_P(ĥ) ≤ Err_Dtrain(ĥ) + \sqrt{(ln |H| + ln(1/δ)) / (2n)}

Generalization Error Bound: Infinite H, Non-Zero Training Error
• Model and Learning Algorithm
– Sample of n labeled examples Dtrain
– Learning algorithm A with a hypothesis space H with VCDim(H) = d
– A returns the hypothesis ĥ = A(Dtrain) with the lowest training error
• Given a hypothesis space H with VCDim(H) equal to d and a training sample Dtrain of size n, with probability at least (1-δ) it holds that
  Err_P(ĥ) ≤ Err_Dtrain(ĥ) + \sqrt{(d (ln(2n/d) + 1) + ln(4/δ)) / n}
This slide is not relevant for …
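These bounds are easy to evaluate numerically. The following sketch (mine, not from the slides) plugs numbers in; the zero-training-error sample complexity n ≥ (ln|H| + ln(1/δ)) / ε used below is the standard answer to the question posed on the sample-complexity slide, obtained from the union bound together with (1-ε) ≤ e^{-ε}.

import math

def sample_complexity_zero_error(h_size, eps, delta):
    """n >= (ln|H| + ln(1/δ)) / ε: examples needed so that, w.p. >= 1-δ,
    every zero-training-error ĥ has prediction error < ε."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

def finite_h_bound(train_err, h_size, n, delta):
    """Finite H: Err_P(ĥ) <= train_err + sqrt((ln|H| + ln(1/δ)) / (2n))."""
    return train_err + math.sqrt(
        (math.log(h_size) + math.log(1 / delta)) / (2 * n))

def vc_bound(train_err, d, n, delta):
    """VCDim(H) = d: Err_P(ĥ) <= train_err + sqrt((d(ln(2n/d)+1) + ln(4/δ)) / n)."""
    return train_err + math.sqrt(
        (d * (math.log(2 * n / d) + 1) + math.log(4 / delta)) / n)

# The 1003-analyst example: a perfect score on only n = 10 days yields a
# vacuous guarantee, matching the intuition that C543's record proves little.
print(finite_h_bound(0.0, 1003, 10, 0.05))            # ≈ 0.70
print(sample_complexity_zero_error(1003, 0.1, 0.05))  # 100 examples
print(vc_bound(0.05, 10, 1000, 0.05))                 # ≈ 0.31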
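The Hoeffding/Chernoff bound can also be checked empirically. A small Monte Carlo sketch (mine; the distribution and parameters are chosen arbitrarily) compares the observed deviation frequency with 2 e^{-2nε²}:

import math
import random

def deviation_prob(p=0.3, n=100, eps=0.1, trials=20000, seed=1):
    """Estimate P(|p̂ - p| > ε) for the mean p̂ of n i.i.d. Bernoulli(p) draws."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        if abs(p_hat - p) > eps:
            bad += 1
    return bad / trials

print(deviation_prob())                    # observed frequency, ≈ 0.02
print(2 * math.exp(-2 * 100 * 0.1 ** 2))   # Hoeffding bound: 2e^{-2} ≈ 0.271

The observed frequency (≈ 0.02 here) sits well below the bound (≈ 0.27): the bound is loose, but it holds for every distribution over {0,1}, which is what the generalization bounds above require.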