Machine Learning 10-701/15-781, Spring 2008
Computational Learning Theory
Eric Xing
Lecture 11, February 20, 2008
Reading: Chap. 7 of T. Mitchell's book

Complexity of Learning

The complexity of learning is measured mainly along two axes: information and computation.

- Information complexity is concerned with the generalization performance of learning:
  - How many training examples are needed?
  - How fast do the learner's estimates converge to the true population parameters? Etc.
- Computational complexity concerns the computational resources applied to the training data in order to extract the learner's predictions from it.

It seems that when an algorithm improves with respect to one of these measures, it deteriorates with respect to the other.

What General Laws Constrain Inductive Learning?

Note: these results are only useful up to O(...) factors!

- Sample complexity: how many training examples are sufficient to learn the target concept?
- Computational complexity: what resources are required to learn the target concept?
- We want a theory that relates:
  - the training examples: their quantity m, their quality, and how they are presented;
  - the complexity of the hypothesis/concept space H;
  - the accuracy ε of the approximation to the target concept;
  - the probability δ of successful learning.

Prototypical Concept Learning Task (Binary Classification)

Everything we'll say here generalizes to other problems, including regression and multi-class classification.

- Given:
  - Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast.
  - Target function c = EnjoySport : X → {0, 1}.
  - Hypothesis space H: conjunctions of literals, e.g. ⟨?, Cold, High, ?, ?, ?⟩.
  - Training examples S: iid positive and negative examples of the target function, (x1, c(x1)), ..., (xm, c(xm)).
- Determine:
  - A hypothesis h in H such that h(x) is "good" w.r.t. c(x) for all x in S?
  - A hypothesis h in H such that h(x) is "good" w.r.t. c(x) for all x drawn from the true distribution D?

Two Basic Competing Models

- PAC framework: sample labels are consistent with some h in H; the learner's hypothesis is required to meet an absolute upper bound on its error.
- Agnostic framework: no prior restriction on the sample labels; the required upper bound on the hypothesis error is only relative (to the best hypothesis in the class).

Sample Complexity

How many training examples are sufficient to learn the target concept? Training scenarios:

1. The learner proposes instances as queries to the teacher: the learner proposes an instance x, and the teacher provides c(x).
2. The teacher (who knows c) provides training examples: a sequence of examples of the form (x, c(x)).
3. Some random process (e.g., nature) proposes instances: an instance x is generated randomly, and the teacher provides c(x).

Protocol

[Slide figure: a small table of labeled patient records with columns Temp., Press., Sore-Throat, Colour, diseaseX used to train a classifier, plus a new unlabeled record for which the classifier must predict diseaseX.]

- Given:
  - a set of examples X;
  - a fixed (unknown) distribution D over X;
  - a set of hypotheses H;
  - a set of possible target concepts C.
- The learner observes a sample S = { ⟨xi, c(xi)⟩ }:
  - instances xi are drawn from the distribution D;
  - labels are given by a target concept c ∈ C (the learner does NOT know c(·) or D).
- The learner outputs h ∈ H estimating c:
  - h is evaluated by its performance on subsequent instances drawn from D.
- For now: C = H (so c ∈ H), and the data are noise-free.

True Error of a Hypothesis

Definition: The true error (denoted εD(h)) of hypothesis h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to D:

    εD(h) = Pr_{x ~ D} [ h(x) ≠ c(x) ].

Two Notions of Error

- Training error of hypothesis h with respect to target concept c: how often h(x) ≠ c(x) over training instances from S.
- True error of hypothesis h with respect to c: how often h(x) ≠ c(x) over future random instances drawn iid from D.

Can we bound the true error εD(h) in terms of the training error on S?

The Union Bound

Lemma (the union bound). Let A1, A2, ..., Ak be k different events (that may not be independent). Then

    P(A1 ∪ A2 ∪ ... ∪ Ak) ≤ P(A1) + P(A2) + ... + P(Ak).

In probability theory, the union bound is usually stated as an axiom (and thus we won't try to prove it), but it also makes intuitive sense: the probability of any one of k events happening is at most the sum of the probabilities of the k different events.

Hoeffding Inequality

Lemma (Hoeffding inequality). Let Z1, ..., Zm be m independent and identically distributed (iid) random variables drawn from a Bernoulli(φ) distribution, i.e., P(Zi = 1) = φ and P(Zi = 0) = 1 − φ. Let φ̂ = (1/m) Σi Zi be the mean of these random variables, and let any γ > 0 be fixed. Then

    P( |φ − φ̂| > γ ) ≤ 2 exp(−2γ²m).

This lemma (which in learning theory is also called the Chernoff bound) says that if we take the average φ̂ of m Bernoulli(φ) random variables as our estimate of φ, then the probability of this estimate being far from the true value is small, so long as m is large.
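As a quick sanity check on the Hoeffding/Chernoff bound, the following sketch (my own illustration, not part of the lecture; the bias φ = 0.3, deviation γ = 0.1, and sample sizes are arbitrary choices) estimates P(|φ̂ − φ| > γ) by simulation and compares it with 2·exp(−2γ²m):

```python
import math
import random

def deviation_frequency(phi, gamma, m, trials=5000, seed=0):
    """Estimate P(|phi_hat - phi| > gamma), where phi_hat is the mean of m Bernoulli(phi) draws."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        phi_hat = sum(rng.random() < phi for _ in range(m)) / m
        if abs(phi_hat - phi) > gamma:
            exceed += 1
    return exceed / trials

# Illustrative settings: phi = 0.3, gamma = 0.1. For every m the simulated
# frequency should stay below the Hoeffding bound 2*exp(-2*gamma^2*m).
for m in (10, 50, 100, 500):
    empirical = deviation_frequency(phi=0.3, gamma=0.1, m=m)
    bound = 2 * math.exp(-2 * 0.1**2 * m)
    print(f"m = {m:4d}   empirical = {empirical:.4f}   bound = {bound:.4f}")
```

The bound is vacuous for small m (it exceeds 1) but shrinks exponentially as m grows, which is exactly the behavior the sample-complexity results below rely on.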
Version Space

- A hypothesis h is consistent with a set of training examples S of target concept c if and only if h(x) = c(x) for each training example ⟨xi, c(xi)⟩ in S.
- The version space, VS_{H,S}, with respect to hypothesis space H and training examples S, is the subset of hypotheses from H consistent with all training examples in S.

Exhausting the Version Space

Definition: The version space VS_{H,S} is said to be ε-exhausted with respect to c and D if every hypothesis h in VS_{H,S} has true error less than ε with respect to c and D.

Probably Approximately Correct

Goal: a PAC learner produces a hypothesis ĥ that is approximately correct,

    errD(ĥ) ≈ 0,

with high probability,

    P( errD(ĥ) ≈ 0 ) ≈ 1.

This is a double "hedging": approximately, and probably. We need both!

How Many Examples Will ε-Exhaust the Version Space?

Theorem [Haussler, 1988]. If the hypothesis space H is finite, and S is a sequence of m ≥ 1 independent random examples of some target concept c, then for any 0 ≤ ε ≤ 1, the probability that the version space VS_{H,S} is not ε-exhausted (with respect to c) is less than

    |H| e^(−εm).

Proof sketch: a hypothesis with true error at least ε is consistent with one random example with probability at most (1 − ε), hence with all m independent examples with probability at most (1 − ε)^m ≤ e^(−εm); applying the union bound over the (at most |H|) such hypotheses gives |H| e^(−εm).

This bounds the probability that any consistent learner will output a hypothesis h with εD(h) ≥ ε. If we want this probability to be below δ,

    |H| e^(−εm) ≤ δ,

then it suffices to take

    m ≥ (1/ε) ( ln|H| + ln(1/δ) ).

What It Means

[Haussler, 1988]: the probability that the version space is not ε-exhausted after m training examples is at most |H| e^(−εm). Suppose we want this probability to be at most δ.

1. How many training examples suffice? Solving |H| e^(−εm) ≤ δ for m gives m ≥ (1/ε)(ln|H| + ln(1/δ)).
2. If errortrain(h) = 0, then with probability at least (1 − δ),

    errD(h) ≤ (1/m) ( ln|H| + ln(1/δ) ).

In other words, this many examples are sufficient to assure, with probability at least (1 − δ), that every h in VS_{H,S} satisfies εD(h) ≤ ε.
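To get a feel for the numbers, here is a minimal sketch (my own illustration, not from the lecture; the |H|, ε, and δ values below are arbitrary) that evaluates Haussler's bound m ≥ (1/ε)(ln|H| + ln(1/δ)) and its inverted form ε ≤ (1/m)(ln|H| + ln(1/δ)):

```python
import math

def haussler_m(h_size, eps, delta):
    """Smallest m with m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

def haussler_eps(h_size, m, delta):
    """Error level guaranteed (with prob. >= 1 - delta) for a consistent h after m examples."""
    return (math.log(h_size) + math.log(1.0 / delta)) / m

# Illustrative settings: |H| = 1000, eps = 0.1, delta = 0.05.
print(haussler_m(1000, eps=0.1, delta=0.05))    # 100 examples suffice
print(haussler_eps(1000, m=500, delta=0.05))    # roughly 0.0198
```

Note that the dependence on |H| and on 1/δ is only logarithmic, so even fairly large finite hypothesis spaces remain sample-efficient; the linear dependence on 1/ε is what dominates.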

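Finally, to make the version-space definition concrete, here is a toy consistent learner over a conjunction-of-literals hypothesis space in the spirit of the EnjoySport setup (the attributes, values, and training examples are made up for illustration and are not the lecture's data). It enumerates H, filters it down to VS_{H,S}, and returns any consistent hypothesis; Haussler's bound above then controls the true error of whatever it returns.

```python
from itertools import product

# Made-up attribute values (hypothetical, not the lecture's EnjoySport table).
ATTRIBUTES = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High")]

def hypotheses():
    """All conjunctions of literals: each slot is '?' (don't care) or one attribute value."""
    slots = [("?",) + values for values in ATTRIBUTES]
    return list(product(*slots))

def predicts_positive(h, x):
    """A conjunction labels x positive iff every non-'?' literal matches x."""
    return all(lit == "?" or lit == xi for lit, xi in zip(h, x))

def version_space(H, S):
    """VS_{H,S}: hypotheses consistent with every labeled example (x, c(x)) in S."""
    return [h for h in H if all(predicts_positive(h, x) == y for x, y in S)]

# Made-up training sample S; the booleans play the role of c(x).
S = [(("Sunny", "Warm", "Normal"), True),
     (("Sunny", "Warm", "High"), True),
     (("Rainy", "Cold", "High"), False)]

H = hypotheses()
VS = version_space(H, S)
print(f"|H| = {len(H)}, |VS| = {len(VS)}")
print("one consistent hypothesis:", VS[0])
```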
