Machine Learning 10-701/15-781, Spring 2008
Computational Learning Theory
Eric Xing
Lecture 11, February 20, 2008
Reading: Chap. 7, T. M. book

Complexity of Learning
- The complexity of learning is measured mainly along two axes: information and computation.
- The information complexity is concerned with the generalization performance of learning:
  - How many training examples are needed?
  - How fast do the learner's estimates converge to the true population parameters? Etc.
- The computational complexity concerns the computation resources applied to the training data in order to extract the learner's predictions from it.
- It seems that when an algorithm improves with respect to one of these measures, it deteriorates with respect to the other.

What General Laws constrain Inductive Learning?
- Sample complexity: how many training examples are sufficient to learn the target concept?
- Computational complexity: what resources are required to learn the target concept?
- We want a theory that relates:
  - the training examples (quantity, quality, how presented),
  - the complexity of the hypothesis/concept space H,
  - the accuracy of the approximation to the target concept,
  - the probability of successful learning.

Prototypical concept learning task (binary classification)
- Everything we say here generalizes to other problems, including regression and multi-class classification.
- Given:
  - Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast.
  - Target function c: EnjoySport: X -> {0, 1}.
  - Hypothesis space H: conjunctions of literals, e.g. <?, Cold, High, ?, ?, ?>.
  - Training examples S: iid positive and negative examples of the target function, <x1, c(x1)>, ..., <xm, c(xm)>.
- Determine:
  - a hypothesis h in H such that h(x) is good w.r.t. c(x) for all x in S?
  - a hypothesis h in H such that h(x) is good w.r.t. c(x) for all x in the true distribution D?

Two Basic Competing Models
- PAC framework: the sample labels are consistent with some h in H; the learner's hypothesis is required to meet an absolute upper bound on its error.
- Agnostic framework: no prior restriction on the sample labels; the required upper bound on the hypothesis error is only relative to the best hypothesis in the class.

Sample Complexity
- How many training examples are sufficient to learn the target concept?
- Training scenarios:
  1. The learner proposes instances as queries to the teacher: the learner proposes an instance x, the teacher provides c(x).
  2. The teacher (who knows c) provides training examples: the teacher provides a sequence of examples of the form <x, c(x)>.
  3. Some random process (e.g., nature) proposes instances: the instance x is generated randomly, the teacher provides c(x).

Protocol
[Slide figure: a small table of labeled examples with attributes Temp, Press, Colour, SoreThroat and label diseaseX (values such as Temp 35/22/10, Press 95/110/87, Colour Pale/Clear/Pale, SoreThroat Y/N/N) is given to the learner; a new instance (Temp 32, Press 90, SoreThroat N, Colour Pale) is fed to the resulting classifier, which predicts diseaseX = No.]
- Given:
  - a set of examples X,
  - a fixed (unknown) distribution D over X,
  - a set of hypotheses H,
  - a set of possible target concepts C.
- The learner observes a sample S = {<xi, c(xi)>}:
  - instances xi drawn from the distribution D,
  - labeled by a target concept c in C,
  - the learner does NOT know c(.) or D.
- The learner outputs h in H estimating c:
  - h is evaluated by its performance on subsequent instances drawn from D.
- For now:
  - C = H (so c is in H),
  - noise-free data.

True error of a hypothesis
- Definition: The true error (denoted error_D(h)) of hypothesis h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to D:
  error_D(h) = Pr_{x ~ D}[ c(x) != h(x) ].

Two notions of error
- Training error of hypothesis h with respect to target concept c: how often h(x) != c(x) over the training instances from S,
  error_S(h) = Pr_{x in S}[ c(x) != h(x) ].
- True error of hypothesis h with respect to c: how often h(x) != c(x) over future random instances drawn iid from D,
  error_D(h) = Pr_{x ~ D}[ c(x) != h(x) ].
- Can we bound error_D(h) in terms of error_S(h)?
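To make the distinction concrete, here is a minimal sketch (not from the lecture; the threshold concept, the slightly-off hypothesis, and D = Uniform(0, 1) are illustrative assumptions) that computes the training error on a sample S and approximates the true error with a large fresh sample from D:

```python
# A minimal sketch (not from the lecture): a toy 1-D threshold concept under
# D = Uniform(0, 1). The concept c, the hypothesis h, and the sample sizes
# are illustrative assumptions.
import random

random.seed(0)

def c(x):                  # target concept: label 1 iff x >= 0.5
    return int(x >= 0.5)

def h(x):                  # learner's hypothesis: threshold slightly off
    return int(x >= 0.55)

def error(hyp, xs):
    """Fraction of points on which hyp disagrees with the target concept c."""
    return sum(hyp(x) != c(x) for x in xs) / len(xs)

# Training sample S drawn iid from D
S = [random.random() for _ in range(50)]
print("training error error_S(h) =", error(h, S))

# True error error_D(h) = Pr_{x~D}[h(x) != c(x)] = 0.05 for this pair;
# a large fresh sample from D approximates it.
fresh = [random.random() for _ in range(200_000)]
print("estimated true error error_D(h) ~", error(h, fresh))
```

The small training sample typically gives an error estimate that fluctuates around the true value 0.05; the question the next slides answer is how tightly, and with what probability, the two quantities can be related.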
The Union Bound
- Lemma (union bound): Let A1, A2, ..., Ak be k different events (that may not be independent). Then
  P(A1 or A2 or ... or Ak) <= P(A1) + P(A2) + ... + P(Ak).
- In probability theory, the union bound is usually stated as an axiom (and thus we won't try to prove it), but it also makes intuitive sense: the probability of any one of k events happening is at most the sum of the probabilities of the k different events.

Hoeffding inequality
- Lemma (Hoeffding inequality): Let Z1, ..., Zm be m independent and identically distributed (iid) random variables drawn from a Bernoulli(phi) distribution, i.e., P(Zi = 1) = phi and P(Zi = 0) = 1 - phi. Let phi_hat = (1/m) sum_i Zi be the mean of these random variables, and let any gamma > 0 be fixed. Then
  P( |phi - phi_hat| > gamma ) <= 2 exp(-2 gamma^2 m).
- This lemma (which in learning theory is also called the Chernoff bound) says that if we take phi_hat, the average of m Bernoulli(phi) random variables, to be our estimate of phi, then the probability of being far from the true value is small, so long as m is large. (A numerical check of this bound appears in the sketch at the end of these notes.)

Version Space
- A hypothesis h is consistent with a set of training examples S of target concept c if and only if h(x) = c(x) for each training example <xi, c(xi)> in S.
- The version space VS_{H,S} with respect to hypothesis space H and training examples S is the subset of hypotheses from H consistent with all training examples in S.

Exhausting the version space
- Definition: The version space VS_{H,S} is said to be eps-exhausted with respect to c and D if every hypothesis h in VS_{H,S} has true error less than eps with respect to c and D.

Probably Approximately Correct
- Goal: a PAC learner produces a hypothesis h_hat that is approximately correct (error_D(h_hat) ~ 0) with high probability (P(error_D(h_hat) ~ 0) ~ 1).
- Double "hedging":
  - approximately,
  - probably.
- Need both!

How many examples will eps-exhaust the VS?
- Theorem [Haussler, 1988]: If the hypothesis space H is finite, and S is a sequence of m >= 1 independent random examples of some target concept c, then for any 0 <= eps <= 1, the probability that the version space with respect to H and S is not eps-exhausted (with respect to c) is less than
  |H| e^{-eps m}.
- This bounds the probability that any consistent learner will output a hypothesis h with error_D(h) >= eps.
- If we want this probability to be below delta,
  |H| e^{-eps m} <= delta,
  then
  m >= (1/eps) ( ln|H| + ln(1/delta) ).
- Proof sketch: let h1, ..., hk be the hypotheses in H whose true error is at least eps. The version space fails to be eps-exhausted only if at least one hi is consistent with all m examples. Each such hi is consistent with a single random example with probability at most (1 - eps), hence with all m examples with probability at most (1 - eps)^m. By the union bound, the probability that any of them survives is at most k (1 - eps)^m <= |H| (1 - eps)^m <= |H| e^{-eps m}.

What it means
- [Haussler, 1988]: the probability that the version space is not eps-exhausted after m training examples is at most |H| e^{-eps m}.
- Suppose we want this probability to be at most delta.
  1. How many training examples suffice? m >= (1/eps) ( ln|H| + ln(1/delta) ).
  2. If error_train(h) = 0, then with probability at least (1 - delta), error_D(h) <= (1/m) ( ln|H| + ln(1/delta) ).

Learning Conjunctions of Boolean Literals
- How many examples are sufficient to assure with probability at least (1 - delta) that every h in VS_{H,S} satisfies error_D(h) <= eps?
- Use our theorem: m >= (1/eps) ( ln|H| + ln(1/delta) ).
- Suppose H contains conjunctions of constraints on up to n boolean attributes (i.e., n boolean literals):
  m >= (1/eps) ( ... )
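As a quick illustration, here is a minimal sketch (not from the lecture) that evaluates the bound above for this example. The hypothesis-space count |H| = 3^n for conjunctions over n boolean attributes (each attribute required true, required false, or omitted) is a standard count assumed here, not taken from the truncated slide text; the numeric values of n, eps, and delta are illustrative.

```python
# A minimal sketch evaluating m >= (1/eps) * (ln|H| + ln(1/delta)).
# Assumption: |H| = 3**n for conjunctions over n boolean attributes
# (each attribute required true, required false, or omitted).
import math

def sample_bound(size_H, eps, delta):
    """Smallest integer m satisfying m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(size_H) + math.log(1.0 / delta)) / eps)

n, eps, delta = 10, 0.1, 0.05          # illustrative values
m = sample_bound(3 ** n, eps, delta)
print(f"n = {n}, eps = {eps}, delta = {delta}  ->  m >= {m} examples suffice")
```

With these values the bound asks for about 140 examples; note it grows only logarithmically in |H| and in 1/delta, but linearly in 1/eps.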
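Finally, the numerical check of the Hoeffding inequality promised earlier: a minimal sketch (illustrative assumptions: phi = 0.3, m = 100, gamma = 0.1) comparing the empirical frequency of a large deviation against the bound 2 exp(-2 gamma^2 m).

```python
# A minimal numerical check of the Hoeffding inequality (illustrative, not
# from the lecture): estimate P(|phi_hat - phi| > gamma) by simulation and
# compare it with the bound 2 * exp(-2 * gamma**2 * m).
import math
import random

random.seed(0)

phi, m, gamma, trials = 0.3, 100, 0.1, 20_000

def deviates_once():
    """Draw m Bernoulli(phi) samples; test whether |phi_hat - phi| > gamma."""
    phi_hat = sum(random.random() < phi for _ in range(m)) / m
    return abs(phi_hat - phi) > gamma

freq = sum(deviates_once() for _ in range(trials)) / trials
bound = 2 * math.exp(-2 * gamma ** 2 * m)
print(f"empirical P(|phi_hat - phi| > {gamma}) ~ {freq:.4f}")
print(f"Hoeffding bound 2*exp(-2*gamma^2*m)   = {bound:.4f}")
```

The empirical deviation frequency comes out well below the bound, as expected: Hoeffding's inequality is a worst-case guarantee, not a tight estimate.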