Computational Learning Theory

Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department
Carnegie Mellon University
February 18, 2008

Reading:
• Mitchell, Chapter 7
Suggested exercises:
• 7.1, 7.2, 7.5, 7.7

Instances, Hypotheses, and More-General-Than

The setting:
• Instances x are drawn at random from a probability distribution P(x).
• D is the set of training examples.
• The target concept is the (usually unknown) boolean function to be learned, c: X → {0,1}.
• Question: can we bound the true error of a hypothesis h in terms of its error over the training examples D?

How many training examples suffice? [Haussler, 1988]
The probability that the version space VS_H,D is not ε-exhausted after m training examples is at most
  |H| e^(−εm)

What it means: any(!) learner that outputs a hypothesis consistent with all training examples (i.e., an h contained in VS_H,D) has true error less than ε, with probability at least 1 − |H| e^(−εm).

1. How many training examples suffice? Suppose we want this failure probability to be at most δ. Setting |H| e^(−εm) ≤ δ and solving for m gives
  m ≥ (1/ε)(ln|H| + ln(1/δ))
2. If m ≥ (1/ε)(ln|H| + ln(1/δ)), then with probability at least (1 − δ), every consistent h satisfies error_true(h) ≤ ε.

E.g., X = <X1, X2, ..., Xn>, and each h ∈ H constrains each Xi to be 1, 0, or "don't care". In other words, each h is a rule such as:
  If X2 = 0 and X5 = 1 then Y = 1, else Y = 0.
Since each attribute can be constrained in three ways, |H| = 3^n. (A numerical illustration of the resulting sample-size bound appears at the end of these notes.)

Sufficient condition for PAC-learnability: it holds if L requires only a polynomial number of training examples, and the processing time per example is also polynomial.

Note: in the agnostic setting, ε is the difference between the training error and the true error,
  error_true(h) ≤ error_train(h) + ε,
i.e., the degree of overfitting.

Additive Hoeffding Bounds – Agnostic Learning
• Given m independent flips of a coin with Pr(heads) = θ, the error in the estimate θ̂ (the observed fraction of heads) is bounded by
  Pr[ θ > θ̂ + ε ] ≤ e^(−2mε²)
• Relevance to agnostic learning: for any single hypothesis h,
  Pr[ error_true(h) > error_train(h) + ε ] ≤ e^(−2mε²)
• But we must consider all hypotheses in H:
  Pr[ (∃ h ∈ H) error_true(h) > error_train(h) + ε ] ≤ |H| e^(−2mε²)
• So, with probability at least (1 − δ), every h ∈ H satisfies
  error_true(h) ≤ error_train(h) + sqrt( (ln|H| + ln(1/δ)) / (2m) )

General Hoeffding Bounds
• When estimating a parameter θ ∈ [a, b] from m examples:
  Pr[ |θ̂ − E[θ̂]| ≥ ε ] ≤ 2 e^(−2mε² / (b−a)²)
• When estimating a probability θ ∈ [0, 1], (b − a) = 1, so
  Pr[ |θ̂ − E[θ̂]| ≥ ε ] ≤ 2 e^(−2mε²)
• And if we're interested in only one-sided error:
  Pr[ θ̂ − E[θ̂] ≥ ε ] ≤ e^(−2mε²)
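As a quick empirical sanity check on the one-sided Hoeffding bound above, here is a minimal Python sketch (not part of the original lecture notes): it simulates many blocks of m coin flips and compares how often the estimate θ̂ exceeds θ by at least ε against the bound e^(−2mε²). The particular values of θ, m, ε, and the number of trials are arbitrary choices for illustration.

```python
import math
import random

# Illustrative check of the one-sided Hoeffding bound
#   Pr[ theta_hat - theta >= eps ] <= exp(-2 * m * eps^2)
# for the mean of m Bernoulli(theta) coin flips.
# All parameter values below are arbitrary demo choices.

theta = 0.4      # true Pr(heads)
m = 100          # coin flips per estimate
eps = 0.1        # deviation of interest
trials = 20000   # number of independent estimates theta_hat

exceed = 0
for _ in range(trials):
    heads = sum(1 for _ in range(m) if random.random() < theta)
    theta_hat = heads / m
    if theta_hat - theta >= eps:
        exceed += 1

empirical = exceed / trials
bound = math.exp(-2 * m * eps ** 2)
print(f"empirical Pr[theta_hat - theta >= eps] = {empirical:.4f}")
print(f"Hoeffding bound e^(-2 m eps^2)         = {bound:.4f}")
```

The empirical frequency should come out well below the bound, since Hoeffding's inequality is distribution-free and therefore conservative for any particular θ.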
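Returning to the sample-size bound m ≥ (1/ε)(ln|H| + ln(1/δ)) and the agnostic gap sqrt((ln|H| + ln(1/δ)) / (2m)), the sketch below plugs in the conjunction hypothesis space from the example above, where each of n boolean attributes may be constrained to 1, 0, or "don't care", so |H| = 3^n. The function names and the particular values of n, ε, and δ are my own illustrative choices, not part of the original notes.

```python
import math

def consistent_learner_m(hyp_space_size: int, eps: float, delta: float) -> int:
    """Number of examples sufficient so that, with probability >= 1 - delta,
    every hypothesis consistent with the data has true error <= eps:
        m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(hyp_space_size) + math.log(1 / delta)) / eps)

def agnostic_error_gap(hyp_space_size: int, m: int, delta: float) -> float:
    """Hoeffding-based bound on the gap between true and training error that
    holds simultaneously for all h in H with probability >= 1 - delta:
        sqrt((ln|H| + ln(1/delta)) / (2m))."""
    return math.sqrt((math.log(hyp_space_size) + math.log(1 / delta)) / (2 * m))

# Conjunctions over n boolean attributes: each attribute is 1, 0, or "don't care".
n = 10
H = 3 ** n                # |H| = 3^n
eps, delta = 0.1, 0.05    # example accuracy / confidence parameters

m = consistent_learner_m(H, eps, delta)
print(f"|H| = 3^{n} = {H}, examples sufficient: m = {m}")
print(f"with m = {m} examples, error gap bound = {agnostic_error_gap(H, m, delta):.3f}")
```

Note how both quantities grow only logarithmically in |H|, which is why even an exponentially large hypothesis space (3^n conjunctions) needs only a number of examples linear in n.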