Read Chapter 7 of Machine Learning. [Suggested exercises: 7.1, 7.2, 7.5, 7.7]

Function Approximation

Given:
• Instance space X
- e.g., X is the set of boolean vectors of length n; x = <0,1,1,0,0,1>
• Hypothesis space H: a set of functions h: X → Y
- e.g., H is the set of boolean functions (Y = {0,1}) defined by conjunctions of constraints on the features of x
• Training examples D: a sequence of positive and negative examples of an unknown target function c: X → {0,1}
- <x1, c(x1)>, …, <xm, c(xm)>

Determine:
• What we want: a hypothesis h in H such that h(x) = c(x) for all x in X
• What we can observe: a hypothesis h in H such that h(x) = c(x) for all x in D

Instances, Hypotheses, and More-General-Than

Hypotheses in H are partially ordered by the more-general-than relation. A learner that is allowed to pose its own queries should choose instances that split the current version space in half, i.e., minimize the number of queries needed to converge to the correct hypothesis.

True Error vs. Training Error

• The training set D consists of m instances drawn at random from an unknown probability distribution P(x).
• error_true(h): the probability that h misclassifies an instance drawn from P(x), i.e., Pr_{x∼P(x)}[h(x) ≠ c(x)]. This is what we want to bound.
• error_train(h): the fraction of the examples in D that h misclassifies. This is what we can observe.
• Question: can we bound error_true(h) in terms of error_train(h)?

Exhausting the Version Space

Consider any(!) learner that outputs a hypothesis consistent with all training examples, i.e., an h contained in the version space VS_{H,D}. (Find-S, sketched at the end of these notes, is one concrete example.) Say that VS_{H,D} is ε-exhausted when every hypothesis remaining in it has true error less than ε.

[Haussler, 1988]: if H is finite and D contains m examples drawn independently from P(x), the probability that VS_{H,D} is not ε-exhausted after those m training examples is at most

  |H| e^{−εm}

What it means:
1. How many training examples suffice? Suppose we want this probability to be at most δ. Then |H| e^{−εm} ≤ δ, which holds whenever
  m ≥ (1/ε)(ln|H| + ln(1/δ))
(A numeric sketch of this bound follows these notes.)
2. Turned around: if error_train(h) = 0, then with probability at least (1 − δ),
  error_true(h) ≤ (1/m)(ln|H| + ln(1/δ))

PAC Learnability

C is PAC-learnable by L using H if, for any target c in C, distribution P(x), ε, and δ, learner L outputs with probability at least (1 − δ) a hypothesis h with error_true(h) ≤ ε, in time polynomial in 1/ε, 1/δ, n, and size(c). Sufficient condition for the running-time requirement: it holds if L requires only a polynomial number of training examples, and the processing per example is polynomial.

Overfitting

In general, true error ≤ training error + degree of overfitting. The agnostic bounds below make the overfitting term precise.

Additive Hoeffding Bounds – Agnostic Learning

• Given m independent flips of a coin with Pr(heads) = θ, the Hoeffding bound limits the error in the estimate θ̂ (the observed fraction of heads):
  Pr[θ > θ̂ + ε] ≤ e^{−2mε²}
• Relevance to agnostic learning (no assumption that c is in H): for any single hypothesis h,
  Pr[error_true(h) > error_train(h) + ε] ≤ e^{−2mε²}
• But the learner picks its hypothesis after seeing the data, so we must consider all hypotheses in H:
  Pr[(∃h ∈ H) error_true(h) > error_train(h) + ε] ≤ |H| e^{−2mε²}
• So, setting the right-hand side to δ and solving for ε: with probability at least (1 − δ), every h in H satisfies
  error_true(h) ≤ error_train(h) + √((ln|H| + ln(1/δ)) / (2m))
The square-root term is the degree-of-overfitting penalty mentioned above; it shrinks as 1/√m and grows only logarithmically in |H|.

General Hoeffding Bounds

• When estimating a parameter θ ∈ [a,b] from m examples:
  Pr[|θ̄ − E[θ̄]| > ε] ≤ 2 e^{−2mε²/(b−a)²}
• When estimating a probability, θ ∈ [0,1], so (b−a)² = 1 and
  Pr[|θ̄ − E[θ̄]| > ε] ≤ 2 e^{−2mε²}
• And if we're interested in only one-sided error:
  Pr[θ̄ − E[θ̄] > ε] ≤ e^{−2mε²}
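The sample-complexity bound m ≥ (1/ε)(ln|H| + ln(1/δ)) is easy to evaluate numerically. Below is a minimal Python sketch; the function name and the choices n = 10, ε = 0.1, δ = 0.05 are illustrative, not from the slides. For conjunctions of boolean literals over n features, |H| = 3^n (each feature appears as a positive literal, a negated literal, or not at all), as in Mitchell's Chapter 7 example.

```python
import math

def pac_sample_size(h_size, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# |H| = 3^n for conjunctions of boolean literals over n features.
n = 10
m = pac_sample_size(3 ** n, epsilon=0.1, delta=0.05)
print(m)  # examples sufficient for (epsilon, delta) = (0.1, 0.05) -> 140
```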
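The agnostic bound can likewise be read as a function mapping (|H|, m, δ) to a worst-case gap between true and training error. Another small sketch under the same illustrative assumptions:

```python
import math

def agnostic_error_gap(h_size, m, delta):
    """Bound on error_true(h) - error_train(h) that holds simultaneously
    for every h in H with probability at least 1 - delta."""
    return math.sqrt((math.log(h_size) + math.log(1.0 / delta)) / (2.0 * m))

# The gap shrinks as O(1/sqrt(m)) and grows only logarithmically in |H|.
for m in (100, 1000, 10000):
    print(m, round(agnostic_error_gap(3 ** 10, m, delta=0.05), 3))
```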
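Finally, the Haussler bound applies to any learner that outputs a consistent hypothesis; it says nothing about which one. Find-S, from Chapter 2 of the same book, is one concrete consistent learner for the conjunction hypothesis space (it is only guaranteed consistent when the target really is a conjunction and the data are noise-free). The toy data below are invented for illustration.

```python
def find_s(examples):
    """Find-S (Mitchell, Ch. 2): the most specific conjunction consistent
    with the positive examples. A hypothesis is a list whose i-th slot is
    0 or 1 (feature i constrained) or '?' (feature i unconstrained)."""
    h = None
    for x, label in examples:
        if label != 1:
            continue  # Find-S never revises h on negative examples
        if h is None:
            h = list(x)  # first positive example: maximally specific
        else:
            # Minimal generalization: drop any constraint x contradicts.
            h = [hi if hi == xi else '?' for hi, xi in zip(h, x)]
    return h

def predict(h, x):
    """h classifies x positive iff every constrained feature matches."""
    return all(c == '?' or c == xi for c, xi in zip(h, x))

# Toy data: tuples of boolean features with labels c(x).
D = [((0, 1, 1, 0), 1), ((0, 1, 0, 0), 1), ((1, 1, 1, 0), 0)]
h = find_s(D)
print(h)                                      # [0, 1, '?', 0]
print(all(predict(h, x) == y for x, y in D))  # consistent on D -> True
```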