Machine Learning, Function Approximation and Version Spaces

Machine Learning 10-701
Tom M. Mitchell
Center for Automated Learning and Discovery
Carnegie Mellon University
January 10, 2005

Recommended reading: Mitchell, Chapter 2

Machine Learning: Study of algorithms that
• improve their performance
• at some task
• with experience

Learning to Predict Emergency C-Sections
9714 patient records, each with 215 features
[Sims et al., 2000]

Object Detection
Example training images for each orientation
(Prof. H. Schneiderman)

Text Classification
Company home page vs. Personal home page vs. University home page vs. …

Reading a noun (vs verb)
[Rustandi et al., 2005]

Growth of Machine Learning
• Machine learning is the preferred approach to
– Speech recognition, natural language processing
– Computer vision
– Medical outcomes analysis
– Robot control
– …
• This trend is accelerating
– Improved machine learning algorithms
– Improved data capture, networking, faster computers
– Software too complex to write by hand
– New sensors / IO devices
– Demand for self-customization to user, environment

Function Approximation
c: <Sky, Temp, Humid, Wind, Water, Forecast> → EnjoySport
Given:
• Instances X:
– e.g., x = <0,1,1,0,0,1>
• Hypotheses H: set of functions h: X → {0,1}
– e.g., H is the set of all boolean functions defined by conjunctions of constraints on the features of x (such as <0,1,?,?,?,1> → 1)
• Training examples D: sequence of positive and negative examples of an unknown target function c: X → {0,1}
– <x1, c(x1)>, …, <xm, c(xm)>
Determine:
• What we want: a hypothesis h in H such that h(x) = c(x) for all x in X
• What we can observe: a hypothesis h in H such that h(x) = c(x) for all x in D

[Figure to be drawn: instance space and hypothesis space]

Instances, Hypotheses, and More-General-Than

Simplifying Assumptions for today (only)
• Target function c is deterministic
• Target function c is contained in hypotheses H
• Training data is error-free, noise-free

Problems with Find-S
• Finds just one of the many h's in H that fit the training data
– the most specific one
• Can't determine when learning has converged to the final h

Version Space for our EnjoySport problem

Version Space Candidate Elimination Algorithm
• Initialize S (G) to the maximally specific (general) h's in H
• For each training example <x, c(x)>
– if positive example <x,1>
  • Generalize S as much as needed to cover x, in all possible ways
  • Remove any h ∈ G for which h(x) ≠ 1
– if negative example <x,0>
  • Specialize G as much as needed to exclude x, in all possible ways
  • Remove any h ∈ S for which h(x) = 1
– Retain only members of G that are more general than some member of S
– Retain only members of S that are more specific than some member of G

Matches NO instances

Version Space after all four examples

Machine Translation Example [Probst et al., 2003]

Seeded VS Learning [Probst et al., 2003]:
Construct VS around a seed positive example.
Include only hypotheses at a predetermined level of generalization, ± k levels in the partial order.

What you should know:
• Well-posed function approximation problem:
– Instance space, X
– Hypothesis space, H
– Sample of training data, D
• Learning as search/optimization over H
– Various objective functions
• Sample complexity of learning
– How many examples needed to converge?
– Depends on H, how examples generated, notion of convergence
• Biased and unbiased learners
– Futility of unbiased
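
The Version Space Candidate Elimination Algorithm slide above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the course's reference implementation: features are assumed to be binary (as in the slide's x = <0,1,1,0,0,1>), a hypothesis is a tuple of 0, 1, or "?" constraints, `None` stands in for the hypothesis that matches no instances, and the four training examples are a hypothetical binary encoding patterned on the EnjoySport problem. All function and variable names are made up for this example.

```python
from typing import List, Optional, Sequence, Tuple

ANY = "?"                                  # "don't care" constraint
Hyp = Optional[Tuple[object, ...]]         # None = matches NO instances (assumed encoding)

def matches(h: Hyp, x: Sequence[int]) -> bool:
    """h(x) = 1 iff every constraint is '?' or equals the feature value."""
    return h is not None and all(c == ANY or c == v for c, v in zip(h, x))

def more_general_or_equal(h1: Hyp, h2: Hyp) -> bool:
    """True if every instance matched by h2 is also matched by h1."""
    if h2 is None:
        return True
    if h1 is None:
        return False
    return all(a == ANY or a == b for a, b in zip(h1, h2))

def generalize(s: Hyp, x: Sequence[int]) -> Hyp:
    """Minimal generalization of s that covers x (unique for conjunctions)."""
    if s is None:
        return tuple(x)
    return tuple(c if c == v else ANY for c, v in zip(s, x))

def specialize(g: Tuple[object, ...], x: Sequence[int]) -> List[Hyp]:
    """Minimal specializations of g that exclude x, assuming binary features."""
    return [g[:i] + (1 - x[i],) + g[i + 1:] for i, c in enumerate(g) if c == ANY]

def candidate_elimination(examples, n_features: int):
    S: List[Hyp] = [None]                  # maximally specific boundary
    G: List[Hyp] = [(ANY,) * n_features]   # maximally general boundary
    for x, label in examples:
        if label == 1:
            # Positive example: drop inconsistent g, generalize S to cover x.
            G = [g for g in G if matches(g, x)]
            S = [s if matches(s, x) else generalize(s, x) for s in S]
            # Keep only s that are more specific than some member of G.
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            # Negative example: drop inconsistent s, specialize G to exclude x.
            S = [s for s in S if not matches(s, x)]
            new_G: List[Hyp] = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # Keep only specializations more general than some member of S.
                new_G += [h for h in specialize(g, x)
                          if any(more_general_or_equal(h, s) for s in S)]
            new_G = list(dict.fromkeys(new_G))          # remove duplicates
            # Keep only the maximally general members.
            G = [g for g in new_G
                 if not any(o != g and more_general_or_equal(o, g) for o in new_G)]
    return S, G

# Hypothetical training data: (instance, label) pairs over six binary features.
EXAMPLES = [
    ((1, 1, 1, 1, 1, 1), 1),
    ((1, 1, 0, 1, 1, 1), 1),
    ((0, 0, 0, 1, 1, 0), 0),
    ((1, 1, 0, 1, 0, 0), 1),
]

S, G = candidate_elimination(EXAMPLES, n_features=6)
print("S =", S)
print("G =", G)
```

On these four hypothetical examples the sketch ends with S = {<1,1,?,1,?,?>} and G = {<1,?,?,?,?,?>, <?,1,?,?,?,?>}. Every hypothesis between the two boundaries in the more-general-than order is consistent with the training data, which is the information Find-S alone does not provide.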