Supervised (Inductive) Learning
The University of Texas at Dallas

Induction: a process of reasoning (arguing) which infers a general conclusion based on individual cases.

A learning problem!

X 0 X    X 0 X    X X X
0 X 0    X X 0    0 X X    f(x) = 1
0 X X    X 0 0    0 0 0

0 X 0    0 0 X    0 X X
X 0 X    X X 0    X 0 0    f(x) = 0
0 X X    0 X X    0 X X

0 X X
0 X 0    f(x) = ?
X X 0

If you prefer the training data in this form!

X1 X2 X3 X4 X5 X6 X7 X8 X9 | f(x)
X  0  X  0  X  0  0  X  X  |  1
X  0  X  X  X  0  X  0  0  |  1
X  X  X  0  X  X  0  0  0  |  1
0  X  0  X  0  X  0  X  X  |  0
0  0  X  X  X  0  0  X  X  |  0
0  X  X  X  0  0  0  X  X  |  0
0  X  X  0  X  0  X  X  0  |  ?

• x: a 9-dimensional vector
• f(x): a function or a program that takes the vector as input and outputs either a 0 or a 1
• Task: given the training examples, find a good approximation to f, so that when you later see an unseen vector "x" you can figure out the value of f(x)

Given data or examples, find the function f. This is a classification problem, and a simple example to analyze!

How to find a good approximation to f?
• A possible/plausible technique:

Unknown function f  →  Training Examples/Data  →  Learning algorithm  →  A good approximation
The learning algorithm searches a Hypothesis space: the set of candidate functions (your assumptions about f).

You are assuming that the unknown function f could be any one of the functions in the hypothesis space!
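The search over a hypothesis space can be sketched in a few lines. This is a minimal illustration using the training table above, with a deliberately tiny hypothesis space of my own choosing (single-feature rules "f(x)=1 iff cell i holds a given symbol"), not one of the hypothesis spaces from the slides:

```python
# Training data from the slide: each example is the nine cells
# x1..x9 as a string, paired with its label f(x).
train = [
    ("X0X0X00XX", 1), ("X0XXX0X00", 1), ("XXX0XX000", 1),
    ("0X0X0X0XX", 0), ("00XXX00XX", 0), ("0XXX000XX", 0),
]
query = "0XX0X0XX0"   # the unlabeled example, f(x) = ?

# Toy hypothesis space (an illustrative choice, not the slide's):
# each hypothesis (i, v) means "f(x) = 1 iff x[i] == v".
hypotheses = [(i, v) for i in range(9) for v in "X0"]

def predict(h, x):
    i, v = h
    return 1 if x[i] == v else 0

# Keep only the hypotheses consistent with every training example.
consistent = [h for h in hypotheses
              if all(predict(h, x) == y for x, y in train)]

print(consistent)                               # [(0, 'X')]
print([predict(h, query) for h in consistent])  # [0]
```

With this particular space, exactly one candidate survives ("f(x)=1 iff x1 is X"), and it predicts 0 for the unlabeled example; a richer hypothesis space could leave many consistent candidates, or none.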
It turns out that many of the possible functions classify all points in the training data correctly!

You are assuming that the unknown function f could be any one of the 16 conjunctive rules!
Unfortunately, none of them works.

m-of-n rules: at least m of the n variables must be true.
You are assuming that the unknown function f could be any one of the 32 m-of-n rules!
Only one of them, the one marked by "***" on the slide, works!

Steps in Supervised Learning
1. Determine the representation for "x, f(x)" and determine which "x" to use (Feature Engineering)
2. Gather a training set; not all data is kosher (Data Cleaning)
3. Select a suitable evaluation method
4. Find a suitable learning algorithm among a plethora of available choices
   – Issues discussed on the previous slide

Feature Engineering is the Key
• Most of the effort in ML projects goes into constructing features
• A black art: intuition and creativity are required
  – Understand the properties of the task at hand
  – Understand how the features interact with, or limit, the algorithm you are using
• ML is an iterative process
  – Try different types of features, experiment with each, and then decide which feature set/algorithm combination to use

A sample machine learning algorithm
• 2-way classification problem
  – +ve and –ve classes
• Representation: lines (Ax + By + C = 0)
  – Specifically:
    • if Ax + By + C > 0, then classify as "+ve"
    • else classify as "–ve"
• Evaluation: number of misclassified examples
• Optimization: an algorithm that searches for the three parameters A, B, and C

Toy Example
[Figure: scatter of Age (x axis: 20–80) vs. Income (y axis: 15K–60K). Blue circles: good credit (low risk); red circles: bad credit (high risk). The line Ax + By + C = 0 separates the region where Ax + By + C > 0 from the region where Ax + By + C < 0.]
Problem: fit a line that separates the two classes such that the error is minimized.

How do machine learners solve this problem?
• Try different lines until you find one that separates the data into two
• A more plausible alternative:
  – Begin with a random line
  – Repeat until there are no errors:
    – For each point:
      • If the current line says +ve and the point is –ve, then decrease A, B, and C
      • If the current line says –ve and the point is +ve, then increase A, B, and C

Toy Example: More data
[Figure: the same Age (x1) vs. Income (x2) scatter with more points. Blue circles: good credit (low risk); red circles: bad credit (high risk).]
Problem: fit a line that separates the two such that the error is minimized.

Learning = Representation + Evaluation + Optimization
• Combinations of just three elements:

Representation    | Evaluation        | Optimization
Instances         | Accuracy          | Greedy search
Hyperplanes       | Precision/Recall  | Branch & bound
Decision trees    | Squared error     | Gradient descent
Sets of rules     | Likelihood        | Quasi-Newton
Neural networks   | Posterior prob.   | Linear progr.
Graphical models  | Margin            | Quadratic progr.
Etc.              | Etc.              | Etc.
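The sample algorithm above combines all three elements: a line as the representation, misclassification count as the evaluation, and the increase/decrease rule as the optimization. A perceptron-style sketch of it, using made-up age/income points (the data, learning rate, and epoch cap are illustrative assumptions, not from the slides):

```python
# Representation: a line Ax + By + C = 0.
# Evaluation: the number of misclassified points.
# Optimization: nudge A, B, C toward each misclassified point,
# in the spirit of the slide's increase/decrease rule.

def classify(A, B, C, x, y):
    """+1 if the point is on the positive side of the line, else -1."""
    return +1 if A * x + B * y + C > 0 else -1

def train_line(points, epochs=100, lr=0.1):
    A, B, C = 0.0, 0.0, 0.0            # an arbitrary starting line
    for _ in range(epochs):
        errors = 0
        for x, y, label in points:
            if classify(A, B, C, x, y) != label:
                A += lr * label * x    # label = +1 increases,
                B += lr * label * y    # label = -1 decreases
                C += lr * label
                errors += 1
        if errors == 0:                # evaluation says we are done
            break
    return A, B, C

# Hypothetical toy data: (age-like, income-like, class),
# +1 = good credit, -1 = bad credit.
points = [(1, 1, -1), (2, 1, -1), (4, 5, +1), (5, 6, +1)]
A, B, C = train_line(points)
print(sum(classify(A, B, C, x, y) == l for x, y, l in points))  # 4
```

Because this data is linearly separable, the loop reaches zero errors after a few epochs; on non-separable data it would simply stop at the epoch cap with some points still misclassified.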