Chapter 3: Supervised Learning
CS583, Bing Liu, UIC

Road Map
- Basic concepts
- Decision tree induction
- Evaluation of classifiers
- Rule induction
- Classification using association rules
- Naïve Bayesian classification
- Naïve Bayes for text classification
- Support vector machines
- K-nearest neighbor
- Ensemble methods: Bagging and Boosting
- Summary

An example application
An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc.) of newly admitted patients. A decision is needed: whether to put a new patient in an intensive-care unit. Due to the high cost of the ICU, patients who may survive less than a month are given higher priority.
Problem: predict high-risk patients and discriminate them from low-risk patients.

Another application
A credit card company receives thousands of applications for new cards. Each application contains information about the applicant:
- age
- marital status
- annual salary
- outstanding debts
- credit rating
- etc.
Problem: decide whether an application should be approved, i.e., classify applications into two categories, approved and not approved.

Machine learning and our focus
- Like human learning from past experiences.
- A computer does not have “experiences”; a computer system learns from data, which represent some “past experiences” of an application domain.
- Our focus: learn a target function that can be used to predict the value of a discrete class attribute, e.g., approved or not approved, high-risk or low-risk.
- This task is commonly called supervised learning, classification, or inductive learning.

The data and the goal
Data: a set of data records (also called examples, instances, or cases) described by
- k attributes: A1, A2, …, Ak, and
- a class: each example is labelled with a pre-defined class.
Goal: learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.

An example: data (loan application)
(The loan data table appears as an image in the original slides and is not reproduced here; its class column is “Approved or not”.)

An example: the learning task
- Learn a classification model from the data.
- Use the model to classify future loan applications into Yes (approved) and No (not approved).
- What is the class for the following case/instance? (The instance appears as an image in the original slides.)

Supervised vs. unsupervised learning
- Supervised learning: classification is seen as supervised learning from examples.
- Supervision: the data (observations, measurements, etc.) are labeled with pre-defined classes, as if a “teacher” gave the classes (supervision).
- Test data are classified into these classes.
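The setup described above (labeled training records with k attributes and a class, a learned model, and prediction on an unseen instance) can be sketched in code. This is an illustrative example, not taken from the slides: the tiny loan-like dataset and attribute names are made up, and a trivial 1-nearest-neighbor rule stands in for whichever learner (decision tree, naïve Bayes, SVM, …) one actually uses.

```python
# Sketch of the supervised learning setup: labeled training records
# (attribute values + a class label), a "model", and prediction on a
# new, unlabeled instance.  Data and attribute names are hypothetical.

# Training data: each record is (attribute values, class label).
train = [
    ({"age": "young",  "has_job": False, "credit": "fair"},      "No"),
    ({"age": "young",  "has_job": True,  "credit": "good"},      "Yes"),
    ({"age": "middle", "has_job": True,  "credit": "good"},      "Yes"),
    ({"age": "old",    "has_job": False, "credit": "excellent"}, "Yes"),
    ({"age": "old",    "has_job": False, "credit": "fair"},      "No"),
]

def distance(a, b):
    """Hamming distance over categorical attributes."""
    return sum(1 for k in a if a[k] != b[k])

def predict(train, instance):
    """Classify by the label of the closest training record (1-NN)."""
    _, label = min(train, key=lambda rec: distance(rec[0], instance))
    return label

# A new (test) instance whose class is unknown:
applicant = {"age": "middle", "has_job": True, "credit": "fair"}
print(predict(train, applicant))  # -> Yes
```

Any classifier covered later in the chapter fits this same mold: a learning step that builds a model from the labeled records, and a prediction step that assigns a class to each unseen instance.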