Outline
- K-Nearest Neighbor algorithm
- Fuzzy Set theory
- Classifier Accuracy Measures

Chapter 6. Classification and Prediction
- Eager learners: given a set of training tuples, construct a generalization model before receiving new tuples to classify
- Examples of eager methods:
  - Classification by decision tree induction
  - Rule-based classification
  - Classification by back propagation
  - Support Vector Machines (SVM)
  - Associative classification

Lazy vs. Eager Learning
- Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple
- Eager learning (the methods listed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify
- Lazy learning spends less time in training but more time in predicting

Lazy Learner: Instance-Based Methods
- Typical approach: the k-nearest neighbor approach
  - Instances are represented as points in a Euclidean space

The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-dimensional space
- The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)
- The target function can be discrete- or real-valued
- For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to xq
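The discrete-valued case above can be sketched in a few lines: find the k training points closest to the query by Euclidean distance and take a majority vote. This is a minimal illustration, not code from the slides; the function name and toy dataset are invented for the example.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest
    training points, measured by Euclidean distance."""
    # train: list of (point, label) pairs; each point is a tuple of floats
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbors = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy dataset: two well-separated clusters
train = [((1.0, 1.0), "yes"), ((1.2, 0.8), "yes"), ((0.9, 1.1), "yes"),
         ((5.0, 5.0), "no"),  ((5.2, 4.9), "no")]
print(knn_classify(train, (1.1, 1.0), k=3))  # -> yes
```

Note that sorting all n training points per query is the O(n) cost the slides mention; a search-tree index (e.g., a k-d tree) brings the lookup closer to O(log n).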
(Figure: a query point xq shown among positive and negative training instances)

The k-Nearest Neighbor Algorithm
- k-NN for real-valued prediction of a given unknown tuple: return the mean of the values of its k nearest neighbors
- Distance-weighted nearest neighbor algorithm:
  - Weight the contribution of each of the k neighbors according to its distance to the query xq
  - Give greater weight to closer neighbors
- Robust to noisy data, since predictions average over the k nearest neighbors
- How can we determine the value of k, the number of neighbors? In general, the larger the number of training tuples, the larger the value of k
- Nearest-neighbor classifiers can be extremely slow when classifying test tuples: O(n) comparisons for n stored tuples
- By simple presorting and arranging the stored tuples into a search tree, the number of comparisons can be reduced to O(log n)
- Example: k = 5

Fuzzy Set Approaches
- Rule-based systems for classification have the disadvantage that they involve sharp cutoffs for continuous attributes
- For example:
  IF (years_employed > 2) AND (income > 50K)
  THEN credit_card = approved
- What if a customer has been employed for 10 years but has an income of $49K?
- Instead, we can discretize income into categories such as {low, medium, high}, and then apply fuzzy logic to allow a "fuzzy" threshold for each category
- Fuzzy theory, also known as possibility theory, was proposed by Lotfi Zadeh in 1965
- Unlike the notion of traditional "crisp" sets, where an element either belongs to a set S or does not, in fuzzy theory an element can belong to more than one fuzzy set
- For example, the income value $49K belongs to both the medium and high fuzzy sets:
  m_medium($49K) = 0.15 and m_high($49K) = 0.96
- Another example is fuzzy membership for temperature

Fuzzy Set Applications
- http://www.dementia.org/~julied/logic/applications.html

Classifier Accuracy Measures
- Example: a confusion matrix for a buy-computer classifier
                        (Real) buy = yes   (Real) buy = no    total
  (Predict) buy = yes        6954                412           7366
  (Predict) buy = no           46               2588           2634
  total                      7000               3000          10000
                  (7000 buy a computer; 3000 do not)

Classifier Accuracy Measures
- Alternative accuracy measures (e.g., for cancer diagnosis):
  - sensitivity = t-pos / pos (true positive recognition rate)
  - specificity = t-neg / neg (true negative recognition rate)
  - precision = t-pos / (t-pos + f-pos)
  - accuracy = (t-pos + t-neg) / (pos + neg)
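These measures can be computed directly from the buy-computer confusion matrix above. A small sketch (the function name is illustrative, not from the slides):

```python
def classifier_measures(t_pos, f_neg, f_pos, t_neg):
    """Compute sensitivity, specificity, precision, and accuracy
    from the four confusion-matrix counts."""
    pos = t_pos + f_neg   # all tuples that really are positive
    neg = f_pos + t_neg   # all tuples that really are negative
    return {
        "sensitivity": t_pos / pos,              # t-pos / pos
        "specificity": t_neg / neg,              # t-neg / neg
        "precision":   t_pos / (t_pos + f_pos),  # t-pos / (t-pos + f-pos)
        "accuracy":    (t_pos + t_neg) / (pos + neg),
    }

# Counts from the table: 6954 true positives, 46 false negatives,
# 412 false positives, 2588 true negatives
m = classifier_measures(t_pos=6954, f_neg=46, f_pos=412, t_neg=2588)
print(m["accuracy"])  # (6954 + 2588) / 10000 = 0.9542
```

On this table, sensitivity is 6954/7000 ≈ 0.993 while specificity is only 2588/3000 ≈ 0.863, which is exactly why such class-specific measures matter in settings like cancer diagnosis where one class dominates.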