Statistical Classification
Rong Jin

Outline: classification problems and examples; learning from examples; k nearest neighbor (kNN); cross validation and the leave-one-out method; probabilistic interpretation of kNN; weighted kNN; optimization and gradient ascent; instance-based learning (Ch. 8); KD trees for nearest neighbor search; the curse of dimensionality

Classification Problems
f : X → Y (Input: X, Output: Y)
- Given input X = {x1, x2, ..., xm}, predict the class label y ∈ Y
- Y = {-1, +1}: binary classification problems
- Y = {1, 2, 3, ..., c}: multi-class classification problems
- Goal: learn the function f : X → Y

Examples of Classification Problems
Text categorization:
- Input features X: word frequencies, e.g. {(campaigning, 1), (democrats, 2), (basketball, 0), ...}
- Class label y: Y = +1: 'politics'; Y = -1: 'non-politics'
- Example doc: "Months of campaigning and weeks of round-the-clock efforts in Iowa all came down to a final push Sunday, ..." → Topic: politics or non-politics?

Image classification:
- Input features X: color histogram, e.g. {(red, 1004), (blue, 23000), ...}
- Class label
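To make the feature representation concrete, here is a minimal sketch of turning a document into a word-frequency vector x with a class label y. The vocabulary and the example document are illustrative, not from the slides:

```python
# Sketch: a document as a word-frequency feature vector x with class label y.
# The vocabulary and the example document below are illustrative only.
vocabulary = ["campaigning", "democrats", "basketball"]

def to_feature_vector(doc):
    """Count how often each vocabulary word occurs in the document."""
    words = doc.lower().split()
    return [words.count(w) for w in vocabulary]

doc = "months of campaigning and weeks of campaigning efforts"
x = to_feature_vector(doc)  # input features X, here [2, 0, 0]
y = +1                      # class label: +1 = 'politics', -1 = 'non-politics'
```

In practice the vocabulary would come from the full training corpus, and raw counts are often normalized (e.g. TF-IDF).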
y: Y = +1: 'bird image'; Y = -1: 'non-bird image'
- Which images are birds, and which are not?

Classification Problems
f : X → Y
- f: doc → topic (politics / not-politics)
- f: image → topic (birds / not-birds)
- How to obtain f? Learn the classification function f from examples.

Learning from Examples
Training examples are independent and identically distributed (i.i.d.):
- Each training example is drawn independently from the identical source
- Training examples are similar to testing examples
- D_train = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
- n: the number of training examples
- x_i = (x_{i,1}, x_{i,2}, ..., x_{i,d}) ∈ R^d
- y ∈ Y; binary class: Y = {+1, -1}; multiple class: Y = {1, 2, ..., c}

Learning from Examples
- Given training examples D_train = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
- Goal: learn a classification function f(x): X → Y that is consistent with the training examples
- What is the easiest way to do it?

K Nearest Neighbor (kNN) Approach
- Classify a test point by the majority class label among its k nearest training points (illustrated on the slides for k = 1 and k = 4)
- How many neighbors should we count?

Cross Validation
- Divide the training examples into two sets: a training set (80%) and a validation set (20%)
- Predict the class labels of the examples in the validation set using the examples in the training set
- Choose the number of neighbors k that maximizes the classification accuracy

Leave-One-Out Method
For k = 1, 2, ..., K:
  Err(k) = 0
  1. Randomly select a training data point and hide its class label
  2. Use the remaining data and the given k to predict the class label for the held-out point
  3.
Err(k) = Err(k) + 1 if the predicted label is different from the true label
  Repeat the procedure until all training examples have been tested
Choose the k whose Err(k) is minimal
- Slide example (k = 1): after the first misclassified point, Err(1) = 1
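The leave-one-out procedure above can be sketched in a few lines. The 1-D data points and the odd values of k below are made up for illustration (odd k avoids voting ties in the binary case):

```python
# Sketch of leave-one-out selection of k for a kNN classifier.
# Data: 1-D points (x, y) with y in {-1, +1}; the values are illustrative only.
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_error(data, k):
    """Err(k): hold out each point in turn and count wrong predictions."""
    err = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) != y:
            err += 1
    return err

data = [(1.0, -1), (1.5, -1), (2.0, -1), (4.5, -1),
        (5.1, +1), (5.5, +1), (6.0, +1)]
errs = {k: loo_error(data, k) for k in (1, 3, 5)}
best_k = min(errs, key=errs.get)  # choose the k whose Err(k) is minimal
print(errs, best_k)
```

On this toy set the single outlier at 4.5 is always misclassified, and k = 5 starts dragging in far-away points from the wrong class, which previews the bias side of the tradeoff discussed below.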
Err(k) = Err(k) + 1 if the predicted label is different from the true label
Repeat the procedure until all training examples have been tested, and choose the k whose Err(k) is minimal
- Slide example result: Err(1) = 3, Err(2) = 2, Err(3) = 6, so choose k = 2

Probabilistic Interpretation of kNN
- Estimate the probability density function Pr(y|x) around the location of x
- Count the data points in class y in the neighborhood of x
- Bias and variance tradeoff
- A small neighborhood ...
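The neighborhood-count estimate of Pr(y|x) described above can be sketched as follows, on illustrative 1-D data:

```python
# Sketch: kNN as a posterior estimate. Among the k nearest neighbors of x,
# the fraction belonging to class y estimates Pr(y|x). Data are illustrative.
from collections import Counter

def knn_posterior(train, x, k):
    """Estimate Pr(y|x) as the per-class fraction of the k nearest points."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    counts = Counter(label for _, label in neighbors)
    return {label: c / k for label, c in counts.items()}

train = [(1.0, -1), (1.5, -1), (2.0, -1), (5.0, +1), (5.5, +1), (6.0, +1)]
print(knn_posterior(train, 2.5, 4))  # {-1: 0.75, 1: 0.25}
```

A small k (small neighborhood) makes the estimate follow the local data closely, giving low bias but high variance; a large k smooths the estimate, trading variance for bias.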