Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan

Lecture 11: Text Classification; Vector space classification
[Borrows slides from Ray Mooney]

Contents:
- Recap: Naïve Bayes classifiers
- The rest of text classification
- Recall: Vector Space Representation
- Classification Using Vector Spaces
- Documents in a Vector Space
- Test Document of what class?
- Test Document = Government
- Aside: 2D/3D graphs can be misleading
- Using Rocchio for text classification
- Illustration of Rocchio Text Categorization
- Definition of centroid
- Rocchio Properties
- Rocchio Anomaly
- Rocchio classification
- k Nearest Neighbor Classification
- Example: k=6 (6NN)
- Nearest-Neighbor Learning Algorithm
- kNN Is Close to Optimal
- k Nearest Neighbor
- kNN decision boundaries
- Similarity Metrics
- Illustration of 3 Nearest Neighbor for Text Vector Space
- 3 Nearest Neighbor vs. Rocchio
- Nearest Neighbor with Inverted Index
- kNN: Discussion
- kNN vs. Naive Bayes
- Bias vs. variance: Choosing the correct model capacity
- Linear classifiers and binary and multiclass classification
- Separation by Hyperplanes
- Linear programming / Perceptron
- Which Hyperplane?
- Linear classifier: Example
- Linear Classifiers
- Rocchio is a linear classifier
- Two-class Rocchio as a linear classifier
- Naive Bayes is a linear classifier
- A nonlinear problem
- High Dimensional Data
- More Than Two Classes
- Set of Binary Classifiers: Any of
- Set of Binary Classifiers: One of
- Summary: Representation of Text Categorization Attributes
- Which classifier do I use for a given text classification problem?
- Resources for today's lecture

Slide 2: Recap: Naïve Bayes classifiers
- Classify based on the prior weight of the class and the conditional parameter for what each word says:
  $c_{NB} = \arg\max_{c_j \in C} \Big[ \log P(c_j) + \sum_{i \in \text{positions}} \log P(x_i \mid c_j) \Big]$
- Training is done by counting and dividing:
  $P(c_j) \leftarrow \dfrac{N_{c_j}}{N}$
  $P(x_k \mid c_j) \leftarrow \dfrac{T_{c_j x_k} + \alpha}{\sum_{x_i \in V} \left( T_{c_j x_i} + \alpha \right)}$
- Don't forget to smooth

Slide 3: The rest of text classification
- Today:
  - Vector space methods for Text Classification
    - Vector space classification using centroids (Rocchio)
    - K Nearest Neighbors
  - Decision boundaries, linear and nonlinear classifiers
  - Dealing with more than 2 classes
- Later in the course:
  - More text classification
  - Support Vector Machines
  - Text-specific issues in classification

Slide 4: Recall: Vector Space Representation (Sec. 14.1)
- Each document is a vector, one component for each term (= word).
- Normally normalize vectors to unit length.
- High-dimensional vector space:
  - Terms are axes
  - 10,000+ dimensions, or even 100,000+
  - Docs are vectors in this space
- How can we do classification in this space?

Slide 5: Classification Using Vector Spaces (Sec. 14.1)
- As before, the training set is a set of documents, each labeled with its class (e.g., topic)
- In vector space classification, this set corresponds to a labeled set of points (or, equivalently, vectors) in the vector space
- Premise 1: Documents in the same class form a contiguous region of space
- Premise 2: Documents from different classes don't overlap (much)
- We define surfaces to delineate classes in the space

Slide 6: Documents in a Vector Space (Sec. 14.1)
[Figure: documents plotted in the vector space, grouped into Government, Science, and Arts classes]

Slide 7: Test Document of what class? (Sec. 14.1)
[Figure: the same plot with an unlabeled test document added]

Slide 8: Test Document = Government (Sec. 14.1)
[Figure: the test document falls in the Government region and is assigned that class]
- Is this similarity hypothesis true in general?
- Our main topic today is how to find good separators

Slide 9: Aside: 2D/3D graphs can be misleading (Sec. 14.1)
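The Naïve Bayes recap above (count, divide, smooth, then take the argmax of summed log probabilities) can be sketched in code. This is a minimal illustration, not the lecture's reference implementation; the function names, the toy corpus, and the choice of α = 1 are assumptions for the example.

```python
from collections import Counter, defaultdict
from math import log

ALPHA = 1.0  # add-alpha smoothing (alpha = 1 is Laplace smoothing)

def train_nb(labeled_docs):
    """Estimate P(c) = N_c / N and term counts T_{c,t} by counting."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)   # term_counts[c][t] = T_{c,t}
    vocab = set()
    for doc, c in labeled_docs:
        class_counts[c] += 1
        for t in doc.split():
            term_counts[c][t] += 1
            vocab.add(t)
    n = sum(class_counts.values())
    prior = {c: cnt / n for c, cnt in class_counts.items()}
    return prior, term_counts, vocab

def classify_nb(doc, prior, term_counts, vocab):
    """c_NB = argmax_c [log P(c) + sum_i log P(x_i | c)], smoothed."""
    def score(c):
        denom = sum(term_counts[c].values()) + ALPHA * len(vocab)
        return log(prior[c]) + sum(
            log((term_counts[c][t] + ALPHA) / denom)
            for t in doc.split() if t in vocab)  # skip out-of-vocabulary terms
    return max(prior, key=score)

# Hypothetical toy training set
train = [("tax law government", "Government"),
         ("government election vote", "Government"),
         ("quantum physics energy", "Science")]
prior, tc, vocab = train_nb(train)
```

Working in log space avoids floating-point underflow from multiplying many small probabilities, which is why the argmax is over a sum of logs rather than a product.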
Slide 10: Using Rocchio for text classification (Sec. 14.2)
- Relevance feedback methods can be adapted for text categorization
  - As noted before, relevance feedback can be viewed as 2-class classification (relevant vs. nonrelevant documents)
- Use standard tf-idf weighted vectors to represent text documents
- For the training documents in each category, compute a prototype vector by summing the vectors of the training documents in the category
  - Prototype = centroid of members of class
- Assign test documents to the category with the closest prototype vector based on cosine similarity

Slide 11: Illustration of Rocchio Text Categorization (Sec. 14.2)
[Figure: class regions induced by distance to each class centroid]

Slide 12: Definition of centroid (Sec. 14.2)
  $\vec{\mu}(c) = \dfrac{1}{|D_c|} \sum_{d \in D_c} \vec{v}(d)$
- Where $D_c$ is the set of all documents that belong to class $c$ and $\vec{v}(d)$ is the vector space representation of $d$.
- Note that the centroid will in general not be a unit vector even when the inputs are unit vectors.

Slide 13: Rocchio Properties (Sec. 14.2)
- Forms a simple generalization of the examples in each class (a prototype).
- The prototype vector does not need to be averaged or otherwise normalized for length, since cosine similarity is insensitive to vector length.
- Classification is based on similarity to class prototypes.
- Does not guarantee classifications are consistent with the given training data.
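The Rocchio procedure above (unit-normalize training vectors, average them into a per-class centroid, assign a test document to the class with the most cosine-similar centroid) can be sketched as follows. The corpus and class names are invented for illustration, and raw term counts stand in for the tf-idf weights the slides assume.

```python
from collections import Counter
from math import sqrt

def unit(vec):
    """Scale a sparse term-weight vector to unit Euclidean length."""
    norm = sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}

def centroid(docs):
    """Mean of the unit-normalized document vectors of one class."""
    total = Counter()
    for d in docs:
        for t, w in unit(Counter(d.split())).items():
            total[t] += w
    return {t: w / len(docs) for t, w in total.items()}

def cosine(u, v):
    # Centroids are generally not unit length, so renormalize here.
    u, v = unit(u), unit(v)
    return sum(u[t] * v.get(t, 0.0) for t in u)

# Hypothetical training documents for two classes
training = {
    "Government": ["tax law government policy", "government election law"],
    "Science":    ["physics quantum energy", "energy experiment physics"],
}
centroids = {c: centroid(docs) for c, docs in training.items()}

def rocchio_classify(doc):
    """Assign doc to the class whose centroid is most cosine-similar."""
    vec = Counter(doc.split())
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))
```

Note that training touches each document once and classification computes one similarity per class, which is why Rocchio is cheap to train and test.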
- Why not?

Slide 14: Rocchio Anomaly (Sec. 14.2)
- Prototype models have problems with polymorphic (disjunctive) categories.

Slide 15: Rocchio classification (Sec. 14.2)
- Rocchio forms a simple representation for each class: the centroid/prototype
- Classification is based on similarity to / distance from the prototype/centroid
- It does not guarantee that classifications are consistent with the given training data
- It is little used outside text classification
  - It has been used quite effectively for text classification
  - But it is in general worse than Naïve Bayes
- Again, it is cheap to train and to test documents
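The Rocchio anomaly with disjunctive categories can be made concrete with a tiny 2-D example (invented coordinates, Euclidean distance rather than cosine, for simplicity). Class A consists of two clusters far apart, so its centroid lands in the empty space between them; a point sitting inside one of A's own clusters then ends up closer to class B's centroid, even though its nearest neighbors are all in A.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical 2-D training data: class A is disjunctive (two clusters,
# near x = 0 and x = 10); class B is one cluster near A's left cluster.
training = {
    "A": [(0, 0), (1, 0), (9, 0), (10, 0)],
    "B": [(0, 2), (2, 2)],
}

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

centroids = {c: centroid(pts) for c, pts in training.items()}
# A's centroid is (5, 0), between its two clusters; B's is (1, 2).

def rocchio(p):
    """Assign p to the class with the nearest centroid."""
    return min(centroids, key=lambda c: dist(p, centroids[c]))

def nn_1(p):
    """Assign p to the class of the nearest individual training point."""
    return min((dist(p, q), c)
               for c, pts in training.items()
               for q in pts if q != p)[1]
```

Here `rocchio((0, 0))` returns "B" even though (0, 0) is a class-A training point (distance 5 to A's centroid vs. about 2.24 to B's), while `nn_1((0, 0))` returns "A". This is exactly why Rocchio does not guarantee consistency with the training data, and why nearest-neighbor methods (next slides) handle polymorphic classes better.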