CS276B: Text Information Retrieval, Mining, and Exploitation
Lecture 4: Text Categorization I (Introduction and Naive Bayes)
Jan 21, 2003

Is this spam?

From: "" <[email protected]>
Subject: real estate is the only way...
gem oalvgkay

Anyone can buy real estate with no money down
Stop paying rent TODAY!
There is no need to spend hundreds or even thousands for similar courses
I am 22 years old and I have already purchased 6 properties using the
methods outlined in this truly INCREDIBLE ebook.
Change your life NOW!
=================================================
Click Below to order:
http://www.wholesaledaily.com/sales/nmd.htm
=================================================

Categorization/Classification

Given:
- A description of an instance, x in X, where X is the instance language or instance space.
  Issue: how to represent text documents.
- A fixed set of categories: C = {c1, c2, ..., cn}
Determine:
- The category of x: c(x) in C, where c(x) is a categorization function whose domain is X and whose range is C.
- We want to know how to build categorization functions ("classifiers").

Document Classification

(Figure: training documents grouped under the classes Multimedia, GUI, Garb. Coll., Semantics, ML, Planning, each characterized by word lists such as "planning temporal reasoning plan language", "programming semantics language proof", "learning intelligence algorithm reinforcement network", and "garbage collection memory optimization region". A test document containing "planning language proof intelligence" is to be assigned to a class. Top-level groupings shown: (AI), (Programming), (HCI).)

(Note: in real life there is often a hierarchy, not present in the above problem statement; and you get papers on ML approaches to Garb. Coll.)

Text Categorization Examples

Assign labels to each document or web-page:
- Labels are most often topics such as Yahoo-categories
  e.g., "finance," "sports," "news>world>asia>business"
- Labels may be genres
  e.g., "editorials", "movie-reviews", "news"
- Labels may be opinion
  e.g., "like", "hate", "neutral"
- Labels may be domain-specific binary
  e.g., "interesting-to-me" : "not-interesting-to-me"
  e.g., "spam" : "not-spam"
  e.g., "is a toner cartridge ad" : "isn't"

Methods (1)

Manual classification
- Used by Yahoo!, Looksmart, about.com, ODP, Medline
- Very accurate when the job is done by experts
- Consistent when the problem size and team are small
- Difficult and expensive to scale

Automatic document classification
- Hand-coded rule-based systems
  - Used by the CS dept's spam filter, Reuters, CIA, Verity, ...
  - E.g., assign a category if the document contains a given boolean combination of words
  - Commercial systems have complex query languages (everything in IR query languages + accumulators)

Methods (2)

- Accuracy is often very high if a query has been carefully refined over time by a subject expert
- Building and maintaining these queries is expensive

Supervised learning of a document-label assignment function
- Many new systems rely on machine learning (Autonomy, Kana, MSN, Verity, ...)
  - k-Nearest Neighbors (simple, powerful)
  - Naive Bayes (simple, common method)
  - Support-vector machines (new, more powerful)
  - ... plus many other methods
- No free lunch: requires hand-classified training data
- But can be built (and refined) by amateurs

Text Categorization: attributes

- Representations of text are very high dimensional (one feature for each word).
- High-bias algorithms that prevent overfitting in high-dimensional space are best.
- For most text categorization tasks, there are many irrelevant and many relevant features.
- Methods that combine evidence from many or all features (e.g.
  naive Bayes, kNN, neural nets) tend to work better than ones that try to isolate just a few relevant features (standard decision-tree or rule induction)*
  *Although one can compensate by using many rules

Bayesian Methods

- Our focus today
- Learning and classification methods based on probability theory.
- Bayes' theorem plays a critical role in probabilistic learning and classification.
- Build a generative model that approximates how data is produced
- Uses prior probability of each category given no information about an item.
- Categorization produces a posterior probability distribution over the possible categories given a description of an item.

Bayes' Rule

P(C, X) = P(C|X) P(X) = P(X|C) P(C)

P(C|X) = P(X|C) P(C) / P(X)

Maximum a posteriori Hypothesis

h_MAP = argmax_{h in H} P(h|D)
      = argmax_{h in H} P(D|h) P(h) / P(D)
      = argmax_{h in H} P(D|h) P(h)

Maximum likelihood Hypothesis

If all hypotheses are a priori equally likely, we only need to consider the P(D|h) term:

h_ML = argmax_{h in H} P(D|h)

Naive Bayes Classifiers

Task: Classify a new instance based on a tuple of attribute values <x1, x2, ..., xn>

c_MAP = argmax_{cj in C} P(cj | x1, x2, ..., xn)
      = argmax_{cj in C} P(x1, x2, ..., xn | cj) P(cj) / P(x1, x2, ..., xn)
      = argmax_{cj in C} P(x1, x2, ..., xn | cj) P(cj)

Naïve Bayes Classifier: Assumptions

- P(cj) can be estimated from the frequency of classes in the training examples.
- P(x1, x2, ..., xn | cj) has O(|X|^n · |C|) parameters and could only be estimated if a very, very large number of training examples was available.
- Conditional Independence Assumption: assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities.

(Figure: Bayes net with class node Flu and conditionally independent attribute nodes X1..X5: fever, sinus, cough, runny nose, muscle-ache.)

The Naïve Bayes Classifier
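Under the conditional independence assumption, the MAP rule above reduces to choosing argmax_c P(c) · product_i P(x_i|c). As a minimal illustrative sketch (the class name, helper structure, and toy training data below are made up, not from the lecture), here is a multinomial Naive Bayes text classifier in Python that estimates P(c) and P(w|c) from word frequencies, uses add-one (Laplace) smoothing so unseen words do not zero out a class, and sums log probabilities to avoid floating-point underflow:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Minimal multinomial Naive Bayes text classifier (illustrative sketch)."""

    def train(self, docs):
        # docs: list of (list_of_words, class_label) pairs
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts = defaultdict(Counter)   # per-class word frequencies
        vocab = set()
        for words, label in docs:
            self.word_counts[label].update(words)
            vocab.update(words)
        self.vocab_size = len(vocab)
        self.total_docs = len(docs)

    def classify(self, words):
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # log prior: P(c) estimated as (docs in class) / (total docs)
            score = math.log(n_docs / self.total_docs)
            n_words = sum(self.word_counts[label].values())
            for w in words:
                # add-one smoothed estimate of P(w | c)
                p = (self.word_counts[label][w] + 1) / (n_words + self.vocab_size)
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# toy example (made-up data)
train = [
    ("buy cheap real estate now".split(), "spam"),
    ("click here to order now".split(), "spam"),
    ("lecture notes on text categorization".split(), "ok"),
    ("probability theory and classification".split(), "ok"),
]
nb = NaiveBayesText()
nb.train(train)
print(nb.classify("order cheap estate now".split()))   # -> spam
```

Working in log space matters in practice: a realistic document multiplies hundreds of probabilities, each well below 1, and the raw product underflows to 0.0 long before the argmax is taken.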