CU-Boulder CSCI 5417 - Lecture 11 (37 pages)

Preview of pages 1, 2, 17, 18, 19, 36, and 37 of a 37-page document.
Pages: 37
School: University of Colorado at Boulder
Course: CSCI 5417 - Information Retrieval Systems

CSCI 5417 Information Retrieval Systems
Jim Martin
Lecture 11
9/29/2011

Today 9/29
  - Classification
  - Naïve Bayes classification
  - Unigram LM

01/13/19 CSCI 5417 IR

Where we are
  - Basics of ad hoc retrieval
      - Indexing
      - Term weighting/scoring
          - Cosine
      - Evaluation
  - Document classification
  - Clustering
  - Information extraction
  - Sentiment/Opinion mining

Is this spam?
  From: takworlld@hotmail.com
  Subject: real estate is the only way... gem oalvgkay

  Anyone can buy real estate with no money down
  Stop paying rent TODAY!
  There is no need to spend hundreds or even thousands for similar courses
  I am 22 years old and I have already purchased 6 properties using the
  methods outlined in this truly INCREDIBLE ebook.
  Change your life NOW!
  Click Below to order:
  http://www.wholesaledaily.com/sales/nmd.htm

Text Categorization Examples
  Assign labels to each document or web page:
  - Labels are most often topics such as Yahoo-categories
      e.g., "finance", "sports", "news > world > asia > business"
  - Labels may be genres
      e.g., "editorials", "movie-reviews", "news"
  - Labels may be opinion
      e.g., "like", "hate", "neutral"
  - Labels may be domain-specific
      e.g., "interesting-to-me" : "not-interesting-to-me"
      e.g., "spam" : "not-spam"
      e.g., "contains adult content" : "doesn't"
      e.g., "important to read now" : "not important"

Categorization/Classification
  Given:
  - A description of an instance, x ∈ X, where X is the instance language
    or instance space.
      - Issue for us is how to represent text documents.
  - And a fixed set of categories: C = {c1, c2, ..., cn}
  Determine:
  - The category of x: c(x) ∈ C, where c(x) is a categorization function
    whose domain is X and whose range is C.
      - We want to know how to build categorization functions
        (i.e., "classifiers").

Text Classification Types
  Those examples can be further classified by type:
  - Binary
      - Spam/not spam, contains adult content/doesn't
  - Multiway
      - Business vs. sports vs. gossip
  - Hierarchical
      - News > UK > Wales > Weather
  - Mixture model
      - .8 basketball, .2 business

Document Classification
  [Figure: test data "planning language proof intelligence"; class label "AI"]
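The definition of a categorization function c(x), with domain X and range C, can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the lecture's own implementation: it assumes a bag-of-words representation for X, uses the "spam" / "not-spam" labels from the examples as C, and builds c(x) with a textbook multinomial Naïve Bayes with add-one smoothing. The training sentences and the names `represent`, `train`, `classify`, `TRAINING`, and `MODEL` are hypothetical.

```python
import math
from collections import Counter

# The fixed set of categories C (labels borrowed from the slide examples).
CATEGORIES = ("spam", "not-spam")


def represent(text):
    """Map raw text to a bag of words: one possible choice of instance space X."""
    return Counter(text.lower().split())


def train(labeled_docs):
    """Collect per-class document counts and word counts from (text, label) pairs.

    Assumes every category in CATEGORIES has at least one training document.
    """
    doc_counts = Counter()
    word_counts = {c: Counter() for c in CATEGORIES}
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(represent(text))
    vocab = set().union(*word_counts.values())
    return doc_counts, word_counts, vocab


def classify(text, model):
    """The categorization function c(x): returns exactly one member of C."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    x = represent(text)
    best, best_score = None, -math.inf
    for c in CATEGORIES:
        total_words = sum(word_counts[c].values())
        # Log prior: fraction of training documents labeled c.
        score = math.log(doc_counts[c] / total_docs)
        for word, n in x.items():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothed estimate of P(word | c).
            p = (word_counts[c][word] + 1) / (total_words + len(vocab))
            score += n * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy training set, invented for illustration only.
TRAINING = [
    ("buy real estate with no money down", "spam"),
    ("click below to order this incredible ebook", "spam"),
    ("lecture notes on term weighting and cosine scoring", "not-spam"),
    ("homework on indexing and evaluation is due", "not-spam"),
]
MODEL = train(TRAINING)

print(classify("stop paying rent buy real estate now", MODEL))
print(classify("cosine scoring and indexing", MODEL))
```

With this toy data, the first call returns "spam" and the second returns "not-spam". The same `classify` signature would cover the multiway case simply by listing more labels in `CATEGORIES`; the hierarchical and mixture-model types would need a different output than a single label.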


