Princeton COS 435 - Classic Information Retrieval III (2 pages)






Pages:
2
School:
Princeton University
Course:
Cos 435 - Networks, Economics and Computing


Classic Information Retrieval III: Vector model and Latent Semantic Indexing

Summary: weight calculation

General notation:
- $w_{jd}$ is the weight of term j in document d
- $\mathrm{freq}_{jd}$ is the number of times term j appears in doc d
- $n_j$ is the number of docs containing term j
- $N$ is the number of docs in the collection

Classic tf-idf definition of weight:

$w_{jd} = \mathrm{freq}_{jd} \cdot \log(N / n_j)$

Vector model example

Doc 1 (COS 116 description):
Computers have brought the world to our fingertips. We will try to understand at a basic level the science, old and new, underlying this new Computational Universe. Our quest takes us on a broad sweep of scientific knowledge and related technologies. Ultimately, this study makes us look anew at ourselves: our genome, language, music, knowledge and, above all, the mystery of our intelligence.

Frequencies: science 1, knowledge 2, principles 0, engineering 0

Doc 2 (COS 126 description):
An introduction to computer science in the context of scientific, engineering, and commercial applications. The goal of the course is to teach basic principles and practical issues, while at the same time preparing students to use computers effectively for applications in computer science.

Frequencies: science 2, knowledge 0, principles 1, engineering 1

Weight of query components

Set (list) of terms, some choices:
1. $w_{jq}$ = 0 or 1
2. $w_{jq} = \mathrm{freq}_{jq} \cdot \log(N / n_j)$, which is 0 or $\log(N / n_j)$

Bag of terms: analyze like a document. Some queries are prose expressions of an information need.

Do we want the idf term in both document weights and query weights?

Vector model example, cont.

Consider the five 100-level and 200-level COS courses as the collection (the other three are 109, 217, and 226). The only other appearance of our 4 words is "science", once in the 109 description.

idf:
  science:                             ln(5/3) = 0.51
  engineering, principles, knowledge:  ln(5/1) = 1.6

Term-by-doc table, entries $\mathrm{freq}_{jd} \cdot \log(N / n_j)$:

               Doc 1   Doc 2
  science      0.51    1.02
  engineering  0       1.6
  principles   0       1.6
  knowledge    3.2     0

Unnormalized score for the query "science engineering knowledge principles", using a 0/1 query vector.

Additional ways to calculate weights

Dampen frequency effect:

$w_{jd} = 1 + \log(\mathrm{freq}_{jd})$ if $\mathrm{freq}_{jd} > 0$; 0 otherwise
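The worked example can be checked with a short script. This is a minimal sketch, not code from the lecture: the names `freq`, `n`, `idf`, `tfidf_weight`, `dampened_tf`, and `score` are all hypothetical, and the natural logarithm is assumed because the example writes its idf values with ln. The dampening function covers only the frequency factor; whether it is then multiplied by idf is not stated in the text, so that step is left out.

```python
import math

N = 5  # courses in the collection: 109, 116, 126, 217, 226

# n[j]: number of course descriptions containing term j
n = {"science": 3, "engineering": 1, "principles": 1, "knowledge": 1}

# freq[d][j]: term counts read off the two course descriptions
freq = {
    "Doc 1": {"science": 1, "engineering": 0, "principles": 0, "knowledge": 2},
    "Doc 2": {"science": 2, "engineering": 1, "principles": 1, "knowledge": 0},
}


def idf(term):
    """Inverse document frequency, log(N / n_j), using the natural log."""
    return math.log(N / n[term])


def tfidf_weight(f, term):
    """Classic tf-idf weight: w_jd = freq_jd * log(N / n_j)."""
    return f * idf(term)


def dampened_tf(f):
    """Dampened frequency factor: 1 + log(freq) if freq > 0, else 0."""
    return 1 + math.log(f) if f > 0 else 0.0


def score(doc, query_terms):
    """Unnormalized score with a 0/1 query vector (w_jq = 1 for query terms)."""
    return sum(tfidf_weight(freq[doc][t], t) for t in query_terms)


query = ["science", "engineering", "knowledge", "principles"]
scores = {doc: score(doc, query) for doc in freq}

for doc, s in scores.items():
    print(f"{doc}: {s:.2f}")
```

At full precision the script gives about 3.73 for Doc 1 and 4.24 for Doc 2. Summing the rounded table entries instead gives 3.71 and 4.22. Either way, Doc 2 ranks higher for this query.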


