Princeton COS 435 - Information Retrieval, Discovery, & Delivery

COS 435: Information Retrieval, Discovery, & Delivery
Questions about how we find, organize, evaluate and deliver information

Historic Goals
“Google's mission is to organize the world's information and make it universally accessible and useful.” (Google's mission statement, ~1998)
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.” (Vannevar Bush, “As We May Think,” The Atlantic Monthly, July 1945)

Concepts
• Data?
• Information?
• Content?
• Knowledge?

One definition
• Data: 0's and 1's stored, with or without structure
• Information: data with a semantic interpretation
• Content: all information in a document or collection
• Knowledge: a functional understanding of information
These definitions basically match the class discussion; "content" and "knowledge" can be used both narrowly and broadly, and we had definitions matching each.

How can data help us?
• Structured data: databases
  – Tagged, typed
• Semi-structured data: tagged
  – XML
  – HTML?
• Unstructured data:
  – Text
  – Graphics: 2D, 3D
  – Music
  – Video

What do you want?
• Know it's there: databases (data retrieval)
• Know it when you see it: information retrieval
• Surprise me: data mining (COS 424)

Information Retrieval vs. Search
discovery of content
+ retrieval of content relevant to a query
= search
SEARCH ENGINES

Delivery of content
• In digital libraries, the search tool and the content repository are under one umbrella organization, e.g. the Library of Congress
• On the Web, the actual Web pages are not provided by search engines (although you can sometimes get a cached copy)
  – Where Web pages are stored affects delivery

What do you want, Part 2
• Information need vs. query form
  – The user has an information need
  – The retrieval system has a query form
• Does the query capture the information need?
• Relevance
  – A judgment by the user
  – Compare: there is no sense of relevance in data retrieval

How do you do it?
• Model
  – Contents
  – Query
  – Matching of contents to query: results
• Algorithms
  – Effectiveness
  – Efficiency

What are the performance issues?
• Effectiveness: does search return relevant results?
• Large amounts of data: disk I/O or not?
• Networking
  – Where is the data?
  – Should the data be somewhere else?
• Web
  – How do we find information?
  – How do we use Web structure?

Information Delivery
Broadly construed, can mean:
• User interfaces
• Protocols
• Storage management
• Bandwidth management
Big question: what is the model of interaction? Compare a handheld wireless device with a CS Dept machine.

Information Delivery cont.
Focus on the latter two:
• Storage management
  – Distributed storage
  – Permanence
• Bandwidth management
  – Caching
  – Prefetching
  – Content distribution networks

Topics 1
• Query models for searching (keyword-based)
• Models of documents
• Indexing and inverted files
• Ranking documents
• Using linking structure for Web content analysis
• Semantic and feedback techniques
• User behavior-based relevance criteria; privacy issues
• Manipulating search engine results (SEOs)
• Evaluating retrieval systems

Topics 2
• Web crawling
• Document similarity
• Clustering
• Non-text media search: e.g. music, images
• Adding structure to information: databases, XML, the Semantic Web

Topics 3
• System design of search engines: distributed storage and computing
• Information caching
• Content distribution networks
• Reliability and permanence of information

Course logistics
• Texts
  – For IR, reading will be assigned from the new online text Introduction to Information Retrieval
• Tests: two, expected not to be held in class
• Homework: approximately every couple of weeks
• Presentation: one short
• Project: your choosing with
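
The “What do you want?” slide contrasts data retrieval (“know it's there”) with information retrieval (“know it when you see it”), and the topics list names indexing with inverted files and ranking documents. Here is a minimal sketch of that contrast, assuming a made-up three-document collection, naive whitespace tokenization, and a simple count of matching query terms as the ranking score (a stand-in for illustration, not the course's ranking model). Exact matching returns only documents containing every query term, while ranking also surfaces partial matches, best first.

```python
from collections import defaultdict

# Hypothetical toy collection; IDs and texts are invented for illustration.
DOCS = {
    1: "memex stores books records and communications",
    2: "search engines organize the world's information",
    3: "information retrieval ranks documents by relevance",
}


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately naive assumption)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def exact_match(index, query):
    """Data-retrieval style: only documents containing every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def ranked(index, query):
    """IR style: score documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_inverted_index(DOCS)
print(exact_match(index, "memex information"))      # set()
print(ranked(index, "memex information"))           # [(1, 1), (2, 1), (3, 1)]
print(exact_match(index, "information relevance"))  # {3}
print(ranked(index, "information relevance"))       # [(3, 2), (2, 1)]
```

For "memex information", no document contains both terms, so exact matching returns nothing, while ranking still returns every document that matches one term. Real engines replace the term count with weighted scores, but the index-then-score structure is the same idea.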
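
The slides frame effectiveness as “does search return relevant results?” and describe relevance as a judgment by the user. Precision and recall are one standard way to quantify this. The slides do not name a metric, so treat the sketch below as an illustrative assumption, with hypothetical document IDs and relevance judgments.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall of retrieved results against user relevance judgments."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved documents the user judged relevant
    precision = hits / len(retrieved) if retrieved else 0.0  # share of results that are relevant
    recall = hits / len(relevant) if relevant else 0.0  # share of relevant documents that were found
    return precision, recall


# Hypothetical example: the engine returned documents 3, 2, 5, 7; the user judged 3, 5, 9 relevant.
p, r = precision_recall([3, 2, 5, 7], {3, 5, 9})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning more documents tends to raise recall at the cost of precision, which is one way to see why the "Evaluating retrieval systems" topic needs more than a single number.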