Stanford CS 276 - Scoring and results assembly - D2761212

Home> Schools> Stanford University> Computer Science (CS) > CS 276> Scoring and results assembly

DOC PREVIEW

Stanford CS 276 - Scoring and results assembly

School name Stanford University

Course Cs 276- Information Retrieval and Web Search

Pages 48

This preview shows page 1-2-3-23-24-25-26-46-47-48 out of 48 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Introduc)on*to*Informa)on*Retrieval* !! !!Introduc*on!to!Informa(on)Retrieval)CS276!Informa*on!Retrieval!and!Web!Search!Pandu!Nayak!and!Prabhakar!Raghavan!Lecture!7:!Scoring!and!results!assembly!Introduc)on*to*Informa)on*Retrieval* !! !!Lecture!6!–!I!introduced!a!bug!§ In!my!anxiety!to!avoid!taking!the!log!of!zero,!I!rewrote!as!!!!2!⎩⎨⎧>+=otherwise 0,0 tfif, tflog 1 10 t,dt,dt,dw⎩⎨⎧>+=otherwise 0,0 tfif),tf(1 log 10 t,dt,dt,dwIn fact this was unnecessary, since the zero case is treated specially above; net the FIRST version above is right.Introduc)on*to*Informa)on*Retrieval* !! !!Recap:!IJidf!weigh*ng!§ The!IJidf!weight!of!a!term!is!the!product!of!its!I!weight!and!its!idf!weight.!!§ Best!known!weigh*ng!scheme!in!informa*on!retrieval!§ Increases!with!the!number!of!occurrences!with in !a!document!§ Increases!with!the!rarity!of!the!term!in!the!collec*on!)df/(log)tflog1(w10,10,tdtNdt×+=Ch. 6Introduc)on*to*Informa)on*Retrieval* !! !!Recap:!Queries!as!vectors!§ Key!idea!1:!Do!the!same!fo r!queries:!represent!them!as!vectors!in!the!space!§ Key!idea!2:!Rank!documents!according!to!their!proximity!to!the!query!in!this!sp ace!§ proximity!=!similarity!of!vectors!Ch. 6Introduc)on*to*Informa)on*Retrieval* !! !!Recap:!cosine(query,document)!∑∑∑====•=•=ViiViiViiidqdqddqqdqdqdq12121),cos(Dot product Unit vectors cos(q,d) is the cosine similarity of q and d … or, equivalently, the cosine of the angle between q and d. Ch. 6Introduc)on*to*Informa)on*Retrieval* !! !!This!lecture!§ Speeding!up!vector!space!ranking!§ PuVng!together!a!complete!search !system!§ Will!require!learning!about!!a!number!of!miscellaneous!topics!and!heuris*cs!Ch. 7Introduc)on*to*Informa)on*Retrieval* !! !!Compu*ng!cosine!scores!Sec. 6.3.3Introduc)on*to*Informa)on*Retrieval* !! !!Eﬃcient!cosine!ranking!§ Find!the!K!docs!in!the!collec*on!“nearest”! to!the!query!⇒!K*largest!queryJdoc!cosines.!§ Eﬃcient!ranking:!§ Compu*ng!a!single!cosine!eﬃciently.!§ Choosing!the!K*largest!cosine!values!eﬃciently.!§ Can!we!do!this!without!compu*ng!all!N!cosines?*Sec. 7.1Introduc)on*to*Informa)on*Retrieval* !! !!Eﬃcient!cosine!ranking!§ What!we’re!doing!in!eﬀect:!solving!the!KJnearest!neighbor!problem!for!a!query!vector!§ In!general,!we!do!not!know!how!to!do!this!!eﬃcientl y!for!highJdimensional!spaces!§ But!it!is!solvable!for!short!queries,!and!standard!indexes!support!this!well!Sec. 7.1Introduc)on*to*Informa)on*Retrieval* !! !!Special!case!–!unweighted!q ueries!§ No!weigh*ng!on!query!terms!§ Assume!each!query!term!occurs!on ly!on ce!§ Then!for!ranking,!donʼt!need!to!normalize!query!vector!§ Slight!simpliﬁca*on!of!algorithm!from!Lecture!6!Sec. 7.1Introduc)on*to*Informa)on*Retrieval* !! !!Compu*ng!the!K!largest!cosines:!selec*on!vs.!sor*ng!§ Typically!we!want!to!retrieve!the!top!K!docs!(in!the!cosine!ranking!for!the!query)!§ not!to!totally!order!all!docs!in!the!collec*on!§ Can!we!pick!oﬀ!docs!with!K!highest!cosines?!§ Let!J!=!number!of!docs!with!nonzero!cosines!§ We!seek!the!K!best!of!these!J!Sec. 7.1Introduc)on*to*Informa)on*Retrieval* !! !!Use!heap!for!selec*ng!top!K!§ Binary!tree!in!which!each!node’s!value!>!the!values!of!children!§ Takes!2J!opera*ons!to!construct,!then!each!of!K*“winners”!read!oﬀ!in!2log!J!steps.!§ For!J=1M,!K=100,!this!is!about!10%!of!the!cost!of!sor*ng.!1 .9 .3 .8 .3 .1 .1 Sec. 7.1Introduc)on*to*Informa)on*Retrieval* !! !!Boflenecks!§ Primary!computa*onal!bofleneck!in!scoring:!cosine!computa*on!§ Can!we!avoid!all!this!computa*on?!§ Yes,!but!may!some*mes!get!it!wrong!§ a!doc!not!in!the!top!K!may!creep!into!the!li st!of!K!output!docs!§ Is!this!such!a!bad!thing?!Sec. 7.1.1Introduc)on*to*Informa)on*Retrieval* !! !!Cosine!similarity!is!only!a!proxy!§ User!has!a!task!and!a!query!formula*on!§ Cosine!matches!docs!to!query!§ Thus!cosine!is!anyway!a!proxy!for!user!happiness!§ If!we!get!a!list!of!K!docs!“close”!to!the!top!K!by!cosine!measure,!should!be!ok!Sec. 7.1.1Introduc)on*to*Informa)on*Retrieval* !! !!Generic!approach!§ Find!a!set!A*!of!contenders,!with!K*<*|A|*<<*N*§ A*does!not!necessarily!contain!the!top!K,*but!has!many!docs!from!among!the!top!K*§ Return!the!top!K*docs!in!A*§ Think!of!A!as!pruning!nonJcontenders!§ The!same!approach!is!also! used!for!other!(nonJcosine)!scoring!func*ons!§ Will!look!at!several!schemes!following!this!approach!Sec. 7.1.1Introduc)on*to*Informa)on*Retrieval* !! !!Index!elimina*on!§ Basic!algorithm!cosi ne!computa*on!algorithm!on ly!considers!docs! containing!at!least!one!query!term!§ Take!this!further:!§ Only!consider!highJidf!query!terms!§ Only!consider!docs!containing!many!query!terms!Sec. 7.1.2Introduc)on*to*Informa)on*Retrieval* !! !!HighJidf!quer y!terms!only!§ For!a!query!such!as!catcher*in*the*rye*§ Only!accumulate!scores!from!catcher*and!rye*§ Intui*on:!in*and!the*contribute!lifle!to!the!scores!and!so!donʼt!alter!rankJordering!much!§ Beneﬁt:!§ Pos*ngs!of!lowJid f!terms!have!many!docs!→!these!(many)!docs!get!eliminated!from!set!A*of!contenders*Sec. 7.1.2Introduc)on*to*Informa)on*Retrieval* !! !!Docs!containing!many!quer y!terms!§ Any!doc!with!at!least!one!query!term!is!a!candidate!for!the!top!K!output!list!§ For!mul*Jterm!queries,!only!compute!scores!for!docs !containing!several!of!the!query!terms!§ Say,!at!least!3!out!o f!4!§ Imposes!a!“som!conjunc*on”!on!queries!seen!on!web!search!engines!(early!Google)!§ Easy!to!implement!in!pos*ngs!traversal!Sec. 7.1.2Introduc)on*to*Informa)on*Retrieval* !! !!3!of!4!query!terms!Brutus Caesar Calpurnia 1 2 3 5 8 13 21 34 2 4 8 16 32 64 128 13 16 Antony 3 4 8 16 32 64 128 32 Scores only computed for docs 8, 16 and 32. Sec. 7.1.2Introduc)on*to*Informa)on*Retrieval* !! !!Champion!lists!§ Precompute!for!each!dic*onary!term!t,!the!r!docs!of!highest!weight!in!tʼs!pos*ngs!§ Call!this!the!champion!list!for!t!§ (aka!fancy!list!or!top!docs!for!t)!§ Note!that!r!has!to!be!chosen!at!index!build!*me!§ Thus,!itʼs!possible!that!r!<!K*§

View Full Document