Relevance Feedback
CISC489/689-010, Lecture #15
Monday, April 13th
Ben Carterette
4/15/09

Query Process
[Diagram: corpus, accessible data store, server(s), ranking f(Q,D), evaluation (precision, recall, clicks, …)]

User Interaction
• User inputs a query
• Gets a ranked list of results
• Interaction doesn't have to end there!
  – A typical engine-user interaction: the user looks at the results and reformulates the query
  – What if the engine could do it automatically?

Example
[Figure not preserved in the extracted text]

Interaction Model
• Relevance feedback
  – User indicates which documents were relevant, which were nonrelevant
    • Possibly using check boxes or some other button
  – System takes this feedback and uses it to find other relevant documents
  – Typical approach: query expansion
  – Add "relevant terms" to the query with weights

Example Feedback Interface
[Screenshot labels: Promote result, Remove result, Find similar pages]

Models for Relevance Feedback
• Retrieval models <-> relevance feedback models
• A model for relevance feedback needs to take marked relevant documents and use them to update the query or results
  – Google model is very simple: move result to top on "promote" click, move to bottom on "remove" click
  – Slightly more complex Google model: use one document as a relevant document for "similar pages" click
  – Query expansion is a more common approach

Vector Space Feedback
• Documents, queries are vectors
• Add relevant document vectors together to obtain a "relevant vector"
• Add nonrelevant document vectors together to obtain a "nonrelevant vector"
• We want a new query vector Q' that is closer to the relevant vector than the nonrelevant vector

VSM Feedback Illustration
[Figure: query vector Q among relevant and not-relevant documents; Q = t1, Q' = 3t2, -3t1]

Relevance Feedback
• Rocchio algorithm
• Optimal query
  – Maximizes the difference between the average vector representing the relevant documents and the average vector representing the non-relevant documents
• Modifies query according to
  [equation missing in the extracted text]
  – α, β, and γ are parameters
    • Typical values 8, 16, 4

Rocchio Feedback in Practice
• Might add top k terms only
• Could ignore the nonrelevant part
  – Has not consistently been shown to improve performance
• Might choose to include some documents but not others
  – Most certain, most uncertain, highest quality, …

Rocchio Expanded Query Example
• TREC topic 106:
    Title: U.S. Control of Insider Trading
    Description: Document will report proposed or enacted changes to U.S. laws and regulations designed to prevent insider trading.
• Original query (automatically generated):
    #wsum( 2.0 #uw50( Control of Insider Trading )
           2.0 #1( #USA Control )
           5.0 #1( Insider Trading )
           1.0 proposed 1.0 enacted 1.0 changes 1.0 #1( #USA laws )
           1.0 regulations 1.0 designed 1.0 prevent )
• Expanded query:
    #wsum( 3.88 #uw50( control inside trade ) 2.21 #1( #USA control )
           145.57 #1( inside trade )
           0.54 propose 2.46 enact 0.99 change 4.35 #1( #USA law )
           10.35 regulate 0.80 design 1.73 prevent
           4.60 drexel 2.05 fine 1.85 subcommittee 1.69 surveillance 1.60 markey
           1.53 senate 1.19 manipulate 1.10 pass 1.06 scandal 0.92 edward )

Probabilistic Feedback
• Recall probabilistic models:
  – Relevant class versus nonrelevant class
    • P(R | D, Q) versus P(NR | D, Q)
  – Optimal ranking is in decreasing order of probability of relevance
• Basic probabilistic model assumes no knowledge of classes
  – e.g. BIM: [equation missing in the extracted text]

Illustration
[Figure: feedback provides information about the classes; labels: user's relevant documents, user's nonrelevant documents]

Contingency Table
[Table not preserved in the extracted text]
For term i:
• Number of relevant documents that contain term i (r_i)
• Number of documents that contain term i (n_i)
• Number of relevant documents
• Number of documents
Gives BIM feedback scoring function:
[equation missing in the extracted text]

BIM Feedback
• Not query expansion
  – It does not add terms to the query
• It modifies term weights based on presence or absence in relevant documents
  – Terms that appear much more often in the relevant class than the nonrelevant class are good discriminators of relevance
  – i.e. r_i > n_i - r_i: good discriminator

Language Model Feedback
• Recall the query-likelihood language model:
  $P(Q \mid D) = \prod_{t \in Q} P(t \mid D)$
  – Where's the relevance?
• A relevance model is a language model for the information need
  – P(t | R)
  – What is the probability that the author of some relevant document would use the term t?
  – Or what is the probability that the user with the information need would describe it using t?

Relevance Models
• The query and relevant documents are samples from the relevance model
• P(D | R): probability of generating the text in a document given a relevance model
  – document likelihood model
  – less effective than query likelihood due to difficulties comparing across documents of different lengths
• Original motivation was to incorporate relevance into language model

Estimating the Relevance Model
• Probability of pulling a word w out of the "bucket" representing the relevance model depends on the n query words we have just pulled out
• By definition
  [equation missing in the extracted text]

Estimating the Relevance Model
• Joint probability is
  [equation missing in the extracted text]
• Assume
  [equation missing in the extracted text]
• Gives
  [equation missing in the extracted text; slide annotations: "Look familiar?", "Query-likelihood score. Set to 0 for nonrelevant docs."]

Estimating the Relevance Model
• P(D) usually assumed to be uniform
• P(w, q1 . . . qn) is simply a weighted average of the language model probabilities for w in a set of documents, where the weights are the query likelihood scores for those documents
• Formal model for relevance feedback in the language model
  – query expansion technique

Relevance Models in Practice
• In theory:
  – Use all the documents in the collection weighted by query-likelihood score or relevance
  – Expand query with every term in the vocabulary
• In practice:
  – Use only the feedback documents, or the top k documents, or a subset
  – Expand query with only n highest-probability terms

Example RMs from Top 10 Docs
[Table not preserved in the extracted text]

Example RMs from Top 50 Docs
[Table not preserved in the extracted text]

KL-Divergence
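The Rocchio slides describe a query update controlled by α, β, and γ, but the update equation itself is missing from the extracted text. The sketch below assumes the standard form, Q' = α·Q + β·(mean of relevant vectors) - γ·(mean of nonrelevant vectors), with the "typical values" 8, 16, 4 as defaults. The function name `rocchio`, the sparse dict representation, and the `top_k` option (for the "might add top k terms only" variant) are illustrative assumptions, not something the slides specify.

```python
def rocchio(query, relevant, nonrelevant, alpha=8.0, beta=16.0, gamma=4.0, top_k=None):
    """Return an updated sparse query vector (term -> weight).

    query: dict term -> weight
    relevant, nonrelevant: lists of dicts term -> weight (document vectors)
    top_k: if given, keep the original query terms plus only the top_k
           highest-weighted new terms (assumed reading of "top k terms only")
    """
    # Every term seen in the query or in any feedback document is a candidate.
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)

    def mean_weight(docs, term):
        # Average weight of `term` over a document set; 0.0 if the set is empty.
        if not docs:
            return 0.0
        return sum(d.get(term, 0.0) for d in docs) / len(docs)

    updated = {}
    for t in terms:
        w = (alpha * query.get(t, 0.0)
             + beta * mean_weight(relevant, t)
             - gamma * mean_weight(nonrelevant, t))
        if w != 0.0:
            updated[t] = w  # negative weights are kept, as in the Q' illustration

    if top_k is not None:
        # Rank new (non-query) terms by weight, breaking ties alphabetically.
        new_terms = sorted((t for t in updated if t not in query),
                           key=lambda t: (-updated[t], t))[:top_k]
        keep = set(query) | set(new_terms)
        updated = {t: w for t, w in updated.items() if t in keep}
    return updated


# Toy vectors, invented for illustration only.
q = {"insider": 1.0, "trading": 1.0}
rel = [{"insider": 1, "trading": 1, "sec": 1}, {"insider": 1, "fine": 1}]
nonrel = [{"trading": 1, "stock": 1}]
expanded = rocchio(q, rel, nonrel)
```

Passing `nonrelevant=[]` gives the variant that ignores the nonrelevant part, which the slides note has not consistently been shown to improve performance.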

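The BIM feedback scoring function is also missing from the extracted slides. As a hedged sketch, the code below uses the commonly cited Robertson/Sparck Jones term weight with 0.5 smoothing, computed from the four contingency-table counts. The symbols `R` (number of relevant documents) and `N` (number of documents), the function names, and the smoothing constant are assumptions; only `r_i` and `n_i` appear in the slide text.

```python
import math


def bim_term_weight(r_i, n_i, R, N):
    """Smoothed BIM weight for one term (assumed Robertson/Sparck Jones form).

    r_i: relevant documents containing the term
    n_i: documents containing the term
    R:   relevant documents (assumed symbol)
    N:   documents in the collection (assumed symbol)
    """
    # Odds of the term occurring in the relevant class.
    rel_odds = (r_i + 0.5) / (R - r_i + 0.5)
    # Odds of the term occurring in the nonrelevant class.
    nonrel_odds = (n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)
    return math.log(rel_odds / nonrel_odds)


def bim_score(query_terms, doc_terms, stats, R, N):
    """Sum the weights of query terms present in the document.

    stats maps term -> (r_i, n_i). No terms are added to the query;
    feedback only changes the weights, as the BIM Feedback slide says.
    """
    return sum(bim_term_weight(*stats[t], R, N)
               for t in query_terms
               if t in doc_terms and t in stats)
```

With no feedback (`r_i = 0`, `R = 0`) the weight reduces to an IDF-like value, log((N - n_i + 0.5) / (n_i + 0.5)). A term that appears in most relevant documents gets a larger weight, and one that is absent from them can go negative.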

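The relevance-model slides describe P(w, q1 . . . qn) as a weighted average of the document language model probabilities for w, weighted by each document's query-likelihood score, with P(D) uniform. The equations themselves are missing from the extracted text, so the sketch below assumes the usual form, P(w, q1 . . . qn) = Σ_D P(D) · P(w | D) · Π_i P(q_i | D), normalized over w. Jelinek-Mercer smoothing against a background model built from the supplied documents, the parameter `lam`, and the function name `relevance_model` are all illustrative assumptions.

```python
import math
from collections import Counter


def relevance_model(query, docs, lam=0.5, n_terms=None):
    """Estimate P(w | R) from feedback (or top-k) documents.

    query: list of query terms
    docs: list of token lists
    lam: Jelinek-Mercer mixing weight for the background model (assumed)
    n_terms: if given, keep only the n highest-probability terms
    """
    doc_counts = [Counter(d) for d in docs]
    background = Counter()
    for counts in doc_counts:
        background.update(counts)
    total = sum(background.values())
    if total == 0:
        return {}

    def p_term(t, counts):
        # Smoothed P(t | D): mix the document estimate with the background.
        length = sum(counts.values())
        p_doc = counts[t] / length if length else 0.0
        return (1 - lam) * p_doc + lam * background[t] / total

    joint = Counter()
    for counts in doc_counts:
        # Query-likelihood score P(Q | D) is the weight for this document.
        ql = math.prod(p_term(q, counts) for q in query)
        for w in background:
            # Uniform P(D) is a constant and cancels in the normalization.
            joint[w] += ql * p_term(w, counts)

    norm = sum(joint.values())
    if norm == 0:
        return {}
    model = {w: p / norm for w, p in joint.items()}
    if n_terms is not None:
        ranked = sorted(model.items(), key=lambda kv: (-kv[1], kv[0]))
        model = dict(ranked[:n_terms])  # truncated, not renormalized
    return model


# Toy documents, invented for illustration only.
toy_docs = [
    ["insider", "trading", "sec", "fine"],
    ["insider", "trading", "senate"],
    ["football", "game", "score"],
]
rm = relevance_model(["insider", "trading"], toy_docs)
```

Restricting `docs` to the feedback documents or the top k ranked documents, and setting `n_terms`, corresponds to the "in practice" choices listed on the Relevance Models in Practice slide.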
UD CISC 689 - Relevance Feedback
