
CSCI 5417: Information Retrieval Systems
Jim Martin
Lecture 7, 9/13/2011

Today
- Review
- Efficient scoring schemes
- Approximate scoring
- Evaluating IR systems

Normal Cosine Scoring

Speedups...
- Compute the cosines faster
- Don't compute as many cosines

Generic Approach to Reducing Cosines
- Find a set A of contenders, with K < |A| << N
- A does not necessarily contain the top K, but has many docs from among the top K
- Return the top K docs in A
- Think of A as pruning likely non-contenders

Impact-Ordered Postings
- We really only want to compute scores for docs for which wf(t,d) is high enough; low scores are unlikely to change the ordering or reach the top K
- So sort each postings list by wf(t,d)
- How do we compute scores in order to pick off the top K? Two ideas follow.

1. Early Termination
- When traversing t's postings, stop early after either a fixed number of docs, or once wf(t,d) drops below some threshold
- Take the union of the resulting sets of docs from the postings of each query term
- Compute only the scores for docs in this union

2. IDF-ordered terms
- When considering the postings of query terms, look at them in order of decreasing IDF; high-IDF terms are likely to contribute most to the score
- As we update the score contribution from each query term, stop if doc scores are relatively unchanged
- (A sketch combining both heuristics appears below.)
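The sketch below combines the two heuristics. It is a minimal illustration, not the lecture's algorithm: it assumes each postings list is already sorted by decreasing wf(t,d), it scores docs by summing idf * wf contributions without cosine length normalization, and all names (approximate_top_k, postings, idf, the cap and threshold parameters) are invented for this example.

```python
import heapq
from collections import defaultdict

def approximate_top_k(query_terms, postings, idf, k,
                      max_docs_per_term=100, wf_threshold=0.0,
                      min_term_contribution=1e-3):
    """postings[t]: list of (doc_id, wf) pairs sorted by decreasing wf(t,d)."""
    scores = defaultdict(float)

    # Idea 2 (IDF-ordered terms): take query terms in order of decreasing IDF,
    # since high-IDF terms contribute most to the score.
    for t in sorted(query_terms, key=lambda t: idf.get(t, 0.0), reverse=True):
        added = 0.0
        # Idea 1 (early termination): stop traversing t's postings after a
        # fixed number of docs, or once wf drops below a threshold.
        for i, (doc_id, wf) in enumerate(postings.get(t, [])):
            if i >= max_docs_per_term or wf < wf_threshold:
                break
            contribution = idf.get(t, 0.0) * wf
            scores[doc_id] += contribution
            added += contribution
        # Stop adding terms once they barely change the accumulated scores.
        if added < min_term_contribution:
            break

    # The docs accumulated in `scores` play the role of the contender set A;
    # return the top K of them.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

The per-term cap, the wf threshold, and the stopping tolerance would all be tuned empirically; the point is only that far fewer score computations are performed than with exhaustive cosine scoring.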
IDF-ordered termsWhen considering the postings of query termsLook at them in order of decreasing IDFHigh IDF terms likely to contribute most to scoreAs we update score contribution from each query termStop if doc scores relatively unchangedEvaluation01/14/19 CSCI 5417 901/14/19 CSCI 5417 10Evaluation Metrics for Search EnginesHow fast does it index?Number of documents/hourRealtime searchHow fast does it search?Latency as a function of index sizeExpressiveness of query languageAbility to express complex information needsSpeed on complex queries01/14/19 CSCI 5417 11Evaluation Metrics for Search EnginesAll of the preceding criteria are measurable: we can quantify speed/size; we can make expressiveness preciseBut the key really is user happinessSpeed of response/size of index are factorsBut blindingly fast, useless answers won’t make a user happyWhat makes people come back?Need a way of quantifying user happiness01/14/19 CSCI 5417 12Measuring user happinessIssue: Who is the user we are trying to make happy?Web engine: user finds what they want and returns often to the engineCan measure rate of return userseCommerce site: user finds what they want and makes a purchaseMeasure time to purchase, or fraction of searchers who become buyers?01/14/19 CSCI 5417 13Measuring user happinessEnterprise (company/govt/academic): Care about “user productivity”How much time do my users save when looking for information?Many other criteria having to do with breadth of access, secure access, etc.01/14/19 CSCI 5417 14Happiness: Difficult to MeasureMost common proxy for user happiness is relevance of search resultsBut how do you measure relevance?We will detail one methodology here, then examine its issuesRelevance measurement requires 3 elements:1. A benchmark document collection2. A benchmark suite of queries3. A binary assessment of either Relevant or Not relevant for query-doc pairsSome work on more-than-binary, but not typical01/14/19 CSCI 5417 15Evaluating an IR systemThe information need is translated into a queryRelevance is assessed relative to the information need not the queryE.g., Information need: I'm looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.Query: wine red white heart attack effectiveYou evaluate whether the doc addresses the information need, not whether it has those words01/14/19 CSCI 5417 16Standard Relevance BenchmarksTREC - National Institute of Standards and Testing (NIST) has run a large IR test-bed for many yearsReuters and other benchmark doc collections used“Retrieval tasks” specifiedsometimes as queriesHuman experts mark, for each query and for each doc, Relevant or IrrelevantFor at least for subset of docs that some system returned for that query01/14/19 CSCI 5417 17Unranked Retrieval EvaluationAs with any such classification task there are 4 possible system outcomes: a, b, c and da and d represent correct responses. 
Accuracy/Error Rate
- Given a query, an engine classifies each doc as "Relevant" or "Irrelevant"
- The accuracy of an engine is the fraction of these classifications that is correct:
  Accuracy = (a + d) / (a + b + c + d)
- That is, the number of correct judgments out of all the judgments made
- Why is accuracy useless for evaluating large search engines?

Unranked Retrieval Evaluation: Precision and Recall
- Precision: the fraction of retrieved docs that are relevant = P(relevant | retrieved)
- Recall: the fraction of relevant docs that are retrieved = P(retrieved | relevant)
- In terms of the contingency table:
  Precision P = a / (a + b)
  Recall R = a / (a + c)

Precision/Recall
- You can get high recall (but low precision) by retrieving all docs for all queries!
- Recall is a non-decreasing function of the number of docs retrieved: it either stays the same or increases as you return more docs
- In most systems, precision decreases with the number of docs retrieved, i.e., as recall increases
- This is a fact with strong empirical confirmation
- (A sketch computing these measures follows.)
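A minimal sketch of these measures, using the contingency counts from the sketch above (again, all names here are invented for illustration):

```python
def accuracy(a, b, c, d):
    # Fraction of classification decisions that are correct.
    return (a + d) / (a + b + c + d)

def precision(a, b):
    # Fraction of retrieved docs that are relevant.
    return a / (a + b) if (a + b) else 0.0

def recall(a, c):
    # Fraction of relevant docs that are retrieved.
    return a / (a + c) if (a + c) else 0.0

# With the hypothetical counts from above (a=2, b=1, c=2, d=5):
# accuracy  = 7/10 = 0.70
# precision = 2/3  (about 0.67)
# recall    = 2/4  = 0.50
```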

Difficulties in Using Precision/Recall