
Retrieval Evaluation

Outline
– Introduction
– Retrieval Performance Evaluation
– Precision and Recall
– Precision
– Recall
– Precision/Recall Trade-Off
– Plotting Precision/Recall Curve
– Evaluating Interactive Systems

Introduction
Evaluation of implementations in computer science is often in terms of time and space complexity.
With large document sets, or large content types, such performance evaluations are valid.
In information retrieval, we also care about retrieval performance evaluation, that is, how well the retrieved documents match the goal.

Retrieval Performance Evaluation
We discussed overall system evaluation previously.
– Traditional vs. berry-picking models of retrieval activity
– Metrics include time to complete a task, user satisfaction, user errors, and time to learn the system
But how can we compare how well different algorithms do at retrieving documents?

Precision and Recall
Consider a document collection, a query and its results, and a task and its relevant documents.
[Figure: within the document collection, the relevant documents |R|, the retrieved documents (answer set) |A|, and their intersection, the relevant documents in the answer set |Ra|]

Precision
Precision – the percentage of retrieved documents that are relevant = |Ra| / |A|
[Figure: same diagram of |R|, |A|, and |Ra| within the document collection]

Recall
Recall – the percentage of relevant documents that are retrieved = |Ra| / |R|
[Figure: same diagram of |R|, |A|, and |Ra| within the document collection]
(A short computational sketch of both measures follows the slides below.)

Precision/Recall Trade-Off
We can guarantee 100% recall by returning all documents in the collection …
– Obviously, this is a bad idea!
We can get high precision by returning only documents that we are sure of.
– Maybe a bad idea
So retrieval algorithms are characterized by their recall and precision curve.

Plotting Precision/Recall Curve
11-Level Precision/Recall Graph
– Plot precision at 0%, 10%, 20%, …, 100% recall.
– Normally, averages over a set of standard queries are used.
• P_avg(r) = Σ_i P_i(r) / N_q
Example (using one query):
Relevant Documents (Rq) = {d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}
Ordered Ranking by Retrieval Algorithm (Aq) = {d10, d27, d7, d44, d35, d3, d73, d82, d19, d4, d29, d33, d48, d54, d1}

Plotting Precision/Recall Curve
Example (second query):
Relevant Documents (Rq) = {d1, d7, d82}
Ordered Ranking by Retrieval Algorithm (Aq) = {d10, d27, d7, d44, d35, d3, d73, d82, d19, d4, d29, d33, d48, d54, d1}
Need to interpolate, since the observed recall values do not fall exactly on the 11 levels.
Now plot the average over a set of queries that matches the expected usage and distribution. (A sketch of this computation also appears after the slides.)

Evaluating Interactive Systems
Empirical data involving human users is time-consuming to gather, and it is difficult to draw universal conclusions from it.
Evaluation metrics for user interfaces:
– Time required to learn the system
– Time to achieve goals on benchmark tasks
– Error rates
– Retention of the use of the interface over time
– User satisfaction
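To make the Precision and Recall slides concrete, here is a minimal Python sketch of the two definitions, applied to the first example query from the Plotting Precision/Recall Curve slide. The function names precision() and recall() are illustrative only, not from any library.

def precision(relevant, retrieved):
    """Percentage of retrieved documents that are relevant: |Ra| / |A|."""
    if not retrieved:
        return 0.0
    ra = set(relevant) & set(retrieved)   # relevant documents in the answer set
    return len(ra) / len(retrieved)


def recall(relevant, retrieved):
    """Percentage of relevant documents that are retrieved: |Ra| / |R|."""
    if not relevant:
        return 0.0
    ra = set(relevant) & set(retrieved)
    return len(ra) / len(relevant)


# First example query from the slides: 10 relevant documents, 15 retrieved.
Rq = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"}
Aq = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82",
      "d19", "d4", "d29", "d33", "d48", "d54", "d1"]

print(precision(Rq, Aq))   # 5 of the 15 retrieved are relevant: 5/15 ≈ 0.33
print(recall(Rq, Aq))      # 5 of the 10 relevant are retrieved: 5/10 = 0.5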

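The 11-level graph can be sketched the same way. The snippet below computes interpolated precision at the 11 recall levels for both example queries and averages them with P_avg(r) = Σ_i P_i(r) / N_q. The slides say interpolation is needed but do not spell out the rule, so this sketch assumes the common convention that interpolated precision at recall level r is the highest precision observed at any recall ≥ r.

def eleven_point_precision(relevant, ranking):
    """Interpolated precision at recall = 0%, 10%, ..., 100% for one query."""
    relevant = set(relevant)
    observed = []                        # (recall, precision) at each relevant hit
    found = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            observed.append((found / len(relevant), found / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolation rule (assumed): precision at level r is the maximum
    # precision seen at any recall >= r, or 0 if that recall is never reached.
    return [max((p for r, p in observed if r >= level), default=0.0)
            for level in levels]


Aq = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82",
      "d19", "d4", "d29", "d33", "d48", "d54", "d1"]

q1 = eleven_point_precision(
    {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"}, Aq)
q2 = eleven_point_precision({"d1", "d7", "d82"}, Aq)

# Average over the Nq = 2 example queries: P_avg(r) = sum_i P_i(r) / Nq.
n_q = 2
p_avg = [(p1 + p2) / n_q for p1, p2 in zip(q1, q2)]

print([round(p, 2) for p in q1])      # [1.0, 1.0, 0.67, 0.5, 0.4, 0.33, 0.0, ...]
print([round(p, 2) for p in q2])      # [0.33, 0.33, 0.33, 0.33, 0.25, 0.25, 0.25, 0.2, ...]
print([round(p, 2) for p in p_avg])   # values to plot as the averaged curve

The averaged values are what would be plotted for this two-query set; in practice, as the slide notes, the query set should match the expected usage and distribution.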
