
Retrieval Evaluation

Outline
– Introduction
– Retrieval Performance Evaluation
– Precision and Recall
– Precision
– Recall
– Precision/Recall Trade-Off
– Plotting Precision/Recall Curve
– Evaluating Interactive Systems

Introduction
Evaluation of implementations in computer science is often in terms of time and space complexity.
With large document sets, or large content types, such performance evaluations are valid.
In information retrieval, we also care about retrieval performance evaluation, that is, how well the retrieved documents match the goal.

Retrieval Performance Evaluation
We discussed overall system evaluation previously.
– Traditional vs. berry-picking models of retrieval activity
– Metrics include time to complete a task, user satisfaction, user errors, and time to learn the system
But how can we compare how well different algorithms do at retrieving documents?

Precision and Recall
Consider a document collection, a query and its results, and a task and its relevant documents.
[Figure: within the document collection, the relevant documents |R|, the retrieved documents (answer set) |A|, and their intersection, the relevant documents in the answer set |Ra|]

Precision
Precision – the percentage of retrieved documents that are relevant = |Ra| / |A|
[Figure: same diagram of |R|, |A|, and |Ra| within the document collection]

Recall
Recall – the percentage of relevant documents that are retrieved = |Ra| / |R|
[Figure: same diagram of |R|, |A|, and |Ra| within the document collection]
(A short computational sketch of both measures follows the slides below.)

Precision/Recall Trade-Off
We can guarantee 100% recall by returning all documents in the collection …
– Obviously, this is a bad idea!
We can get high precision by returning only documents that we are sure of.
– Maybe a bad idea
So retrieval algorithms are characterized by their recall and precision curve.

Plotting Precision/Recall Curve
11-Level Precision/Recall Graph
– Plot precision at 0%, 10%, 20%, …, 100% recall.
– Normally, averages over a set of standard queries are used.
• P_avg(r) = Σ_i P_i(r) / N_q
Example (using one query):
Relevant Documents (Rq) = {d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}
Ordered Ranking by Retrieval Algorithm (Aq) = {d10, d27, d7, d44, d35, d3, d73, d82, d19, d4, d29, d33, d48, d54, d1}

Plotting Precision/Recall Curve
Example (second query):
Relevant Documents (Rq) = {d1, d7, d82}
Ordered Ranking by Retrieval Algorithm (Aq) = {d10, d27, d7, d44, d35, d3, d73, d82, d19, d4, d29, d33, d48, d54, d1}
Need to interpolate, since the observed recall values do not fall exactly on the 11 levels.
Now plot the average over a set of queries that matches the expected usage and distribution. (A sketch of this computation also appears after the slides.)

Evaluating Interactive Systems
Empirical data involving human users is time-consuming to gather, and it is difficult to draw universal conclusions from it.
Evaluation metrics for user interfaces:
– Time required to learn the system
– Time to achieve goals on benchmark tasks
– Error rates
– Retention of the use of the interface over time
– User satisfaction
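To make the Precision and Recall slides concrete, here is a minimal Python sketch of the two definitions, applied to the first example query from the Plotting Precision/Recall Curve slide. The function names precision() and recall() are illustrative only, not from any library.

def precision(relevant, retrieved):
    """Percentage of retrieved documents that are relevant: |Ra| / |A|."""
    if not retrieved:
        return 0.0
    ra = set(relevant) & set(retrieved)   # relevant documents in the answer set
    return len(ra) / len(retrieved)


def recall(relevant, retrieved):
    """Percentage of relevant documents that are retrieved: |Ra| / |R|."""
    if not relevant:
        return 0.0
    ra = set(relevant) & set(retrieved)
    return len(ra) / len(relevant)


# First example query from the slides: 10 relevant documents, 15 retrieved.
Rq = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"}
Aq = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82",
      "d19", "d4", "d29", "d33", "d48", "d54", "d1"]

print(precision(Rq, Aq))   # 5 of the 15 retrieved are relevant: 5/15 ≈ 0.33
print(recall(Rq, Aq))      # 5 of the 10 relevant are retrieved: 5/10 = 0.5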

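The 11-level graph can be sketched the same way. The snippet below computes interpolated precision at the 11 recall levels for both example queries and averages them with P_avg(r) = Σ_i P_i(r) / N_q. The slides say interpolation is needed but do not spell out the rule, so this sketch assumes the common convention that interpolated precision at recall level r is the highest precision observed at any recall ≥ r.

def eleven_point_precision(relevant, ranking):
    """Interpolated precision at recall = 0%, 10%, ..., 100% for one query."""
    relevant = set(relevant)
    observed = []                        # (recall, precision) at each relevant hit
    found = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            observed.append((found / len(relevant), found / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolation rule (assumed): precision at level r is the maximum
    # precision seen at any recall >= r, or 0 if that recall is never reached.
    return [max((p for r, p in observed if r >= level), default=0.0)
            for level in levels]


Aq = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82",
      "d19", "d4", "d29", "d33", "d48", "d54", "d1"]

q1 = eleven_point_precision(
    {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"}, Aq)
q2 = eleven_point_precision({"d1", "d7", "d82"}, Aq)

# Average over the Nq = 2 example queries: P_avg(r) = sum_i P_i(r) / Nq.
n_q = 2
p_avg = [(p1 + p2) / n_q for p1, p2 in zip(q1, q2)]

print([round(p, 2) for p in q1])      # [1.0, 1.0, 0.67, 0.5, 0.4, 0.33, 0.0, ...]
print([round(p, 2) for p in q2])      # [0.33, 0.33, 0.33, 0.33, 0.25, 0.25, 0.25, 0.2, ...]
print([round(p, 2) for p in p_avg])   # values to plot as the averaged curve

The averaged values are what would be plotted for this two-query set; in practice, as the slide notes, the query set should match the expected usage and distribution.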
