INFM 700: Session 12
Summative Evaluations

Jimmy Lin
The iSchool
University of Maryland
Monday, April 21, 2008

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

Types of Evaluations
- Formative evaluations
  - Figuring out what to build
  - Determining what the right questions are
- Summative evaluations
  - Finding out if it “works”
  - Answering those questions

Today’s Topics
- Evaluation basics
- System-centered evaluations
- User-centered evaluations
- Case studies
- Tales of caution

Evaluation as a Science
- Formulate a question: the hypothesis
- Design an experiment to answer the question
- Perform the experiment
  - Compare with a baseline “control”
- Does the experiment answer the question?
  - Are the results significant?
  - Or is it just luck?
- Report the results!
- Rinse, repeat…

The Information Retrieval Cycle
[Figure: the information retrieval cycle. Stages: Source Selection → Query Formulation → Search → Selection → Examination → Delivery, with the intermediate products Resource, Query, Results, Documents, and Information. Feedback loops: source reselection, system discovery, vocabulary discovery, concept discovery, document discovery.]

Questions About Systems
Example “questions”:
- Does morphological analysis improve retrieval performance?
- Does expanding the query with synonyms improve retrieval performance?
Corresponding experiments:
- Build a “stemmed” index and compare against an “unstemmed” baseline
- Expand queries with synonyms and compare against baseline unexpanded queries

Questions That Involve Users
Example “questions”:
- Does keyword highlighting help users evaluate document relevance?
- Is letting users weight search terms a good idea?
Corresponding experiments:
- Build two different interfaces, one with keyword highlighting and one without; run a user study
- Build two different interfaces, one with term-weighting functionality and one without; run a user study

The Importance of Evaluation
- Progress is driven by the ability to measure differences between systems
  - How well do our systems work?
  - Is A better than B?
  - Is it really?
  - Under what conditions?
- Desiderata for evaluations
  - Insightful
  - Affordable
  - Repeatable
  - Explainable

Types of Evaluation Strategies
- System-centered studies
  - Given documents, queries, and relevance judgments
  - Try several variations of the system
  - Measure which system returns the “best” hit list
- User-centered studies
  - Given several users and at least two systems
  - Have each user try the same task on both systems
  - Measure which system works the “best”

System-Centered Evaluations
[Figure: the part of the retrieval cycle covered by system-centered evaluation: Query → Search → Results.]

User-Centered Evaluations
[Figure: the part of the retrieval cycle covered by user-centered evaluation: Query Formulation → Search → Selection → Examination, with Query, Results, Documents, and Information, plus the system, vocabulary, concept, and document discovery loops.]

Evaluation Criteria
- Effectiveness
  - How “good” are the documents that are gathered?
  - How long did it take to gather those documents?
  - Can consider system only or human + system
- Usability
  - Learnability, satisfaction, frustration
  - Effects of novice vs. expert users

Good Effectiveness Measures
- Should capture some aspect of what users want
- Should have predictive value for other situations
- Should be easily replicated by other researchers
- Should be easily comparable

Set-Based Measures

                  Relevant    Not relevant
  Retrieved          A             B
  Not retrieved      C             D

  Collection size = A+B+C+D
  Relevant = A+C
  Retrieved = A+B

  Precision = A ÷ (A+B)
  Recall = A ÷ (A+C)
  Miss = C ÷ (A+C)
  False alarm (fallout) = B ÷ (B+D)

- When is precision important?
- When is recall important?

Another View
[Figure: Venn diagram over the space of all documents, showing the Relevant and Retrieved sets, their overlap (Relevant + Retrieved), and the region outside both (Not Relevant + Not Retrieved).]

Precision and Recall
- Precision
  - How much of what was found is relevant?
  - Important for Web search and other interactive situations
- Recall
  - How much of what is relevant was found?
  - Particularly important for law, patents, and medicine
- How are precision and recall related?

Automatic Evaluation Model
- Focus on systems, hence system-centered (also called “batch”
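The set-based measures and the question of how precision and recall are related can be made concrete in a few lines of code. This is a minimal sketch, assuming documents carry hypothetical IDs such as "d1" and that the retrieved and relevant sets are subsets of the collection; the function names and toy data are illustrative, not from the lecture. It also reports a measure as 0.0 when its denominator is zero, which is a convention chosen here rather than a standard.

```python
def set_measures(retrieved, relevant, collection):
    """Precision, recall, miss, and fallout from the A/B/C/D contingency table.

    Assumes retrieved and relevant are subsets of collection. A measure whose
    denominator is zero is reported as 0.0 (a convention chosen for this sketch).
    """
    a = len(retrieved & relevant)               # A: retrieved and relevant
    b = len(retrieved - relevant)               # B: retrieved, not relevant
    c = len(relevant - retrieved)               # C: relevant, not retrieved
    d = len(collection - retrieved - relevant)  # D: neither retrieved nor relevant

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "precision": ratio(a, a + b),
        "recall": ratio(a, a + c),
        "miss": ratio(c, a + c),
        "fallout": ratio(b, b + d),
    }


def precision_recall_at_k(ranking, relevant):
    """(k, precision, recall) after each cutoff k of a ranked hit list."""
    points = []
    hits = 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        recall = hits / len(relevant) if relevant else 0.0
        points.append((k, hits / k, recall))
    return points


if __name__ == "__main__":
    # Toy data (hypothetical): a 10-document collection with 4 relevant documents.
    collection = {f"d{i}" for i in range(1, 11)}
    relevant = {"d1", "d3", "d5", "d8"}
    retrieved = {"d1", "d2", "d3", "d4", "d5"}
    print(set_measures(retrieved, relevant, collection))

    # Treat d1..d8 as a ranked hit list and watch both measures as k grows.
    ranking = [f"d{i}" for i in range(1, 9)]
    for k, p, r in precision_recall_at_k(ranking, relevant):
        print(f"k={k}  precision={p:.2f}  recall={r:.2f}")
```

With these toy sets, A=3, B=2, C=1, and D=4, so precision is 3/5 and recall is 3/4 (and miss is always 1 minus recall). In the ranked-list loop, recall can only stay the same or grow as k increases, while precision tends to fall as more non-relevant documents are pulled in, which is one way to see the trade-off the slide asks about.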
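The “Evaluation as a Science” slide asks whether a difference from the baseline is significant or just luck. In system-centered experiments both systems run on the same queries, so one common way to check is a paired test over per-topic scores. The sketch below is an exact paired randomization (permutation) test, assuming each system has one effectiveness score per topic, listed in the same topic order; the function name and the scores are hypothetical, and the lecture does not prescribe this particular test.

```python
from itertools import product


def paired_randomization_test(system, baseline):
    """Exact two-sided paired randomization test on per-topic scores.

    Under the null hypothesis the two systems are interchangeable, so each
    per-topic difference is equally likely to have either sign. The p-value is
    the fraction of all 2**n sign assignments whose mean difference is at least
    as large in absolute value as the observed mean difference.
    """
    if len(system) != len(baseline) or not system:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [s - b for s, b in zip(system, baseline)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    count = 0
    # Enumerates all 2**n sign assignments, so this is only practical for small n.
    for signs in product((1, -1), repeat=n):
        permuted = abs(sum(sg * d for sg, d in zip(signs, diffs)) / n)
        if permuted >= observed - 1e-12:  # small tolerance for float round-off
            count += 1
    return count / 2 ** n


if __name__ == "__main__":
    # Hypothetical per-topic precision for 8 topics (made-up numbers).
    system = [0.42, 0.35, 0.51, 0.28, 0.60, 0.33, 0.47, 0.39]
    baseline = [0.38, 0.36, 0.45, 0.25, 0.52, 0.30, 0.44, 0.40]
    p = paired_randomization_test(system, baseline)
    mean_diff = (sum(system) - sum(baseline)) / len(system)
    print(f"mean difference = {mean_diff:.3f}, two-sided p = {p:.4f}")
```

Enumerating every sign assignment is feasible only for small topic sets (roughly 20 topics or fewer in plain Python); with more topics, a common approach is to sample many random sign assignments instead. A paired t-test or a sign test are other frequently used alternatives.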