
INFM 700: Session 12
Summative Evaluations
Jimmy Lin
The iSchool, University of Maryland
Monday, April 21, 2008
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

Contents: Types of Evaluations; Today's Topics; Evaluation as a Science; The Information Retrieval Cycle; Questions About Systems; Questions That Involve Users; The Importance of Evaluation; Types of Evaluation Strategies; System-Centered Evaluations; User-Centered Evaluations; Evaluation Criteria; Good Effectiveness Measures; Set-Based Measures; Another View; Precision and Recall; Automatic Evaluation Model; Test Collections; Critique; User-Centered Evaluations; Controlled User Studies; Additional Effects to Consider; Koenemann and Belkin (1996); What's the best interface?; Query Interface; Penetrable Interface; Study Details; Sample Topic; Procedure; Results: Precision; Relevance feedback works!; Results: Number of Iterations; Results: User Behavior; Lin et al. (2003); How Much Context?; Interface Conditions; User Study; Question Scenarios; Setup; Results: Completion Time; Results: Questions Posed; A Story of Goldilocks…; Lessons Learned; System vs. User Evaluations; Blair and Maron (1985); Blair and Maron's Results; Turpin and Hersh (2001); Study Design and Results; Analysis.

Types of Evaluations
- Formative evaluations: figuring out what to build; determining what the right questions are
- Summative evaluations: finding out if it "works"; answering those questions

Today's Topics
- Evaluation basics
- System-centered evaluations
- User-centered evaluations
- Case studies
- Tales of caution

Evaluation as a Science
- Formulate a question: the hypothesis
- Design an experiment to answer the question
- Perform the experiment; compare with a baseline "control"
- Does the experiment answer the question? Are the results significant, or is it just luck?
- Report the results!
- Rinse, repeat…
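The "is it significant, or just luck?" step can be made concrete with a paired significance test over per-query scores. The sketch below, in Python, runs a paired randomization (sign-flipping) test on hypothetical per-query effectiveness scores for a baseline and a variant system; the score values, the number of trials, and the 0.05 threshold are illustrative assumptions, not taken from the slides.

    import random

    # Hypothetical per-query effectiveness scores (e.g., average precision)
    # for a baseline system and an experimental variant. Values are made up.
    baseline = [0.42, 0.31, 0.58, 0.27, 0.49, 0.36, 0.61, 0.44]
    variant  = [0.47, 0.33, 0.55, 0.35, 0.52, 0.41, 0.66, 0.46]

    def paired_randomization_test(a, b, trials=10000, seed=0):
        """Two-sided paired randomization test on the mean per-query difference."""
        rng = random.Random(seed)
        diffs = [y - x for x, y in zip(a, b)]
        observed = abs(sum(diffs) / len(diffs))
        extreme = 0
        for _ in range(trials):
            # Under the null hypothesis, each per-query difference is equally
            # likely to favor either system, so flip signs at random.
            flipped = [d if rng.random() < 0.5 else -d for d in diffs]
            if abs(sum(flipped) / len(flipped)) >= observed:
                extreme += 1
        return extreme / trials

    p = paired_randomization_test(baseline, variant)
    gain = sum(v - b for b, v in zip(baseline, variant)) / len(baseline)
    print(f"mean improvement: {gain:.3f}, p-value: {p:.3f}, "
          f"significant at 0.05: {p < 0.05}")

Whether an improvement of this size counts as significant depends heavily on how many queries were tested; with few queries, apparent gains can indeed be just luck.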
The Information Retrieval Cycle
[Diagram] Source selection (resource) → query formulation (query) → search (results) → selection (documents) → examination (documents) → delivery (information), with loops back for source reselection and for system, vocabulary, concept, and document discovery.

Questions About Systems
- Example "questions": Does morphological analysis improve retrieval performance? Does expanding the query with synonyms improve retrieval performance?
- Corresponding experiments: build a "stemmed" index and compare it against an "unstemmed" baseline; expand queries with synonyms and compare against baseline unexpanded queries

Questions That Involve Users
- Example "questions": Does keyword highlighting help users evaluate document relevance? Is letting users weight search terms a good idea?
- Corresponding experiments: build two interfaces, one with keyword highlighting and one without, and run a user study; build two interfaces, one with term-weighting functionality and one without, and run a user study

The Importance of Evaluation
- Progress is driven by the ability to measure differences between systems: How well do our systems work? Is A better than B? Is it really? Under what conditions?
- Desiderata for evaluations: insightful, affordable, repeatable, explainable

Types of Evaluation Strategies
- System-centered studies: given documents, queries, and relevance judgments, try several variations of the system and measure which one returns the "best" hit list
- User-centered studies: given several users and at least two systems, have each user try the same task on both systems and measure which system works "best"

System-Centered Evaluations
[Diagram] Only the core of the retrieval cycle is evaluated: query → search → results.

User-Centered Evaluations
[Diagram] The full interactive loop is evaluated: query formulation → search → selection → examination, including system, vocabulary, concept, and document discovery.

Evaluation Criteria
- Effectiveness: How "good" are the documents that are gathered? How long did it take to gather them? Can consider the system only, or human + system
- Usability: learnability, satisfaction, frustration; effects of novice vs. expert users

Good Effectiveness Measures
- Should capture some aspect of what users want
- Should have predictive value for other situations
- Should be easily replicated by other researchers
- Should be easily comparable

Set-Based Measures

                    Relevant    Not relevant
    Retrieved           A            B
    Not retrieved       C            D

- Collection size = A + B + C + D; relevant = A + C; retrieved = A + B
- Precision = A / (A + B)
- Recall = A / (A + C)
- Miss = C / (A + C)
- False alarm (fallout) = B / (B + D)
- When is precision important? When is recall important?
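As a minimal sketch of the set-based measures above, the Python snippet below derives A, B, C, and D from sets of retrieved and judged-relevant document identifiers, then computes precision, recall, miss, and false alarm. The document IDs and the collection size are hypothetical, chosen only to make the arithmetic visible.

    # Hypothetical judged collection; all document IDs are made up.
    collection_size = 20
    relevant  = {"d01", "d03", "d05", "d07", "d11"}          # A + C
    retrieved = {"d01", "d02", "d03", "d08", "d11", "d15"}   # A + B

    a = len(retrieved & relevant)        # relevant and retrieved
    b = len(retrieved - relevant)        # retrieved but not relevant
    c = len(relevant - retrieved)        # relevant but missed
    d = collection_size - (a + b + c)    # neither retrieved nor relevant

    precision   = a / (a + b)    # how much of what was found is relevant
    recall      = a / (a + c)    # how much of what is relevant was found
    miss        = c / (a + c)    # share of relevant documents not retrieved
    false_alarm = b / (b + d)    # fallout: non-relevant documents retrieved

    print(f"A={a} B={b} C={c} D={d}")
    print(f"precision={precision:.2f} recall={recall:.2f} "
          f"miss={miss:.2f} false_alarm={false_alarm:.2f}")

Here precision is 0.50 and recall is 0.60: half of what was retrieved is relevant, and three of the five relevant documents were found.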
Another View
[Diagram] The space of all documents, partitioned into relevant, retrieved, relevant and retrieved (the overlap), and neither relevant nor retrieved.

Precision and Recall
- Precision: How much of what was found is relevant? Important for web search and other interactive situations
- Recall: How much of what is relevant was found? Particularly important for law, patents, and medicine
- How are precision and recall related?

Automatic Evaluation Model
- Focus on systems, hence system-centered (also called "batch" evaluation)
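The batch model the last slide starts to describe can be sketched as a loop over a test collection: score each topic's ranked hit list against the relevance judgments, then average across topics to get a single number per system. The tiny collection below and the choice of precision at 5 as the measure are illustrative assumptions, not part of the slides.

    # Toy test collection: relevance judgments (qrels) and one system's
    # ranked hit lists per topic. All topic and document IDs are invented.
    qrels = {
        "topic-1": {"d03", "d07", "d09"},
        "topic-2": {"d02", "d04"},
    }
    runs = {
        "topic-1": ["d03", "d11", "d07", "d05", "d09", "d12"],
        "topic-2": ["d08", "d02", "d13", "d01", "d04", "d06"],
    }

    def precision_at_k(ranked, relevant, k=5):
        """Fraction of the top-k results that are judged relevant."""
        return sum(1 for doc in ranked[:k] if doc in relevant) / k

    per_topic = {t: precision_at_k(runs[t], qrels[t]) for t in qrels}
    mean_score = sum(per_topic.values()) / len(per_topic)
    print(per_topic)                          # per-topic P@5
    print(f"mean P@5: {mean_score:.2f}")      # one number per system to compare

Because the judgments are fixed, another system (or another variation of the same system) can be scored by swapping in its runs and re-running the same loop, which is what makes batch evaluation affordable and repeatable.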

