Spoken Dialogue Systems
Julia Hirschberg
CS 4706
01/14/19

Issues
- Error avoidance
- Error detection
  - From the system side: how likely is it the system made an error?
  - From the user side: what cues does the user provide to indicate an error?
- Error handling: what can the system do when it thinks an error has occurred?
- Evaluation: how do you know what needs fixing most?

Avoiding misunderstandings
- By imitating human performance
- Timing and grounding (Clark '03)

Recognizing Problematic Dialogues
- Hastie et al., "What's the Trouble?", ACL 2002

Recognizing Problematic Utterances (Hirschberg et al. '99)
- Collect corpus from interactive voice response system
- Identify speaker turns
  - incorrectly recognized
  - where speakers first aware of error
  - that correct misrecognitions
- Identify prosodic features of turns in each category and compare to other turns
- Use machine learning techniques to train a classifier to make these distinctions automatically

Turn Types
  TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you?
  User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening. [misrecognition]
  TOOT: Which city do you want to go to?
  User: New York. [correction, aware site]

Results
- Reduced error in predicting misrecognized turns to 8.64%
- Error in predicting "awares": 12%
- Error in predicting corrections: 18-21%

Evidence from Human Performance
- Users provide explicit positive and negative feedback
- Corpus-based vs. laboratory experiments: do these tell us different things?
  - Bell & Gustafson '00
    - What do we learn from this?
    - What functions does feedback serve?
  - Krahmer et al.
    - "go on" and "go back" signals in grounding situations
    - implicit/explicit verification

  Pos: short turns; unmarked word order; confirmation; answers; no corrections or repetitions; new info
  Neg: long turns; marked word order; disconfirmation; no answer; corrections; repetitions; no new info

- Hypotheses supported, but...
  - Can these cues be identified automatically?
  - How might they affect the design of SDS?

Error Handling Strategies
- Goldberg et al. '03: how should systems best inform the user that they don't understand?
  - System rephrasing vs. repetitions vs. statement of not understanding
  - Apologies
- What behaviors might these produce?
  - Hyperarticulation
  - User frustration
  - User repetition or rephrasing

- What lessons do we learn?
  - What produces least frustration?
  - Best-recognized input?

Evaluating Dialogue Systems
- PARADISE framework (Walker et al. '00)
- "Performance" of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and how it gets accomplished
  - Maximize Task Success
  - Minimize Costs
    - Efficiency Measures
    - Qualitative Measures

Task Success
- Task goals seen as Attribute-Value Matrix
- ELVIS e-mail retrieval task (Walker et al. '97): "Find the time and place of your meeting with Kim."

  Attribute            Value
  Selection Criterion  Kim or Meeting
  Time                 10:30 a.m.
  Place                2D516

- Task success defined by match between AVM values at end of the dialogue and "true" values for AVM

Metrics
- Efficiency of the Interaction: User Turns, System Turns, Elapsed Time
- Quality of the Interaction: ASR rejections, Time-Out Prompts, Help Requests, Barge-Ins, Mean Recognition Score (concept accuracy), Cancellation Requests
- User Satisfaction
- Task Success: perceived completion, information extracted

Experimental Procedures
- Subjects given specified tasks
- Spoken dialogues recorded
- Cost factors, states, dialog acts automatically logged; ASR accuracy, barge-in hand-labeled
- Users specify task solution via web page
- Users complete User Satisfaction surveys
- Use multiple linear regression to model User Satisfaction as a function of Task Success and Costs; test for significant predictive factors

User Satisfaction: Sum of Many Measures
- Was Annie easy to understand in this conversation? (TTS Performance)
- In this conversation, did Annie understand what you said? (ASR Performance)
- In this conversation, was it easy to find the message you wanted? (Task Ease)
- Was the pace of interaction with Annie appropriate in this conversation? (Interaction Pace)
- In this conversation, did you know what you could say at each point of the dialog? (User Expertise)
- How often was Annie sluggish and slow to reply to you in this conversation? (System Response)
- Did Annie work the way you expected her to in this conversation? (Expected Behavior)
- From your current experience with using Annie to get your email, do you think you'd use Annie regularly to access your mail when you are away from your desk? (Future Use)

Performance Functions from Three Systems
  ELVIS  User Sat: .21 COMP, .47 MRS, .15 ET
  TOOT   User Sat: .35 COMP, .45 MRS, .14 ET
  ANNIE  User Sat: .33 COMP, .25 MRS, .33 Help

- COMP: user perception of task completion (task success)
- MRS: mean recognition accuracy (cost)
- ET: elapsed time (cost)
- Help: help requests (cost)

Performance Model
- Perceived task completion and mean recognition score are consistently significant predictors of User Satisfaction
- Performance model useful for system development
  - Making predictions about system modifications
  - Distinguishing "good" dialogues from "bad" dialogues
- But can we also tell on-line when a dialogue is "going wrong"?

Next Week
- Speech summarization and data mining
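
Sketch: AVM matching and a PARADISE-style performance function

The Task Success and Experimental Procedures slides describe two computations: comparing the attribute-value matrix (AVM) at the end of a dialogue with the true values, and regressing User Satisfaction on task success and cost measures. The Python sketch below illustrates both under stated assumptions. The function names (avm_match, zscore, fit_performance), the plain match rate used as the success score, and every number in the toy data are hypothetical and chosen only for illustration; none of them come from the ELVIS, TOOT, or ANNIE experiments.

```python
import numpy as np


def avm_match(observed, reference):
    """Fraction of reference attributes whose observed value matches.

    Assumption: a plain match rate stands in for the task-success score.
    """
    hits = sum(observed.get(attr) == value for attr, value in reference.items())
    return hits / len(reference)


def zscore(values):
    """Normalize each column to mean 0 and standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0)


def fit_performance(predictors, satisfaction):
    """Least-squares weights for z-scored predictors of z-scored satisfaction.

    Because both sides are centered, no intercept term is needed.
    """
    X = zscore(predictors)
    y = zscore(satisfaction)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


# AVM for the ELVIS example task; the "observed" values are invented.
reference = {"selection_criterion": "Kim", "time": "10:30 a.m.", "place": "2D516"}
observed = {"selection_criterion": "Kim", "time": "10:30 a.m.", "place": "2D518"}
match = avm_match(observed, reference)

# Invented per-dialogue measurements: [COMP, MRS, ET in seconds].
rows = np.array([
    [1, 0.90, 120],
    [1, 0.80, 150],
    [0, 0.60, 300],
    [1, 0.70, 200],
    [0, 0.50, 280],
    [1, 0.95, 100],
])
# Invented satisfaction scores, built to rise with COMP and MRS and fall with ET.
satisfaction = 2.0 * rows[:, 0] + 3.0 * rows[:, 1] - 0.01 * rows[:, 2]

weights = fit_performance(rows, satisfaction)
print("AVM match rate:", match)
print("weights for COMP, MRS, ET:", weights)
```

Normalizing the predictors before fitting puts measures with very different units (a 0/1 completion flag, a recognition score, seconds of elapsed time) on a common scale, so the fitted weights can be compared with one another. A real study would also test each weight for significance, as the Experimental Procedures slide notes; that step is omitted here.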