Spoken Dialogue System Architecture Joshua Gordon CS4706 1 Outline Goals of an SDS architecture Research challenges Practical considerations An end to end tour of a real world SDS 2 SDS Architectures Software abstractions that tie together orchestrate the many NLP components required for human computer dialogue Conduct task oriented limited domain conversations Manage the many levels of information processing e g utterance interpretation turn taking necessary for dialogue In real time under uncertainty 3 Examples Information seeking transactional Most common CMU Bus route information Columbia Virtual Librarian Google Directory service Let s Go Public 4 Examples Virtual Humans Multimodal input output Prosody and facial expression Auditory and visual clues assist turn taking Many limitations Scripting Constrained domain http ict usc edu projects virtual humans 5 Examples Interactive Kiosks Multi participant conversations Surprises and challenges passersby to trivia games Bohus and Horvitz 2009 6 Examples Robotic Interfaces www cellbots com 7 Speech interface to a UAV Eliasson 2007 Conversational skills SDS Architectures tie together Speech recognition Turn taking Dialogue management Utterance interpretation Grounding Natural language generation And increasingly include Multimodal input output Gesture recognition 8 Research Challenges in every area Speech recognition Accuracy in interactive settings detecting emotion Turn taking Fluidly handling overlap backchannels Dialogue management Increasingly complex domains better generalization multi party conversations Utterance interpretation Reducing constraints on what the user can 9 say it Attending to say and how they can A tour of a real world SDS CMU Olympus Open source collection of dialogue system components Research platform used to investigate dialogue management turn taking spoken language interpretation Actively developed Many implementations Let s go public Team Talk CheckItOut www speech cs cmu edu 10 Conventional SDS Pipeline Speech signals to words Words to domain concepts Concepts to system intentions Intentions to utterances represented as text Text to speech 11 Olympus under the hood provider pattern 12 Speech recognition 13 The Sphinx Open Source Recognition Toolkit Pocket sphinx Continuous speech speaker independent recognition system Includes tools for language model compilation pronunciation and acoustic model adaptation Provides word level confidence annotation n best lists Efficient runs on embedded devices including an iPhone SDK Olympus supports parallel decoding engines models Typically runs parallel acoustic models for male and female speech http cmusphinx sourceforge net 14 Speech recognition challenge in interactive settings 15 Spontaneous dialogue is difficult for speech recognizers Poor in interactive settings compared to one off applications like voice search and dictation Performance phenomena backchannels pause fillers false starts OOV words Interaction with an SDS is cognitively demanding for users What can I say and when Will the system understand me Uncertainty increases disfluency resulting in further recognition errors 16 WER Word Error Rate Non interactive settings Google Voice Search 17 deployed 0 57 OOV over 10k queries randomly sampled from Sept Dec 2008 Interactive settings Let s Go Public 17 in controlled conditions vs 68 in the field CheckItOut Used to investigate task oriented performance under worst case ASR 30 to 70 depending on experiment Virtual Humans 37 in laboratory conditions 17 Examples of worst case recognizer noise S What book would you like U The Language of Sycamores ASR THE LANGUAGE OF IS A COMING WARS S Hi Scott welcome back U Not Scott Sarah Sarah Lopez ASR SCOTT SARAH SCOUT LAW 18 Error Propagation Recognizer noise injects uncertainty into the pipeline Information loss occurs when moving from an acoustic signal to a lexical representation Most SDSs ignore prosody amplitude emphasis Information provided to downstream components includes An n best list or word lattice Low level features speech rate speech energy 19 Spoken Language Understanding 20 SLU maps from words to concepts Dialog acts the overall intent of an utterance Domain specific concepts like a book or bus route Single utterances vs across turns Challenging in noisy settings Ex Does the library have Hitchhikers Guide to the Galaxy by Douglas Adams on audio cassette Dialog Act Book Request Title The Hitchhikers Guide to the Galaxy Author Douglas Adams Media Audio Cassette 21 Semantic grammars Domain Quit THANKS good bye THANKS goodbye THANKS bye independent concepts Yes No Help Repeat Number Domain specific concepts Book Author THANKS thanks VERY MUCH thank you VERY MUCH 22 VERY MUCH very much a lot Grammars generalize poorly Useful for extracting fine grained concepts but Hand engineered Time consuming to develop and tune Requires expert linguistic knowledge to construct Difficult to maintain over complex domains Lack robustness to OOV words novel phrasing Sensitive to recognizer noise 23 SLU in Olympus the Phoenix Parser Phoenix is a semantic parser indented to be robust to recognition noise Phoenix parses the incoming stream of recognition hypotheses Maps words in ASR hypotheses to semantic frames Each frame has an associated CFG Grammar specifying word patterns that match the slot Multiple parses may be produced for a single utterance The frame is forward to the next component in the pipeline 24 Statistical methods Supervised learning is commonly used for single utterance interpretation Given word sequence W find the semantic representation of meaning M that has maximum a posteriori probability P M W Useful for dialog act identification determining broad intent Like all supervised techniques Requires a training corpus Often is domain and recognizer dependent 25 Belief updating 26 Cross utterance SLU U Get my coffee cup and put it on my desk The one at the back Difficult in noisy settings Mostly new territory for SDS Zuckerman 2009 27 Dialogue Management 28 The Dialogue Manager Represents the system s agenda Many techniques Hierarchal plans state transaction tables Markov processes System initiative vs mixed initiative System initiative has less uncertainty about the dialog state but is clunky Required to manage uncertainty and error handing Belief updating domain independent error handling strategies 29 Task Specification Agenda and Execution Bohus 2007 30 Domain independent error handling Bohus 2007 31 Error
View Full Document
Unlocking...