Spoken Dialog System Architecture
Joshua Gordon
CS4706

Outline
- Motivation: current trends in SDS
- Conversational Speech Interface Architecture: an end-to-end tour of the Olympus SDS Architecture
- Recognition considerations
- Spoken language understanding techniques
- Dialog management: error handling, belief updating
- Language generation, speech synthesis
- Interaction management: turn taking

Information Seeking, Transaction Based Spoken Dialog Systems
Many of today's systems are designed for database access and call routing.
- Columbia: CheckItOut, virtual librarian
- CMU: Let's Go, Pittsburgh bus schedules
- Google: Goog411, directory assistance; Google Voice Search
- MIT: Jupiter, weather information
- Nuance: built to order, technical support

Speech Aware Kiosks
- SDS architectures are beginning to incorporate multimodal input
- Example prompt: How may I help you? I can provide directory assistance and directions around campus.

Speech Interfaces to Virtual Characters
- SDS architectures are exploring multimodal output (including gesturing and facial expression) to indicate level of understanding
- Negotiate an agreement between soldiers and village elders
- Both auditory and visual cues are used in turn taking
- Prosody and facial expressions convey emotion
- SGT Blackwell: http ict usc edu projects sergeant blackwell

Speech Interfaces to Robotic Systems
- Next generation systems explore ambitious domains
- www.cellbots.com
  User: Fly to the red house and photograph the area.
  System: OK, I am preparing to take off.

Speech Aware Appliances
- Speech aware appliances are beginning to engage in limited dialogs
- Interactive dialogs (disambiguation) are required by multi-field queries and ambiguity in results

  Expected                    What user actually said
  Play artist Glenn Miller    Glenn Miller jazz
  Play song all rise          All rise, I guess, from blues

How does all of this work?
- There's more to conversation than we realize
- An ocean of difference remains between Human-Human and Human-Machine Dialog
- Recognition performance is often seen as the limiting factor, but fundamental challenges exist in
  all areas:
  - Turn taking via subtle auditory cues (ever listened to two speakers competing for the conversational floor?)
  - Grounding via prosody, intonation contours
  - Indicating level of understanding by answering a question with a question
  - Mapping speech to concepts requires knowledge of the world
- SDS are subject to limited domain knowledge
- SDS lack the ability to effectively communicate their capabilities and limitations

Running example: SDS Architecture for a Virtual Librarian
The Andrew Heiskell Braille and Talking Book Library
- Patrons will browse and order books by phone
- Heiskell's bibliographic holdings include 70,000 books
- Challenge: many callers have relatively disfluent speech
- Poor recognizer performance is anticipated
- What are the components we'll need?

Introducing the Olympus Architecture
- A freely available, open source collection of dialog system components published by CMU
- Origins in the earlier Communicator project

The Olympus Architecture
- Pipeline format: subsequent layers increase abstraction
- Signals to words, words to concepts, concepts to actions

Detail: Hub Architecture

Deployed Olympus Systems

  System:      Let's Go Public
  Domain:      Pittsburgh bus route information
  Users:       General public
  Interaction: Information access, system initiative, background noise
  Vocab:       2000 words

  System:      TeamTalk
  Domain:      Robot coordination and control (treasure hunting)
  Users:       Grad students, researchers
  Interaction: Multi-participant command and control
  Vocab:       500 words

  System:      CheckItOut
  Domain:      Virtual librarian for the Andrew Heiskell Library
  Users:       Elderly and vision-impaired library patrons
  Interaction: Information access, mixed initiative, disfluent speech
  Vocab:       Variable, 10,000 words

Part 1: Speech recognition
From signals to words: managing uncertainty
- Information provided to downstream components:
  - A lexical representation of the speech signal, with acoustic confidence and language model fit scores
  - An N-best list
- But how you say it often conveys as much information as what is said
  - Prosody, intonation, amplitude, duration
- Moving from an acoustic signal to a lexical representation already implies loss of
  information
- SDS architectures always operate on partial information
- Managing that uncertainty is one of the main design challenges

Why ASR is Difficult for SDS
An SDS must accommodate variability in:
- Calling environments: background noise, cell phone interference, VOIP
- Speech production: disfluency, false starts, filled pauses, repeats, corrections, accent, age, gender, differences between human-human and human-machine speech
- Technological familiarity: with dialog systems in general, and with a particular SDS's capabilities and constraints; callers often use OOV words and out-of-domain concepts

The Sphinx Open Source Recognition Toolkit
- PocketSphinx vs. Sphinx III: PocketSphinx is efficient enough to run on embedded devices
- Continuous speech, speaker independent recognition system
- Includes tools for language model compilation, pronunciation, and acoustic model adaptation
- Provides word-level confidence annotation and n-best lists
- Olympus supports parallel decoding engines and models
  - Typically separate models are run for male and female speech; the best fit hypothesis is selected
- http://cmusphinx.sourceforge.net

Language / Acoustic Models for SDS
Sphinx supports statistical, class based, and state based language models.
- Statistical language models assign n-gram probabilities to word sequences
- Class based models assign probabilities to collections of terminals, e.g. I would like to read <book>
- State based LM switching: SDS limit the perplexity of the language model by constraining it to the anticipated words (confirmation, rejection, help, address)

Acoustic Models for SDS
- Olympus includes permissive license WSJ acoustic models (read speech) for male and female speech at 8 kHz and 16 kHz bandwidth
- Tools for acoustic adaptation
- Support permissive license models

Part 2: Spoken Language Understanding
From words to concepts
- Spoken Language Understanding is the task of extracting meaning from utterances
  - Dialog acts: the overall intent of an utterance
  - Domain specific concepts: frames and slots
- Very difficult under noisy conditions
- Example utterance: Does the library have
  Hitchhiker's Guide to the Galaxy by Douglas Adams on audio cassette?

  Dialog Act: Book Request
  Title:      The Hitchhiker's Guide to the Galaxy
  Author:     Douglas Adams
  Media:      Audio Cassette

SLU Challenges faced by SDS
- Recognizer error and background noise, resulting in indels (insertions, substitutions, deletions) and word boundary detection problems
- Language production phenomena: disfluency, false starts, corrections, and repairs are difficult to parse
- Meaning must often be assembled from multiple speaker turns
- There are many many possible
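
The book request example above (a dialog act plus title, author, and media slots) can be sketched as a small data structure. The Python below is only an illustration under stated assumptions: the `Frame` class, the `parse_book_request` function, and the single regular expression are hypothetical names invented here, not part of Olympus or Sphinx, and a regular expression stands in for whatever parser a real system would use.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Frame:
    """A dialog act plus the domain-specific slots filled from one utterance."""
    dialog_act: str
    slots: Dict[str, str] = field(default_factory=dict)


# Hypothetical pattern covering one request shape only:
# "... have TITLE by AUTHOR [on MEDIA]".
BOOK_REQUEST = re.compile(
    r"have\s+(?P<title>.+?)\s+by\s+(?P<author>.+?)(?:\s+on\s+(?P<media>.+))?$",
    re.IGNORECASE,
)


def parse_book_request(utterance: str) -> Optional[Frame]:
    """Return a BookRequest frame, or None when the pattern does not match."""
    text = utterance.strip().rstrip("?.!").strip()
    match = BOOK_REQUEST.search(text)
    if match is None:
        return None
    # Keep only the slots that were actually present in the utterance.
    slots = {name: value.strip() for name, value in match.groupdict().items() if value}
    return Frame(dialog_act="BookRequest", slots=slots)


frame = parse_book_request(
    "Does the library have Hitchhiker's Guide to the Galaxy "
    "by Douglas Adams on audio cassette?"
)
print(frame)
```

A pattern this rigid fails as soon as the recognizer inserts, substitutes, or deletes a word such as "by", which is one way to see why the challenges listed above make understanding hard under noisy conditions.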



Columbia CS 4706 - Motivation: current trends in SDS
