Speech Processing 15-492/18-492
  Spoken Dialog Systems
  Tree based dialogs
  VoiceXML

State-based Dialogs
  Simple state-based dialog systems
    Get Name
    Get Account number
    Get Pin
    Present balance
    Go back to start or exit

State-based Dialogs
  Get Name:
    What is your name?
    ASR Name
      May be correct (in the database)
      May be unknown (not in database)
      May not be a name (What do I say?/Help/Repeat)
    Should you echo the recognized name?
      Confirmation (or not)

State-based dialog
  Get name
    Check in database
    Ask again if not
    Deal with help
  Get account number
    Check in database (with name)
    Confirm account number and name
      For security

State-based Interaction
  Trees can get very large
    User can get lost easily
  You want to minimize the number of turns
    Faster throughput means more calls
    Faster throughput means happier customers

The level of help
  First time users *need* a successful call
    Otherwise, they won't call back
  Having very helpful prompts is good
    At the start, but it gets annoying quickly
  Designing prompts is a craft
    What should you say so that it is understood?
    How much should you tailor it to the user?

VoiceXML
  A W3C standard for voice browsing
  XML based "programming" language for speech
    Output synthesized (and recorded) speech
    Recognition of speech and DTMF
    Recording of spoken input
    Telephony features

VoiceXML
  ASR
    From Grammars (JSGF)
    From tri-grams
    From "Domain Managers"
      Credit card numbers
      City, State

VoiceXML
  TTS
    <ssml> markup
    Choice of voice
    Choice of language
    Choice of how to pronounce things
    Specify breaks, timing, emphasis

Structure
  <vxml version="1.0">
    <meta name="author" content="John Doe"/>
    <var name="hi" expr="'Hello World!'"/>
    <form>
      <block>
        <value expr="hi"/>
        <goto next="#say_goodbye"/>
      </block>
    </form>
    <form id="say_goodbye">
      <block>
        Goodbye!
      </block>
    </form>
  </vxml>

Basic Tags
  <form id="xxxx">
  <goto next="#xxx">
  <field> gathers info from the user through speech or DTMF
  <record> records data from the user
  <subdialog> performs some sub dialog

<field> tag
  <form id="getBusNumber">
    <field name="BusNumber">
      <prompt>Which bus line do you want?</prompt>
      <grammar src="grams/bus.gram"/>
      <help>Please say your desired bus number, e.g. 61C</help>
    </field>
  </form>

Flow of Control
  Goto
    <goto next="#GetBusNumber"/>
    <goto next="Trains.vxml"/>
  If
    <if cond="BusNumber == '501'">
      <prompt>Sorry that bus no longer runs</prompt>
    <elseif cond="BusNumber == '56U'"/>
      <prompt>Sorry it'll be a long wait</prompt>
    <else/>
      <prompt>One will be along shortly</prompt>
    </if>

Variables
  <var name="var1" expr="'hello'"/>
  <prompt>I just wanted to say <value expr="var1"/></prompt>
  <assign name="var1" expr="'goodbye'"/>

Recognition Grammars
  Speech Recognition Grammar Specification (SRGS)
  Augmented BNF
    $order = I would like a $drink;
    $drink = coke | pepsi | mountain_dew;

VoiceXML Browsers
  Compatibility
    Not as compatible as one would like
    <objects> can be different (but useful)
      City, State recognizers
  ECMAScript (JavaScript)

Beyond VoiceXML (in VoiceXML)
  Mixing html/cgi scripts in VoiceXML
  Use php to generate VoiceXML files
  Use urls (with ?...) to calculate/get data
    http://weather.com?zip="15213"
  Use urls to get waveforms
    http://tts.com?text="Hello World"

VoiceXML future
  N-gram grammar Markup Language
    Many browsers have their own extensions
  Pronunciation Lexicon Markup Language
    A way to add new items to the lexicon
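
Example: the state-based dialog in VoiceXML
  The banking dialog outlined at the start (get name, get account number, get PIN, confirm, present balance) might be written roughly as below. This is only a sketch under assumptions, not a tested application: the grammar file grams/names.grxml and the server page balance.vxml are hypothetical names, the database check and balance lookup are assumed to happen on the server behind balance.vxml, and the builtin field types (digits, boolean) depend on the browser, since browsers are not as compatible as one would like.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="get_account_info">

    <!-- State 1: get the caller's name (grammar file is hypothetical) -->
    <field name="caller_name">
      <prompt>What is your name?</prompt>
      <grammar src="grams/names.grxml" type="application/srgs+xml"/>
      <help>Please say your first and last name.</help>
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>
      <noinput>I did not hear anything. <reprompt/></noinput>
    </field>

    <!-- State 2: get the account number by speech or DTMF -->
    <field name="account" type="digits">
      <prompt>What is your account number?</prompt>
      <help>Say or key in the digits of your account number.</help>
    </field>

    <!-- State 3: get the PIN (assumed here to be four digits) -->
    <field name="pin" type="digits?length=4">
      <prompt>Please enter your four digit PIN.</prompt>
    </field>

    <!-- Confirm name and account number; on "no", start over -->
    <field name="confirmed" type="boolean">
      <prompt>
        I have account <value expr="account"/>
        for <value expr="caller_name"/>. Is that correct?
      </prompt>
      <filled>
        <if cond="!confirmed">
          <clear namelist="caller_name account pin confirmed"/>
        </if>
      </filled>
    </field>

    <!-- Final state: hand off to a (hypothetical) server page that
         checks the database and presents the balance -->
    <block>
      <submit next="balance.vxml" namelist="caller_name account pin"/>
    </block>
  </form>
</vxml>
```

  Putting every state in one form lets the browser visit the fields in order and return to the first unfilled one after a rejected confirmation, which keeps the number of turns small. Echoing the name and account number in a single confirmation prompt is one possible answer to "should you echo the recognized name?"; a separate confirmation after each field would be safer but slower.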