Speech Processing 15-492/18-492
Spoken Dialog Systems
SDS components

Spoken Dialog Systems
  More than just ASR and TTS
    Recognition
    Parsing
    Manipulation of utterances
    Generation of new information
    Text generation
    Synthesis

SDS Architecture

SDS Internals
  Parser
    From words to structure
  Dialog Manager
    State of dialog (who is talking)
    Direction of dialog (what next)
    References, user profile etc.
    Interaction with database/internet
  Language Generation
    From structure to words

Parsing
  Parsing of SPEECH not TEXT
    Eh, I wanna go, wanna go to Boston tomorrow
    If it’s not too much trouble I’d be very grateful if one might be able to aid me in arranging my travel arrangements to Boston, Logan airport, at sometime tomorrow morning, thank you.
    Boston, tomorrow

Parsing: Output structure
  “I wanna go to Boston, tomorrow”
    Destination: BOS
    Departure: 20081028, AM
    Airline: unspecified
    Special: unspecified
  Convert speech to structure
    Sufficient for further processing/query

Phoenix Parser
  [Place]
      (carnegie mellon university)
      (downtown)
      (robinson towne center)
      (the airport)
      (south hills junction)
      (mount oliver)
      (the south side)
      (oakland)
      (bloomfield)
      (polish hill)
      (the strip district)
      (the north side)
  ;
  [NextBus]
      (*WHEN_IS *the next *BUS)
      (*WHEN_IS *the BUS after that *BUS)
  WHEN_IS
      (when is)
      (when's)
  BUS
      (bus)
      (one)
  ;

Phoenix Parser
  Parse what is important
  Ignore other parts
  Map known parts to useful information

Parsing vs Language Model
  Language Model
    Model what actually gets said
  Parsing
    Extract the information you want
  Models *can* be shared
    Only accept things in the grammar
    Can be over-limiting

Dialog Manager
  Maintain state
    Where are we in the dialog
  Whose turn is it
    Waiting for speaker
    Waiting for database query (stall user)
  Deal with barge-in

Language Generation
  Query for flights to Boston
  Template fill answer(s)
    The next flight to DEST leaves at DEPART_TIME arriving at ARRIVE_TIME.
  Templates may be much more complex

Language Generation
  Choose which template to use
    Based on state, answer type
    Natural variation
    Statistical variation
  Include <ssml> tags to help synthesis
    Can <emph>emphasize</emph> parts
    Can identify dates, numbers etc.
  Humans like variation in the output
    It is rare for a human to repeat things exactly

Language Generation
  Frame structures to (marked up) text
    START: Pittsburgh
    END: Boston
    DATE: 20081028
    TIME: 07:45
    FLIGHT: US075
  Can generate
    I have US 075 leaving at 07:45 tomorrow
    US Airways has a flight departing tomorrow at 07:45

Standardized things
  Help
    User should be able to get help at any time
    Explain where they are and what they are expected to say (with explicit examples)
  Errors
    “I didn’t understand” …
  Confirmation
    Did you say “Boston”?

Confirmation
  Explicit confirmation
    Where are you traveling to?
    Boston
    Boston, did I get that right?
    Yes

Confirmation
  Implicit confirmation
    Where are you traveling to?
    Boston
    Boston, where …
    <can barge in>

Confirmation
  Explicit confirmation
    Safe but slow
  Implicit confirmation
    Natural, but requires good support for barge-in

Grounding
  Showing evidence the system understands
    Where are you traveling to?
    Boston.
    Right. Where ….
    Boston, right. Where ….

Designing Prompts
  Constrain your questions:
    How may I help you?
      Long story reply
    What bus number would you like schedules for?
      Expect bus number replies
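
Two ideas from the slides above, parsing only the important parts of an utterance and generating answers by template fill, can be illustrated with a minimal Python sketch. This is not the Phoenix parser or any real system's generator: the names `PLACES`, `TEMPLATE`, `spot_places`, and `fill_template` are hypothetical, the place list is a short excerpt of the [Place] net, and the arrival time in the usage example is a made-up value for illustration only.

```python
import re

# Hypothetical excerpt of the [Place] net shown in the slides.
PLACES = ["carnegie mellon university", "downtown", "the airport", "oakland"]

# Hypothetical template modelled on the slide's DEST / DEPART_TIME / ARRIVE_TIME example.
TEMPLATE = "The next flight to {dest} leaves at {depart_time} arriving at {arrive_time}."


def spot_places(utterance):
    """Return the known places found in the utterance, in order of appearance.

    Words that are not part of a known place are simply ignored, a very
    loose imitation of "parse what is important, ignore other parts".
    """
    text = utterance.lower()
    hits = []
    for place in PLACES:
        match = re.search(r"\b" + re.escape(place) + r"\b", text)
        if match:
            hits.append((match.start(), place))
    return [place for _, place in sorted(hits)]


def fill_template(frame):
    """Turn a frame (a dict of slot values) into text by filling the template."""
    return TEMPLATE.format(**frame)


if __name__ == "__main__":
    print(spot_places("eh, when's the next bus from oakland to the airport"))
    # The arrival time below is an illustrative value, not from the slides.
    print(fill_template({"dest": "Boston", "depart_time": "07:45", "arrive_time": "09:15"}))
```

A single fixed template repeats itself exactly, which the slides note humans rarely do; a fuller sketch would choose among several templates based on dialog state and answer type.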