Speech Processing 15-492/18-492
Speech Recognition: Grammars and Other ASR Techniques

But not just acoustics
• But not all phones are equi-probable
• Find the word sequence W that maximizes P(W | A), given the acoustics A
• Using Bayes' Law: P(W | A) = P(A | W) P(W) / P(A)
• Combine models
– Use HMMs to provide P(A | W)
– Use a language model to provide P(W)

Beyond n-grams
• Tri-gram language models
– Good for general ASR
• More targeted models for dialog systems
• Look for more structure

Formal Language Theory
• Chomsky Hierarchy
– Finite State Machines
– Context Free Grammars
– Context Sensitive Grammars
– Generalized Rewrite Rules / Turing Machines
• As LM or as understanding mechanism
• Folded into the ASR, or run only on its output

Finite State Machines
• A trigram is a word^2 FSM
• FSM for a greeting: "Hello" | "Good" ("Morning" | "Afternoon")

Finite State Grammar
Sentence -> Start Greeting End
Greeting -> "Hello"
Greeting -> "Good" TOD
TOD -> Morning
TOD -> Afternoon

Context Free Grammar
X -> Y Z
Y -> "Terminal"
Y -> NonTerminal NonTerminal

JSGF
• A simple grammar formalism for ASR
• Standard for writing ASR grammars
• Actually finite state
• http://www.w3.org/TR/jsgf

Finite State Machines
• Deterministic
– Each arc leaving a state has a unique label
– There always exists a deterministic machine representing a non-deterministic one
• Minimal
– There exists an FSM with fewer (or equal) states that accepts the same language

Probabilistic FSMs
• Each arc has a label and a probability
• Collect probabilities from data
• Can do smoothing, as with n-grams

Natural Language Processing
• Probably mildly context sensitive
– i.e. you need context-sensitive rules
• But if we only accept context free
– Probably OK
• If we only accept finite state
– Probably OK too

Writing Grammars for Speech
• What do people say?
– No: what do people *really* say?
• Write examples
– Please, I'd like a flight to Boston
– I want to fly to Boston
– What do you have going to Boston
– What about Boston
– Boston
• Write rules grouping things together

Ignore the unimportant things
• "I'm terribly sorry but I would greatly appreciate if you might be able to help me find an acceptable flight to Boston."
• "I, I wanna want to go to ehm Boston."

What do people really say
A: see who else will somebody else important all the {mumble} the whole school are out for a week
B: oh really
A: {lipsmack} {breath} yeah
B: okay {breath} well when are you going to come up then
A: um let's see well I guess I I could come up actually anytime
B: okay well how about now
A: now
B: yeah
A: have to work tonight –laugh–

Class based language models
• Conflate all words in the same class
– Cities, names, numbers, etc.
• Classes can be automatic or designed

Adaptive Language Models
• Update with new news stories
• Update your language model every day
• Update your language model with daily use
– Using user-generated data (if ASR is good)

Combining models
• Use a "background" model
– General tri-gram model
• Use a specific model
– Grammar based
– Very localized
• Combine
– Interpolated (just a weight factor)
– More elaborate combinations
– Maximum entropy models

Vocabulary size
• Command and control: < 100 words, grammar based
• Simple dialog: < 1,000 words, grammar/tri-gram
• Complex dialog: < 10K words, tri-gram (some grammar for control)
• Dictation: < 64K words, tri-gram
• Broadcast News: 256K plus, tri-gram (and lots of other possibilities)

Homework 1
• Build a speech recognition system
– An acoustic model
– A pronunciation lexicon
– A language model
• Note: it takes time to build
• What is your initial WER?
• How did you improve it?
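The greeting grammar from the Finite State Grammar slide can be sketched as a small finite-state acceptor. This is a minimal illustration only; the state names and the transition-table representation are my own, not from the slides:

```python
# Finite-state acceptor for: Greeting -> "Hello" | "Good" ("Morning" | "Afternoon")
# Each key is (state, arc label); determinism means each such pair has one target.
TRANSITIONS = {
    ("start", "hello"): "end",
    ("start", "good"): "tod",
    ("tod", "morning"): "end",
    ("tod", "afternoon"): "end",
}

def accepts(words):
    """Return True if the word sequence is accepted by the greeting FSM."""
    state = "start"
    for w in words:
        nxt = TRANSITIONS.get((state, w.lower()))
        if nxt is None:
            return False  # no arc with this label leaves the current state
        state = nxt
    return state == "end"
```

For example, `accepts(["Good", "Morning"])` is accepted while `accepts(["Good"])` is not, since the machine stops in a non-final state.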
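The interpolation mentioned on the Combining models slide ("just a weight factor") mixes a specific model with a background tri-gram model. A sketch, where the two component models are stand-ins for real LMs and the weight `lam` is a hypothetical value to be tuned on held-out data:

```python
# Linear interpolation of a specific (e.g. grammar-based) model with a
# background tri-gram model:
#   P(w | h) = lam * P_spec(w | h) + (1 - lam) * P_bg(w | h)
def interpolate(p_spec, p_bg, lam=0.7):
    """Return an interpolated probability function over (word, history)."""
    def p(word, history):
        return lam * p_spec(word, history) + (1 - lam) * p_bg(word, history)
    return p
```

With `lam = 1.0` this falls back to the specific model alone; with `lam = 0.0` it is the background model alone.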
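Homework 1 asks for your initial WER. Word error rate is the standard edit-distance metric: (substitutions + insertions + deletions) / number of reference words. A self-contained sketch of the usual dynamic-programming computation:

```python
def wer(ref, hyp):
    """Word error rate between a reference and hypothesis transcript,
    via Levenshtein distance over words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting all of r[:i]
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting all of h[:j]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

For example, `wer("good morning boston", "good morning austin")` is 1/3: one substitution against three reference words.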