Speech Processing 15-492/18-492

Speech Recognition
• Intro
• Acoustic modelling
• HMMs

Speech Recognition
• From acoustics to text
• Acoustic modeling
– Recognizing all forms of all phonemes
• Language modeling
– Expectation of what might be said
• We need both to do recognition

Acoustics are not enough

Last Saturday in Hawaii, numerous Waipouli vacationers were shocked to find their beach cordoned off for a UC Berkeley Drama enactment of "Personal office space". The play features exclusively topless men and women in an everyday office environment. Richard Carlson, one of the annoyed tourists and a regular swimmer at Waipouli beach, complained that they really knew how to wreck a nice beach with the nudist play. Many of the tourists appeared ruffled by the content and fled the scene to avoid compromising photos.

In yesterday's press release, AT&T unveiled SpeechKit, its new speech recognition toolkit. According to Michael Armstrong, the COO of the company, the most innovative feature of the system is its revolutionary three-dimensional interface, which opens a new universe of possibilities for the speech recognition community. During the official software release, Jonathan Blues, a senior researcher at AT&T Labs, explained how to recognize speech with the new display, and how the toolkit has already played a crucial role in his research.

Acoustics are not enough
• "how to wreck a nice beach with this nudist play"
• "how to recognize speech with this new display"

Split the task
• Build Acoustic models
– Probability of phones given acoustics
• Build Language models
– Probability of word string

Acoustic models
• Represent all ways to say each phoneme
• Like “templates” for each phoneme
• Averages over multiple examples
– Different phonetic contexts: “sow” vs “see” etc
– Different people speaking
– Different acoustic environment
– Different channels (assume channel is similar)

Better Acoustic Models
• DTW Template
– Could be averages over multiple examples
– Need to be time normalized
– Linear interpolate or try to match
• Matching probabilistically
– What is the probability that example matches
– Test each frame

Hidden Markov Models
• Markov Process
– Future can be predicted from the past
• Hidden Markov Models:
– When the state is unknown
– A probability is given for each state

Hidden Markov Model

Key Requirements

Find Probability of Observation
• Given observation O and model M
• Efficiently find P(O|M)
• Called decoding
• Find sum of all paths probabilities
• Each path
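
The "sum of all paths probabilities" idea from the Find Probability of Observation slide can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it assumes a discrete HMM with a non-empty observation sequence, and the names `forward_probability`, `brute_force_probability`, `INIT`, `TRANS`, `EMIT`, and `OBS` are hypothetical, as are the toy numbers. The efficient version is commonly known as the forward algorithm; the brute-force version enumerates every state path and is included only to show that both give the same P(O|M).

```python
from itertools import product


def forward_probability(obs, init, trans, emit):
    """Return P(O|M) for a discrete HMM using the forward recursion."""
    n_states = len(init)
    # alpha[j]: probability of the observations seen so far, ending in state j
    alpha = [init[j] * emit[j][obs[0]] for j in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)


def brute_force_probability(obs, init, trans, emit):
    """Sum the probability of every state path explicitly (exponential cost)."""
    n_states = len(init)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


# Toy two-state model with made-up numbers (assumption, illustrative only).
INIT = [0.6, 0.4]                           # P(first state)
TRANS = [[0.7, 0.3], [0.4, 0.6]]            # TRANS[i][j] = P(next=j | current=i)
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # EMIT[j][o] = P(symbol o | state j)
OBS = [0, 1, 2]                             # observed symbol indices

if __name__ == "__main__":
    print(forward_probability(OBS, INIT, TRANS, EMIT))
    print(brute_force_probability(OBS, INIT, TRANS, EMIT))
```

The forward recursion costs on the order of T·N² operations for T observations and N states, while explicit enumeration visits N^T paths, which is why the slide asks for an efficient way to find P(O|M).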