Speech Recognition
MIT 6.893
SMA 5508
Spring 2004
Larry Rudolph (MIT)

6.893 5508 Spring 2004: Speech Recognition, Larry Rudolph
DRAFT -- To Be Revised Shortly

A long term goal
  - Since 1950, AI researchers claimed
      - Crucial problem
      - Will be solved within the decade
  - Finally, it appears true
  - Failure rates still too high
      - 90% hit rate is 10% error rate
      - want 98% or 99% success rate

Spectrum of choices
                         Constrained Domain         Unconstrained Domain
  Speaker Dependent      Voice tags (e.g. phone)    Trained Dictation (ViaVoice)
  Speaker Independent    Galaxy (we are here)       What everyone wants

Waveform to phonemes
  - Waveform is very fuzzy
  - We think there is a large break between words and sentences
      - hard to see from waveform
  - Mapping waveform segments to phonemes is not accurate

Phonemes to words
  - Group phonemes into words
      - not always 1-1 mapping
      - missing phonemes
      - false phonemes (extra ones)
      - accents
      - many possible choices
  - Word should be known to system
      - domain or dictionary

Words to sentences
  - People do not always speak grammatically
      - some invariant rules (for speech)
      - extra or missing words
      - phrases not always sentences
  - Easier when sentence is in domain
      - domain specified by grammar

Sentences into meaning
  - Dictation system: want sentences
  - Other system: want to understand
  - Integrate high-level processing
      - Most applications need it anyway
      - Helps with recognition
      - useful to disambiguate input

Meaning into action
  - What happens after meaning?
  - Respond to user (even a beep)
  - Usually generate more substantial response
  - Action should be valid in context

Disambiguation
  - Each transformation is rarely highly accurate
  - Lots of choices
  - Subsequent steps can rule out choices from previous steps

Disambiguation strategy
  - Select “n-best” choices and pass on
  - Each step restricts possible meaning
  - Make heavy use of probability
  - Viterbi search
      - state transitions along with probabilities
      - push through n choices at once

After domain dependent
  - Handling out-of-vocabulary words
  - Multimodal input
      - improve recognition rates
      - e.g. lip reading
      - sometimes easier to point than
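
The disambiguation strategy slide names Viterbi search over state transitions with probabilities. The following is a minimal sketch of that idea, not the method used by any system named in these slides. The two phoneme states ("s", "z"), the three acoustic labels ("hiss", "mix", "buzz"), and every probability are hypothetical values chosen only for illustration. The sketch keeps the single best path into each state rather than a full n-best list, and it assumes all probabilities are nonzero so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_prob, path) for the most likely state sequence.

    Log probabilities are summed instead of multiplying raw probabilities,
    which avoids numeric underflow on long observation sequences.
    """
    # best[s] = (log probability, path) of the best path ending in state s
    first = observations[0]
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), [s])
        for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor state that gives the highest score into s.
            score, path = max(
                (best[p][0] + math.log(trans_p[p][s]), best[p][1])
                for p in states
            )
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    # The overall winner is the best-scoring path over all final states.
    return max(best.values())


# Hypothetical toy model: two confusable phonemes and three acoustic labels.
states = ["s", "z"]
start_p = {"s": 0.6, "z": 0.4}
trans_p = {
    "s": {"s": 0.7, "z": 0.3},
    "z": {"s": 0.4, "z": 0.6},
}
emit_p = {
    "s": {"hiss": 0.5, "mix": 0.4, "buzz": 0.1},
    "z": {"hiss": 0.1, "mix": 0.3, "buzz": 0.6},
}

log_p, path = viterbi(["hiss", "mix", "buzz"], states, start_p, trans_p, emit_p)
print(path)  # ['s', 's', 'z']
```

Passing on n-best choices, as the slide suggests, would mean keeping the top n scored paths per state instead of only one, so that later steps (words, grammar, meaning) can still rule out the top acoustic choice.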