CMU CS 15492 - Speech Recognition Template matching - D2195015

School name Carnegie Mellon University

Course CS 15492 - Special Topic: Speech Processing

Pages 24

Speech Processing 15-492/18-492
Speech Recognition: Template Matching

Speech Recognition by Templates
• A little history …
• Matching templates
• DTW (Dynamic Time Warping)
• Beyond template matching

Radio Rex (1922)
• Toys always lead technology …
• Call “Rex” and he comes out of his kennel
• (Crystalradio.com and Rhys Jones)

Toy ASR “Tricks”
• Radio Rex
  – Recognizes vowel formants in “EH”
• Voice activated toy train
  – Multilingual stop/go: hashire/tomate
• Toy “pets” don’t need perfect ASR

Template Matching
• Record templates from user
  – Store in library
• Record ASR example
  – Compare against each library template
  – Select closest example
• For example …
  – On a voice dialing system

Voice Dialing System
• Library
  – Mom
  – Dad
  – Bob
  – Mario’s Pizza
  – Let’s Go Bus Information System

Matching in Time Domain
• Duration
  – Will discriminate some examples
  – But Mom, Bob and Dad will be confused
• What about spectral properties?

Matching in Frequency Domain
[Figure: “Mom” and “Bob”]

Different deliveries
• We change durations
  – Two utterances are never the same
• When it fails we change our delivery
  – Become more articulate
  – “clearer”

Dynamic Time Warping
[Figure: Template against Sample Speech]

DTW algorithm
• For each square (i, j), where Score is the accumulated cost of the best path into that square
  – Score[i][j] = Dist(template[i], sample[j]) + smallest_of(
        Score[i-1][j],
        Score[i][j-1],
        Score[i-1][j-1])
• Remember which choice you took (count path)
[Figure: grid with Template on one axis (i-1, i) and Sample on the other (j-1, j)]

Multiple Templates
• Compare against each
• Find closest
• Need to normalize scores
  – (divide by length of matches)

Matching Templates
[Figure: Sample compared against Template Library: Word0, Word1, Word2, …]

    BestScore = infinity
    For Word in Templates
        Score = dtw(Template[Word], Sample);
        if (Score < BestScore)
            BestScore = Score;
            BestWord = Word;
    DoAction(Action[BestWord])

DTW issues
• What happens with no-matches
  – Need to deal with “none of the above”
• What happens with more templates
  – Harder to choose between
  – Once the variance is greater than the differences
• Choose templates that are very different

DTW/Template Applications
• Voice dialer
• Simple command and control
• Speaker ID

Speaker ID
[Figure: Sample compared against Template Library: Speaker0, Speaker1, Speaker2, …]

    BestScore = infinity
    For Speaker in Templates
        Score = dtw(Template[Speaker], Sample);
        if (Score < BestScore)
            BestScore = Score;
            BestSpeaker = Speaker;

DTW
• Advantages
  – Works well for small number of templates (<20)
  – Language independent
  – Speaker specific
  – Easy to train (end user controls it)
• Disadvantages
  – Limited number of templates
  – Speaker specific
  – Need actual training examples

More reliable matching
• Distance metric
  – Euclidean
• But some distances are bigger than others
  – Silence is pretty similar
  – Distances between fricatives are much larger
• A longer fricative might give a large score
• A longer vowel might give a smaller score

More reliable matching
• Having multiple template examples
  – Individual matches or
  – Average them together
• DTW align all of the examples
• Collect statistics as a Gaussian
  – Mean and standard deviation for each coefficient

More reliable distances
• Instead of Euclidean distance
  – Doesn’t care about the standard deviation
• Use Mahalanobis distance
  – Cares about means and standard deviations

Extending Template matching
• String word templates together
  – Need to find word segmentation
• But there are many words …
[Figure: Word0, Word1, Word2, …]

Extending template model
• String phoneme templates together
  – A template model for each phoneme
[Figure: Sample “k ae t” matched against Phoneme Templates: Phone0, Phone1, Phone2, …]

Summary
• Speech Recognition by Templates
  – Good for simple small vocabulary tasks
• Dynamic Time Warping (DTW)
  – Can match different durational examples
  – Averaging over multiple models
• Distance metrics
  – Euclidean
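The DTW recurrence, the length-normalized template search, the "none of the above" case and the Mahalanobis idea from the slides can be put together in a short Python sketch. This is only an illustration under assumptions that are not in the slides: frames are plain lists of floats, the names dtw, recognize and mahalanobis_diag are made up for this example, the reject_above threshold is a hypothetical way to handle no-matches, and the Mahalanobis variant assumes independent coefficients (one mean and one standard deviation per coefficient).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mahalanobis_diag(frame_stats, x):
    """Distance from frame x to a (means, stds) pair, one std per coefficient.

    Assumption: coefficients are independent (diagonal covariance).
    """
    means, stds = frame_stats
    return math.sqrt(sum(((xi - mu) / sd) ** 2
                         for xi, mu, sd in zip(x, means, stds)))

def dtw(template, sample, dist=euclidean):
    """Length-normalized DTW score between two frame sequences (lower is closer)."""
    n, m = len(template), len(sample)
    if n == 0 or m == 0:
        raise ValueError("template and sample must be non-empty")
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning the first i template
    # frames with the first j sample frames; steps[i][j]: length of that path.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Smallest of the three neighboring squares (ties: shorter path).
            best_cost, best_steps = min(
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
            )
            cost[i][j] = dist(template[i - 1], sample[j - 1]) + best_cost
            steps[i][j] = best_steps + 1
    # Normalize by path length so long and short templates are comparable.
    return cost[n][m] / steps[n][m]

def recognize(sample, library, reject_above=None, dist=euclidean):
    """Return (best_word, best_score); best_word is None if nothing is close enough."""
    best_word, best_score = None, float("inf")
    for word, template in library.items():
        score = dtw(template, sample, dist)
        if score < best_score:
            best_word, best_score = word, score
    if reject_above is not None and best_score > reject_above:
        return None, best_score  # "none of the above"
    return best_word, best_score

# Toy one-coefficient "templates"; real frames would be spectral features.
library = {"mom": [[0.0], [1.0], [0.0]], "bob": [[5.0], [6.0], [5.0]]}
stretched_mom = [[0.0], [0.0], [1.0], [1.0], [0.0]]
print(recognize(stretched_mom, library))
```

The stretched sample still matches "mom" because the warp absorbs the repeated frames. Passing dist=mahalanobis_diag would require each template frame to be stored as a (means, stds) pair, for example after DTW-aligning and averaging several recordings of the same word.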
