CMU CS 15492 - Speech Recognition Acoustic modeling Pronunciation dictionary (28 pages)

Pages: 28
School: Carnegie Mellon University
Course: CS 15492 - Special Topic: Speech Processing


Speech Processing 15-492/18-492
Speech Recognition: Acoustic modeling, Pronunciation dictionary

Acoustic Modeling
  Speech and Signal Variability
  Measuring Error
  Pronunciation lexicons

Variability in Speech Signal
  "Mr Wright should write to Ms Wright right away about his Ford or four door Honda"
  Homophones (same pronunciation)
    wright, right, write: r ay t
    "ford or" vs "four door": f ao r d ao r

Style Variability
  Different articulation in different situations
  Clear vs conversational
  Whisper vs shouting
  Talking to a machine vs talking to others
  Frustrated speech

Speaker Variability
  Gender, age, dialect, health
  Speaker dependent systems
  Speaker independent systems
  Speaker adaptive systems
    Enrolment stage (acoustics and language)

Environment Variability
  Different background noises
    Office vs outside
  Different applications, different environments
    Desktop dictation to warehouse pick
  Single speaker vs multispeaker
  Background music

Channel Variability
  Telephone vs desktop
    8 kHz vs 16 kHz
  PDA vs desktop
    Close talking vs far field
  Cell phone vs landline

Measuring Speech Recognition Error
  Word Error Rate
    Substitutions: word is replaced
    Deletions: word is missed out
    Insertions: word is added
  WER = 100 x (Subs + Dels + Ins) / (number of words in correct sentence)

Word Error Rate
  WER requires
    Transcription: the correct word string
    Alignment between ASR output and transcript
      Not just left to right matching
  Sometimes accuracy is given: 100 - WER
    NOT the number of words correct

Word Error Rate
  Can get over 100%
    But something is very wrong
  Outputting only "the" (ignoring the speech) sometimes gives WER under 100%
  All words are treated equal
    "This specimen" vs "The specimen"
    "Is absent" vs "Is present"

Signal Acquisition
  High signal quality
  Lower sample rate will increase WER
    8 kHz baseline, 16 kHz 10

End Point Detection
  Long silence will likely increase WER
    It will recognize phantom words
  Need to find the speech in the signal
  VAD: Voice Activity Detection
    Find beginning and end of speech
  Typically do continuous recognition
    Recognized while listening
    But need end point
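
The word error rate described above (substitutions, deletions, and insertions divided by the number of words in the correct sentence, times 100) depends on an alignment between the ASR output and the transcript. A minimal sketch in Python follows, assuming whitespace-separated words and a standard minimum edit distance alignment with unit costs. The function names count_errors and word_error_rate are made up for this illustration and are not taken from the course material or any scoring tool.

```python
def count_errors(ref_words, hyp_words):
    """Return (subs, dels, ins) for a minimum-cost alignment of two word lists."""
    n, m = len(ref_words), len(hyp_words)
    # Each cell holds (total_errors, subs, dels, ins) for the prefixes
    # ref_words[:i] and hyp_words[:j].
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, s, d, k = dp[i - 1][j - 1]
            if ref_words[i - 1] == hyp_words[j - 1]:
                diagonal = (cost, s, d, k)            # correct word
            else:
                diagonal = (cost + 1, s + 1, d, k)    # substitution
            cost, s, d, k = dp[i - 1][j]
            deletion = (cost + 1, s, d + 1, k)
            cost, s, d, k = dp[i][j - 1]
            insertion = (cost + 1, s, d, k + 1)
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    return dp[n][m][1:]


def word_error_rate(reference, hypothesis):
    """WER in percent: 100 * (subs + dels + ins) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    subs, dels, ins = count_errors(ref_words, hyp_words)
    return 100.0 * (subs + dels + ins) / len(ref_words)


if __name__ == "__main__":
    ref = "mr wright should write to ms wright"
    hyp = "mr right should write ms wright now"
    print(count_errors(ref.split(), hyp.split()))
    print(word_error_rate(ref, hyp))
```

Because insertions are counted against the reference length, a hypothesis much longer than the transcript pushes the result past 100, which matches the remark that WER is not bounded by 100. When several alignments tie on total cost, the split between substitutions, deletions, and insertions can vary, but the total and therefore the WER do not.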

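The end point detection notes say that the recognizer needs to find the beginning and end of speech so that long silences do not produce phantom words. The slides do not say how the VAD works, so the sketch below swaps in a very simple frame-energy threshold as one possible illustration. The function name find_endpoints, the frame length of 160 samples (20 ms at 8 kHz), and the threshold value are all assumptions chosen for this example.

```python
def find_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices spanning the first through last
    frame whose mean energy exceeds the threshold, or None if no frame does.

    samples: sequence of floats, assumed to be scaled to roughly [-1, 1].
    """
    active_starts = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame (hypothetical energy measure).
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active_starts.append(start)
    if not active_starts:
        return None
    end = min(active_starts[-1] + frame_len, len(samples))
    return active_starts[0], end


if __name__ == "__main__":
    # Synthetic signal: silence, a constant-amplitude burst, then silence.
    signal = [0.0] * 320 + [0.5] * 480 + [0.0] * 320
    print(find_endpoints(signal))
```

A fixed threshold like this is fragile under the environment and channel variability listed earlier (background noise, far-field microphones), so it should be read only as a picture of what "find beginning and end of speech" means, not as a usable detector.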

