
Slide 1: Why is ASR Hard?
• Natural speech is continuous
• Natural speech has disfluencies
• Natural speech is variable over: global rate, local rate, pronunciation within a speaker, pronunciation across speakers, and phonemes in different contexts

Slide 2: Why is ASR Hard? (continued)
• Large vocabularies are confusable
• Out-of-vocabulary words are inevitable
• Recorded speech is variable over: room acoustics, channel characteristics, background noise
• Large training times are not practical
• User expectations are for performance equal to or greater than human performance

Slide 3: Main Causes of Speech Variability
Environment:
• Speech-correlated noise: reverberation, reflection
• Uncorrelated noise: additive noise (stationary, nonstationary)
Speaker:
• Attributes of speakers: dialect, gender, age
• Manner of speaking: breath and lip noise, stress, Lombard effect, rate, level, pitch, cooperativeness
Input Equipment:
• Microphone (transmitter)
• Distance from microphone
• Filter
• Transmission system: distortion, noise, echo
• Recording equipment

Slide 4: ASR Dimensions
• Speaker dependent vs. independent
• Isolated, continuous, or keyword recognition
• Lexicon size and difficulty
• Task constraints, perplexity
• Adverse or easy conditions
• Natural or read speech

Slide 5: Telephone Speech
• Limited bandwidth (F vs S)
• Large speaker variability
• Large noise variability
• Channel distortion
• Different handset microphones
• Mobile and hands-free acoustics

Slide 6: Automatic Speech Recognition
Pipeline: Data Collection, Pre-processing, Feature Extraction, Hypothesis Generation, Cost Estimator, Decoding

Slide 7: Pre-processing
Speech passes through room acoustics and the microphone, then linear filtering, then sampling and digitization.
Issue: effect on modeling

Slide 8: Feature Extraction
Spectral analysis, then an auditory model and normalizations.
Issue: design for discrimination

Slide 9: Representations are Important
• Network on the speech waveform: 23% frame correct
• Network on PLP features: 70% frame correct

Slide 10: Hypothesis Generation
Issue: models of language and task
Example hypotheses over the words "cat", "dog": "a dog is not a cat" vs. "a cat not is a dog"

Slide 11: Cost Estimation
• Distances
• -Log probabilities, from: discrete distributions, Gaussians, mixtures, neural networks

Slide 12: Decoding

Slide 13: Pronunciation Models

Slide 14: Language Models
Choose the most likely words, i.e. those with the largest product P(acoustics | words) · P(words), where P(words) = ∏ P(word | history)
• bigram: history is the previous word
• trigram: history is the previous 2 words
• n-gram: history is the previous n-1 words

Slide 15: System Architecture
Speech goes through Signal Processing, a Probability Estimator, and a Decoder to produce Recognized Words; a Pronunciation Lexicon feeds the decoder.
Example outputs: "zero", "three", "two"; probabilities: "z" = 0.81, "th" = 0.15, "t" =
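The sampling-and-digitization step on the Pre-processing slide can be sketched as uniform quantization. This is a generic illustration, not code from the course; the bit depth and sample values are invented:

```python
def quantize(samples, bits=16):
    """Uniformly quantize samples in [-1, 1) to signed integers,
    as happens during digitization; values outside the range clip,
    modeling an overdriven input."""
    levels = 2 ** (bits - 1)
    out = []
    for s in samples:
        q = int(s * levels)
        out.append(max(-levels, min(levels - 1, q)))  # clip to the integer range
    return out

print(quantize([0.0, 0.5, -0.25, 1.5], bits=16))  # [0, 16384, -8192, 32767]
```

Note how 1.5 clips to the largest representable value, one source of the recording-equipment distortion listed on the variability slide.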
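The Feature Extraction slide (spectral analysis followed by normalization) can be sketched with a plain log-magnitude short-time spectrum. This stands in for a real PLP or MFCC front end, which would add an auditory-scale filterbank; the frame sizes correspond to the common 25 ms / 10 ms choice at 16 kHz, an assumption not stated on the slides:

```python
import numpy as np

def log_spectral_features(waveform, frame_len=400, hop=160):
    """Frame the signal, window each frame, and take log-magnitude spectra,
    then apply per-utterance mean normalization (a simple channel normalization)."""
    n_frames = 1 + (len(waveform) - frame_len) // hop
    window = np.hamming(frame_len)
    feats = []
    for i in range(n_frames):
        frame = waveform[i * hop : i * hop + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        feats.append(np.log(spectrum + 1e-10))  # log compression, floored to avoid log(0)
    feats = np.array(feats)
    return feats - feats.mean(axis=0)

# 1 second of synthetic noise at 16 kHz -> 25 ms frames every 10 ms
x = np.random.default_rng(0).standard_normal(16000)
F = log_spectral_features(x)
print(F.shape)  # (98, 201)
```

The mean subtraction is one concrete example of the "normalizations" box on the slide: it removes a fixed linear-channel coloration, addressing part of the channel variability listed earlier.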
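The Cost Estimation slide lists -log probabilities from Gaussians as one cost. A minimal sketch with a diagonal-covariance Gaussian, where the feature vectors and parameters are invented for illustration:

```python
import math

def neg_log_gaussian(x, mean, var):
    """Negative log likelihood of vector x under a diagonal Gaussian.

    Lower cost means a better match; summing per-dimension terms assumes
    independent dimensions (the usual diagonal-covariance shortcut)."""
    cost = 0.0
    for xi, mi, vi in zip(x, mean, var):
        cost += 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return cost

# A frame near the model mean costs less than one far away.
mean, var = [0.0, 1.0], [1.0, 0.5]
print(neg_log_gaussian([0.1, 1.1], mean, var) <
      neg_log_gaussian([3.0, -2.0], mean, var))  # True
```

Mixtures and neural networks, also listed on the slide, replace this single Gaussian with richer estimators but plug into decoding the same way, as costs summed in the log domain.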
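The Language Models slide factors P(words) into a product of P(word | history); for a bigram the history is the previous word. A sketch with counts from a toy corpus, where the corpus, the `<s>` start token, and the add-alpha smoothing constant are all invented for illustration:

```python
from collections import Counter
import math

def train_bigram(sentences):
    """Count unigram (as-history) and bigram occurrences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split()
        for prev, cur in zip(words, words[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, vocab_size, alpha=1.0):
    """Sum of log P(word | previous word) with add-alpha smoothing."""
    words = ["<s>"] + sentence.split()
    lp = 0.0
    for prev, cur in zip(words, words[1:]):
        p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab_size)
        lp += math.log(p)
    return lp

corpus = ["a dog is not a cat", "a cat is not a dog"]
uni, bi = train_bigram(corpus)
V = len({w for s in corpus for w in s.split()} | {"<s>"})
# The grammatical hypothesis from the Hypothesis Generation slide scores
# higher than the scrambled one.
print(log_prob("a dog is not a cat", uni, bi, V) >
      log_prob("a cat not is a dog", uni, bi, V))  # True
```

This is exactly the role the language model plays in the decoder: ranking the two cat/dog hypotheses from the Hypothesis Generation slide.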
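The decoding criterion on the Language Models slide, picking the words that maximize P(acoustics | words) · P(words), reduces for isolated words to an argmax over a log-domain sum. The probability values below are invented and are not the "z"/"th"/"t" numbers from the architecture slide:

```python
import math

def decode(acoustic_log_probs, prior_log_probs):
    """Pick the word maximizing log P(acoustics | word) + log P(word)."""
    return max(acoustic_log_probs,
               key=lambda w: acoustic_log_probs[w] + prior_log_probs[w])

# Invented scores: the acoustic model slightly prefers "three",
# but the language-model prior tips the decision to "two".
acoustic = {"zero": math.log(0.05), "three": math.log(0.40), "two": math.log(0.35)}
prior    = {"zero": math.log(0.10), "three": math.log(0.20), "two": math.log(0.70)}
print(decode(acoustic, prior))  # two
```

Continuous-speech decoding searches over word sequences rather than single words, but the objective, the largest product of acoustic and language probabilities, is the same.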



Berkeley ELENG 225D - Lecture Notes
