Berkeley COMPSCI 294 - Statistical Natural Language Processing - D551042

School name University of California, Berkeley

Course Compsci 294- Special Topics

Pages 11

This preview shows page 1-2-3-4 out of 11 pages.

CS 294-5: Statistical Natural Language Processing
Speech Recognition
Lecture 20: 11/22/05
Slides directly from Dan Jurafsky, indirectly many others

Speech Recognition
Overview:
• Demo
• Phonetics
  • Articulatory
  • Acoustic
• Acoustic Models
  • HMM
  • Lexicons
  • Gaussian Mixtures
• Speech Synthesis
Proposal:
• Nov 23, 28: Recognition
• Nov 30, Dec 7: Project Presentations
• Dec 5: Synthesis

ASR for Dialog Systems
• Standard ASR maps sound to words
• But specific needs for dialogue systems
  • Language models (what can be said) could depend on where we are in the dialogue
  • Could make use of the fact that we are talking to the same human over time
  • Barge-in (human will talk over the computer)
  • Confidence values: want to know if we misunderstood the human!

State-of-the-Art: Recognition
• Accuracy measured by word error rate (WER)
• Speaker independent:
  • Continuous digit strings, over the telephone: <0.3%
  • Continuous dictation: 3-5%
  • Continuous broadcast news: 5-7%
  • Continuous multispeaker conversations over the telephone: 50%+
  • Commercials: 80%+
• Speaker dependent:
  • 30 min training, good microphone, dictation: 2-3%

Databases
• Read speech (wideband, head-mounted mike)
  • Resource Management (RM)
    • 1000 word vocabulary, used in the 80s
  • WSJ (Wall Street Journal)
    • Reporters read the paper out loud
    • “Verbalized punctuation” or “non-verbalized punctuation”
• Broadcast Speech (wideband)
  • Broadcast News (“Hub 4”)
    • English, Mandarin, Arabic
• Conversational Speech (telephone)
  • Switchboard
  • CallHome
  • Fisher

[Figure: the vocal tract, labeled Nasal Cavity, Pharynx, Vocal Folds (within the Larynx), Trachea, Lungs. Text copyright J. J. Ohala, Sept 2001, from Sharon Rose slide]

[Figure: Sagittal section of the vocal tract (Techmer 1880)]

Places of articulation
• labial
• dental
• alveolar
• post-alveolar/palatal
• velar
• uvular
• pharyngeal
• laryngeal/glottal
[Figure thanks to Jennifer Venditti]

Labial place
• Bilabial: p, b, m
• Labiodental: f, v
[Figure labels: bilabial, labiodental. Figure thanks to Jennifer Venditti]

Coronal place
• Dental: th/dh
• Alveolar: t/d/s/z/l
• Post: sh/zh/y
[Figure labels: dental, alveolar, post-alveolar/palatal. Figure thanks to Jennifer Venditti]

Dorsal Place
• Velar: k/g/ng
[Figure labels: velar, uvular, pharyngeal. Figure thanks to Jennifer Venditti]

Manner of Articulation
• Stop: complete closure of articulators, so no air escapes through mouth
• Oral stop: palate is raised, no air escapes through nose. Air pressure builds up behind closure, explodes when released
  • p, t, k, b, d, g
• Nasal stop: oral closure, but palate is lowered, air escapes through nose.
  • m, n, ng

Oral vs. Nasal Sounds
[Figure. Thanks to Jong-bok Kim for this figure!]

Vowels
[Figure: IY, AA, UW. Fig. from Eric Keller]

Simple Periodic Waves (sine waves)
[Figure: sine wave with one cycle marked; x-axis Time (s) from 0 to 0.02, y-axis from –0.99 to 0.99]
• Characterized by:
  • period: T
  • amplitude: A
  • phase: φ
• Fundamental frequency in cycles per second, or Hz
  • F0 = 1/T

Simple periodic waves of sound
[Figure: sine wave; x-axis Time (s) from 0 to 0.02, y-axis from –0.99 to 0.99]
• Y axis: Amplitude = amount of air pressure at that point in time
  • Zero is normal air pressure, negative is rarefaction
• X axis: time.
  Frequency = number of cycles per second.
  • Frequency = 1/Period
  • 20 cycles in .02 seconds = 1000 cycles/second = 1000 Hz

Complex waves: Adding a 100 Hz and 1000 Hz wave together
[Figure: waveform; x-axis Time (s) from 0 to 0.05, y-axis from –0.9654 to 0.99]

Spectrum
[Figure: x-axis Frequency in Hz, marked at 100 and 1000; y-axis Amplitude]
• Frequency components (100 and 1000 Hz) on x-axis

Spectrum of one instant in an actual soundwave: many components across frequency range
[Figure: x-axis Frequency (Hz), from 0 to 5000]

Waveforms for speech
• Waveform of the vowel [iy]
• Frequency: repetitions/second of a wave
• Above vowel has 28 reps in .11 secs
• So freq is 28/.11 = 255 Hz
• This is speed that vocal folds move, hence voicing
• Amplitude: y axis: amount of air pressure at that point in time
• Zero is normal air pressure, negative is rarefaction

She just had a baby
• What can we learn from a wavefile?
• Vowels are voiced, long, loud
• Length in time = length in space in waveform picture
• Voicing: regular peaks in amplitude
• When stops closed: no peaks: silence.
• Peaks = voicing: .46 to .58 (vowel [iy]), from second .65 to .74 (vowel [ax]), and so on
• Silence of stop closure (1.06 to 1.08 for first [b], or 1.26 to 1.28 for second [b])
• Fricatives like [sh]: intense irregular pattern; see .33 to .46

Examples from Ladefoged
[Figure: waveforms of "bad", "pad", "spat"]

Part of [ae] waveform from “had”
• Note complex wave repeating nine times in figure
• Plus smaller waves which repeat 4 times for every large pattern
• Large wave has frequency of 250 Hz (9 times in .036 seconds)
• Small wave roughly 4 times this, or roughly 1000 Hz
• Two little tiny waves on top of peak of 1000 Hz waves

Back to Spectra
• Spectrum represents these freq components
• Computed by Fourier transform, algorithm which separates out each frequency component of wave.
• x-axis shows frequency, y-axis shows magnitude (in decibels, a log measure of amplitude)
• Peaks at 930 Hz, 1860 Hz, and 3020 Hz.

Why these Peaks?
• Articulatory facts:
  • The vocal cord vibrations create harmonics
  • The mouth is an amplifier
  • Depending on shape of mouth, some harmonics are amplified more than others

Deriving schwa: how shape of mouth (filter function) creates peaks!
• Reminder of basic facts about sound waves
  • f = c/λ
  • c = speed of sound (approx 35,000 cm/sec)
  • A sound with λ = 10 meters: f = 35 Hz (35,000/1000)
  • A sound with λ = 2 centimeters: f = 17,500 Hz (35,000/2)

Resonances of the vocal tract
• The human vocal tract as an open tube
• Air in a tube of a given length will tend to vibrate at resonance frequency of tube.
• Constraint: Pressure differential should be maximal at (closed) glottal end and minimal at (open) lip end.
[Figure: tube with closed end and open end, length 17.5 cm. Figure from W. Barry Speech Science slides]

[Figure from Sundberg]

Computing the 3 Formants of Schwa
• Let the length of the tube be L
  • F1 = c/λ1 = c/(4L) = 35,000/(4 × 17.5) = 500 Hz
  • F2 = c/λ2 = c/((4/3)L) = 3c/(4L) = 3 × 35,000/(4 × 17.5) = 1500 Hz
  • F3 = c/λ3 = c/((4/5)L) = 5c/(4L) = 5 × 35,000/(4 × 17.5) = 2500 Hz
• So we expect a neutral vowel to have 3 resonances at 500, 1500, and 2500 Hz
• These vowel resonances are called formants

Seeing formants: the spectrogram
[Figure from Mark Liberman’s Web site]

American English Vowel Space
[Figure: vowel chart with axes FRONT to BACK and HIGH to LOW; vowels shown: ey, ow, aw, oy, ay, iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux. Figure from Jennifer Venditti]

Dialect Issues
• Speech varies from dialect to dialect (examples are American vs. British English)
  • Syntactic (“I could” vs. “I could do”)
  • Lexical (“elevator” vs. “lift”)
  • Phonological (butter: [I©5] vs. [I©(])
  • Phonetic
• Mismatch between
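
The frequency arithmetic on the waveform slides (cycles divided by seconds) and the "Complex waves" and "Spectrum" slides can be tried out with a short sketch. This is an illustration under assumptions, not the method used to make the slides' figures: the sampling rate, sample count, and the function names `frequency_hz` and `dft_magnitudes` are hypothetical choices, and a naive discrete Fourier transform from the standard library stands in for whatever tool produced the original plots.

```python
import cmath
import math


def frequency_hz(cycles, seconds):
    """Frequency = number of cycles per second."""
    return cycles / seconds


def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns one amplitude per bin.

    Only bins below the Nyquist frequency are returned, scaled so that a
    sine wave of amplitude A shows up as a peak of height about A.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitudes.append(2 * abs(total) / n)
    return magnitudes


# Assumed sampling setup (not from the slides): 8000 samples/s, 400 samples,
# i.e. 0.05 s of signal, matching the time axis of the complex-wave figure.
SAMPLE_RATE = 8000
NUM_SAMPLES = 400

# Add a 100 Hz and a 1000 Hz sine wave together, as on the slide.
complex_wave = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    for t in range(NUM_SAMPLES)
]

magnitudes = dft_magnitudes(complex_wave)
bin_width_hz = SAMPLE_RATE / NUM_SAMPLES  # frequency spacing between bins
peak_frequencies = [
    k * bin_width_hz for k, m in enumerate(magnitudes) if m > 0.5
]

print(round(frequency_hz(20, 0.02)))   # sine wave: 20 cycles in .02 s
print(round(frequency_hz(28, 0.11)))   # vowel [iy]: 28 reps in .11 s
print(round(frequency_hz(9, 0.036)))   # [ae] from "had": 9 reps in .036 s
print(peak_frequencies)                # frequencies of the spectral peaks
```

The sample count is chosen so that both component frequencies fall exactly on a DFT bin; with other settings the peaks would smear across neighboring bins.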
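
The arithmetic on the "Deriving schwa" and "Computing the 3 Formants of Schwa" slides can be checked with a minimal sketch. It assumes the slides' idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are the odd multiples of c/(4L). The function names `frequency_from_wavelength` and `tube_formants` are hypothetical, chosen only for this illustration.

```python
# Speed of sound as given on the slides, in cm/sec (an approximation).
SPEED_OF_SOUND_CM_PER_S = 35_000.0


def frequency_from_wavelength(wavelength_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """f = c / lambda, with the wavelength in centimeters."""
    return c / wavelength_cm


def tube_formants(length_cm=17.5, count=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Resonances of a tube closed at one end and open at the other.

    The k-th resonance is (2k - 1) * c / (4L): F1 = c/(4L),
    F2 = 3c/(4L), F3 = 5c/(4L), and so on.
    """
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, count + 1)]


print(frequency_from_wavelength(1000.0))  # wavelength of 10 meters
print(frequency_from_wavelength(2.0))     # wavelength of 2 centimeters
print(tube_formants())                    # 17.5 cm tube, first 3 formants
```

Changing `length_cm` shows how the model scales: a shorter tube raises every resonance by the same factor. Real vowels other than schwa need a non-uniform tube, which this sketch does not attempt.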
