CS 188: Artificial Intelligence
Fall 2006
Lecture 21: Speech / Viterbi
11/09/2006
Dan Klein – UC Berkeley

Announcements
- Optional midterm: Tuesday 11/21 in class; review session 11/19, 7–9pm, in 306 Soda
- Projects: 3.2 due 11/9, 3.3 due 11/15, 3.4 due 11/27
- Contest: Pacman contest details on the web site this week; entries due 12/3

Hidden Markov Models
- Hidden Markov models (HMMs): an underlying Markov chain over states X, and you observe outputs (effects) E at each time step
- As a Bayes' net: a chain X1 -> X2 -> X3 -> X4 -> X5, with an observed effect Ei attached to each Xi
- There are several questions you can answer for HMMs
- Last time: filtering to track a belief about the current X given the evidence

Speech Recognition
- [demos]

Speech in an Hour
- Speech input is an acoustic wave form: "s p ee ch l a b"
- The "l" to "a" transition
- Graphs from Simon Arnfield's web tutorial on speech, Sheffield: http://www.psyc.leeds.ac.uk/research/cogn/speech/tutorial/

She just had a baby
- What can we learn from a wavefile?
- Vowels are voiced, long, and loud; length in time = length in space in the waveform picture
- Voicing: regular peaks in amplitude; when stops are closed: no peaks, silence
- Peaks = voicing: .46 to .58 (the vowel [i]), a second vowel from .65 to .74, and so on
- Silence of stop closure: 1.06 to 1.08 for the first [b], 1.26 to 1.28 for the second [b]
- Fricatives like the [ʃ] of "she" show an intense irregular pattern; see .33 to .46
- Frequency gives pitch; amplitude gives volume
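The frequency and amplitude facts above can be made concrete: the sketch below sums a 100 Hz and a 1000 Hz sine wave, as in the spectral-analysis slides, and recovers both components with a naive Fourier transform. This is a minimal illustration in plain Python; the sample rate and duration are illustrative choices, not values from the lecture.

```python
import math

def dft_magnitude(signal, k):
    """Magnitude of the k-th DFT bin of a real signal (naive O(N) per bin)."""
    n = len(signal)
    re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
    im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
    return math.hypot(re, im)

sample_rate = 8000    # the ~8 kHz "phone" rate mentioned in the slides
duration = 0.1        # seconds -> 800 samples, so each DFT bin is 10 Hz wide
n = int(sample_rate * duration)

# Sum of a 100 Hz and a 1000 Hz sine wave, sampled at sample_rate.
signal = [math.sin(2 * math.pi * 100 * t / sample_rate) +
          math.sin(2 * math.pi * 1000 * t / sample_rate)
          for t in range(n)]

# Scan the bins up to Nyquist; the two largest peaks should sit at the
# two component frequencies.
mags = {k * sample_rate // n: dft_magnitude(signal, k) for k in range(n // 2)}
peaks = sorted(sorted(mags, key=mags.get, reverse=True)[:2])
print(peaks)   # -> [100, 1000]
```

A real recognizer would use a fast Fourier transform rather than this O(N^2) scan, but the recovered spectrum is the same.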
Digitizing Speech
- Sampling at ~8 kHz for phone speech, ~16 kHz for microphone speech (kHz = 1000 cycles/sec)
- The Fourier transform of the wave is displayed as a spectrogram: darkness indicates the energy at each frequency
- [Figure: spectrogram of "s p ee ch l a b", frequency vs. time]

Spectral Analysis

Adding 100 Hz + 1000 Hz Waves
- [Figure: the summed waveform from 0 to 0.05 s, amplitude between about –0.9654 and 0.990]

Spectrum
- [Figure: spectrum showing the frequency components (100 and 1000 Hz) as peaks on the x-axis, frequency in Hz vs. amplitude]

Back to Spectra
- The spectrum represents these frequency components
- It is computed by the Fourier transform, an algorithm which separates out each frequency component of a wave
- The x-axis shows frequency; the y-axis shows magnitude (in decibels, a log measure of amplitude)
- Peaks at 930 Hz, 1860 Hz, and 3020 Hz

Vowel Formants

Resonances of the vocal tract
- The human vocal tract acts as an open tube: closed at the glottal end, open at the lip end, length about 17.5 cm
- Air in a tube of a given length will tend to vibrate at the resonance frequency of the tube
- Constraint: the pressure differential should be maximal at the (closed) glottal end and minimal at the (open) lip end
- Figure from W. Barry's Speech Science slides; from Mark Liberman's web site

Why these Peaks?
- Articulatory facts: vocal cord vibrations create harmonics, and the mouth is a selective amplifier
- Depending on the shape of the mouth, some harmonics are amplified more than others
- [Figure: the vowel [i] sung at successively higher pitches, panels 1–7; figures from Ratree Wayland's slides]

How to read spectrograms
- bab: closure of the lips lowers all formants, so there is a rapid increase in all formants at the beginning of "bab"
- dad: the first formant increases, but F2 and F3 fall slightly
- gag: F2 and F3 come together; this is a characteristic of velars.
- Formant transitions take longer in velars than in alveolars or labials
- From Ladefoged, "A Course in Phonetics"

Acoustic Feature Sequence
- Time slices are translated into acoustic feature vectors (~39 real numbers per slice)
- [Figure: spectrogram time slices as observations e12, e13, e14, e15, e16, ...]
- These are the observations; now we need the hidden states X

State Space
- P(E|X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
- P(X|X') encodes how sounds can be strung together
- We will have one state for each sound in each word
- From some state x, we can only: stay in the same state (e.g. speaking slowly), move to the next position in the word, or, at the end of the word, move to the start of the next word
- We build a little state graph for each word and chain them together to form our state space X

HMMs for Speech

ASR Lexicon: Markov Models

Markov Process with Bigrams
- Figure from Huang et al., page 618

Decoding
- While there are some practical issues, finding the words given the acoustics is an HMM inference problem
- We want to know which state sequence x_{1:T} is most likely given the evidence e_{1:T}:
  x*_{1:T} = argmax_{x_{1:T}} P(x_{1:T} | e_{1:T})

Viterbi Algorithm
- Question: what is the most likely state sequence given the observations?
- Slow answer: enumerate all possibilities
- Better answer: cached incremental version

Viterbi with 2 Words + Unif. LM
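The "cached incremental version" described under Viterbi Algorithm above can be sketched as follows: for each time step, keep only the score of the best sequence ending in each state, plus backpointers. The two-sound word model below (a "b" state followed by an "iy" state, with self-loops for speaking slowly, as in the State Space slide) and all of its probabilities are made-up illustrations, not values from the lecture.

```python
def viterbi(states, start, transition, emission, observations):
    """Most likely state sequence x_{1:T} given evidence e_{1:T}."""
    # best[s] = probability of the best path ending in state s so far
    best = {s: start[s] * emission[s][observations[0]] for s in states}
    backpointers = []
    for e in observations[1:]:
        prev = best
        best, ptr = {}, {}
        for s2 in states:
            # Cache only the best predecessor, instead of enumerating
            # every full state sequence.
            s1 = max(states, key=lambda s: prev[s] * transition[s][s2])
            ptr[s2] = s1
            best[s2] = prev[s1] * transition[s1][s2] * emission[s2][e]
        backpointers.append(ptr)
    # Recover the sequence by walking backpointers from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Hypothetical word model: "b" then "iy", each state loops or advances.
states = ['b', 'iy']
start = {'b': 1.0, 'iy': 0.0}
transition = {'b': {'b': 0.5, 'iy': 0.5}, 'iy': {'b': 0.0, 'iy': 1.0}}
emission = {'b': {'low': 0.8, 'high': 0.2}, 'iy': {'low': 0.3, 'high': 0.7}}
path = viterbi(states, start, transition, emission, ['low', 'low', 'high'])
print(path)   # -> ['b', 'b', 'iy']
```

Real systems work in log space to avoid underflow over long utterances; the max-and-backpointer structure is unchanged.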
- Figure from Huang et al., page 612

Next Class
- Final part of the course: machine learning
- We'll start talking about how to learn model parameters (like probabilities) from data
- One of the most heavily used technologies in all of AI

The Speech Recognition Problem
- We want to predict a sentence s given an acoustic sequence A:
  s* = argmax_s P(s | A)
- The noisy channel approach: build a generative model of production (encoding):
  P(A, s) = P(s) P(A | s)
- To decode, we use Bayes' rule to write:
  s* = argmax_s P(s | A) = argmax_s P(s) P(A | s) / P(A) = argmax_s P(s) P(A | s)
- Now we have to find a sentence maximizing this product. Why is this progress?

Examples from Ladefoged
- bad, pad, spat

Simple Periodic Sound Waves
- [Figure: a simple periodic waveform from 0 to 0.02 s, amplitude between –0.99 and 0.99]
- y-axis: amplitude = the amount of air pressure at that point in time; zero is normal air pressure, negative is rarefaction
- x-axis: time
- Frequency = number of cycles per second = 1/period
- 20 cycles in .02 seconds = 1000 cycles/second = 1000 Hz

Deriving Schwa
- Reminder of basic facts about sound waves: f = c/λ, where c is the speed of sound (approx. 35,000 cm/sec)
- A sound with λ = 10 meters: f = 35 Hz (35,000/1000)
- A sound with λ = 2 centimeters: f = 17,500 Hz (35,000/2)
- From Sundberg

Computing the 3 Formants of Schwa
- Let the length of the tube be L
- F1 = c/λ1 = c/(4L) = 35,000/(4 × 17.5) = 500 Hz
- F2 = c/λ2 = c/(4L/3) = 3c/(4L) = 1500 Hz
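The schwa formant arithmetic above can be checked in a couple of lines. A tube closed at one end and open at the other resonates at wavelengths 4L, 4L/3, 4L/5, ..., i.e. at odd multiples of c/(4L); the F3 value below follows that standard pattern, since the slide text is cut off before it.

```python
c = 35_000   # speed of sound, cm/sec (value from the slide)
L = 17.5     # vocal tract length, cm (value from the slide)

# Resonances of a closed-open tube: Fn = (2n - 1) * c / (4L)
formants = [(2 * n - 1) * c / (4 * L) for n in (1, 2, 3)]
print(formants)   # -> [500.0, 1500.0, 2500.0]
```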
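The noisy-channel decision rule from The Speech Recognition Problem slide, s* = argmax_s P(s) P(A|s), can be sketched with a toy decoder. The two-word lexicon, prior, and acoustic likelihoods below are made-up numbers for illustration, not from the lecture.

```python
prior = {'bad': 0.6, 'pad': 0.4}        # language model P(s)
likelihood = {                          # acoustic model P(A|s), for two
    'bad': {'A1': 0.2, 'A2': 0.7},      # hypothetical acoustic inputs
    'pad': {'A1': 0.5, 'A2': 0.3},
}

def decode(acoustics):
    """Pick the sentence maximizing P(s) * P(A|s).

    P(A) is the same for every candidate s, so dividing by it cannot
    change the argmax and is dropped -- this is why Bayes' rule is progress.
    """
    return max(prior, key=lambda s: prior[s] * likelihood[s][acoustics])

print(decode('A1'))   # -> 'pad'  (0.4 * 0.5 = 0.20 beats 0.6 * 0.2 = 0.12)
print(decode('A2'))   # -> 'bad'  (0.6 * 0.7 = 0.42 beats 0.4 * 0.3 = 0.12)
```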