Berkeley COMPSCI 294 - Lecture Notes

CS 294-5: Statistical Natural Language Processing
Speech Synthesis
Lecture 22: 12/4/05
Slides directly from Dan Jurafsky, indirectly many others

Modern TTS systems
• 1960’s: first full TTS, Umeda et al. (1968)
• 1970’s
  • Joe Olive, 1977: concatenation of linear-prediction diphones
  • Speak and Spell
• 1980’s
  • 1979: MIT MITalk (Allen, Hunnicutt, Klatt)
• 1990’s-present
  • Diphone synthesis
  • Unit selection synthesis

Types of Modern Synthesis
• Articulatory Synthesis: Model movements of articulators and acoustics of vocal tract
• Formant Synthesis: Start with acoustics, create rules/filters to create each formant
• Concatenative Synthesis: Use databases of stored speech to assemble new utterances.
Text from Richard Sproat slides

TTS Demos (Mostly Unit-Selection)
• Comparisons: http://www.tmaa.com/tts/companies.htm
• ATT: http://www.naturalvoices.att.com/demos/
• Rhetorical (= Scansoft): http://www.rhetorical.com/cgi-bin/demo.cgi
• Festival: http://www-2.cs.cmu.edu/~awb/festival_demos/index.html
• IBM: http://www-306.ibm.com/software/pervasive/tech/demos/tts.shtml

TTS Architecture
Raw text in
• Text Analysis
  • Text Normalization
  • Part-of-Speech tagging
  • Homonym Disambiguation
• Phonetic Analysis
  • Dictionary Lookup
  • Grapheme-to-Phoneme (LTS)
• Prosodic Analysis
  • Boundary placement
  • Pitch accent assignment
  • Duration computation
• Waveform synthesis
Speech out

Text Normalization
• Analysis of raw text into pronounceable words
• Sample problems:
  • He stole $100 million from the bank
  • It's 13 St. Andrews St.
  • The home page is http://www.cnn.com
  • yes, see you the following tues, that's 11/12/01
• Steps
  • Identify tokens in text
  • Chunk tokens into reasonably sized sections
  • Map tokens to words
  • Identify types for words

Words to Phones
• Two methods:
  • Dictionary-based
  • Rule-based (Letter-to-sound = LTS)
• Early systems, all LTS
• MITalk was radical in having huge 10K word dictionary
• Now systems use a combination
  • Big dictionary
  • Special code for handling names
  • Machine learned LTS system for other unknown words
• CMU dictionary: 127K words
  http://www.speech.cs.cmu.edu/cgi-bin/cmudict

Letter-to-Sound Rules
• Festival LTS rules:
  ( LEFTCONTEXT [ ITEMS ] RIGHTCONTEXT = NEWITEMS )
• Examples:
  ( # [ c h ] C = k )
  ( # [ c h ] = ch )
• Rules apply in order
  • “christmas” pronounced with [k]
  • But word with ch followed by non-consonant pronounced [ch]
  • E.g., “choice”
• More modern approach: learn HMMs / CRFs

Prosody
• Prosody: Getting from words+phones to boundaries, accent, F0, duration
• Prosodic phrasing
  • Need to break utterances into phrases
  • Punctuation is useful, not sufficient
• Accents:
  • Predictions of accents: which syllables should be accented
  • Realization of F0 contour: given accents/tones, generate F0 contour
• Duration:
  • Predicting duration of each phone

Three aspects of prosody
• Prominence: some syllables/words are more prominent than others
• Structure/boundaries: sentences have prosodic structure
  • Some words group naturally together
  • Others have a noticeable break or disjuncture between them
• Tune: the intonational melody of an utterance.
From Ladd (1996)

Prominence: Pitch Accents
A: What types of foods are a good source of vitamins?
B1: Legumes are a good source of VITAMINS.
B2: LEGUMES are a good source of vitamins.
• Prominent syllables are:
  • Louder
  • Longer
  • Have higher F0 and/or sharper changes in F0 (higher F0 velocity)
Slide from Jennifer Venditti

Graphic representation of F0
[Figure: F0 contour of “legumes are a good source of VITAMINS”; x-axis: time; y-axis: F0 (in Hertz), 50 to 400]
Slide from Jennifer Venditti

The ‘ripples’
[Figure: F0 contour of “legumes are a good source of VITAMINS” with [t], [s], [s] marked; y-axis: 50 to 400 Hz]
F0 is not defined for consonants without vocal fold vibration.
Slide from Jennifer Venditti

The ‘ripples’
[Figure: F0 contour of “legumes are a good source of VITAMINS” with [v], [g], [g], [z] marked; y-axis: 50 to 400 Hz]
... and F0 can be perturbed by consonants with an extreme constriction in the vocal tract.
Slide from Jennifer Venditti

Abstraction of the F0 contour
[Figure: smoothed F0 contour of “legumes are a good source of VITAMINS”; y-axis: 50 to 400 Hz]
Our perception of the intonation contour abstracts away from these perturbations.
Slide from Jennifer Venditti

The ‘waves’ and the ‘swells’
[Figure: F0 contour of “legumes are a good source of VITAMINS”; y-axis: 50 to 400 Hz]
• ‘wave’ = accent
• ‘swell’ = phrase
Slide from Jennifer Venditti

Stress vs. Accent
• Stress is a structural property of a word: it marks a potential (arbitrary) location for an accent to occur, if there is one.
• Accent is a property of a word in context: it is a way to mark intonational prominence in order to ‘highlight’ important words in the discourse.
[Figure: metrical grids for “California” and “vitamins”; rows: syllables, full vowels, stressed syll, (accented syll)]
Slide from Jennifer Venditti

Which Word is Accented?
It depends on the context. For example, the ‘new’ information in the answer to a question is often accented, while the ‘old’ information usually is not.
Q1: What types of foods are a good source of vitamins?
A1: LEGUMES are a good source of vitamins.
Q2: Are legumes a source of vitamins?
A2: Legumes are a GOOD source of vitamins.
Q3: I’ve heard that legumes are healthy, but what are they a good source of?
A3: Legumes are a good source of VITAMINS.
Slide from Jennifer Venditti

Same ‘tune’, different alignment
[Figure: F0 contour of “LEGUMES are a good source of vitamins”; y-axis: 50 to 400 Hz]
The main rise-fall accent (= “I assert this”) shifts locations.
Slide from Jennifer Venditti

Same ‘tune’, different alignment
[Figure: F0 contour of “Legumes are a GOOD source of vitamins”; y-axis: 50 to 400 Hz]
The main rise-fall accent (= “I assert this”) shifts locations.
Slide from Jennifer Venditti

Same ‘tune’, different alignment
[Figure: F0 contour of “legumes are a good source of VITAMINS”; y-axis: 50 to 400 Hz]
The main rise-fall accent (= “I assert this”) shifts locations.
Slide from Jennifer Venditti

Broad focus
[Figure: F0 contour of “legumes are a good source of vitamins”; y-axis: 50 to 400 Hz]
“Tell me something about the world.”
In the absence of narrow focus, English tends to mark the first and last ‘content’ words with perceptually prominent accents.
Slide from Jennifer Venditti

Yes-No question tune
[Figure: F0 contour of “are LEGUMES a good source of vitamins”; y-axis: 50 to 550 Hz]
Rise from the main accent to the end of the sentence.
Slide from Jennifer Venditti

Yes-No question tune
[Figure: F0 contour of “are legumes a GOOD source of vitamins”; y-axis: 50 to 550 Hz]
Rise from the main accent to the end of the sentence.
Slide from
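
The Text Normalization steps listed in these notes (identify tokens, identify types, map tokens to words) can be sketched roughly as follows. This is a minimal illustration, not the method of any particular TTS system: the type names, the `ABBREVIATIONS` and `MAGNITUDES` tables, and the "St." heuristic (Saint before a capitalized word, Street otherwise) are all assumptions made for the example, and URLs and dates are only typed, not expanded.

```python
import re

# Hypothetical token types; real systems use a far richer inventory.
TOKEN_PATTERNS = [
    ("URL", re.compile(r"^https?://\S+$")),
    ("DATE", re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$")),
    ("MONEY", re.compile(r"^\$\d{1,3}$")),
    ("NUMBER", re.compile(r"^\d{1,3}$")),
]
ABBREVIATIONS = {"St."}                      # assumed, tiny on purpose
MAGNITUDES = {"thousand", "million", "billion"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def identify_type(token):
    """Return a coarse type label for one whitespace-delimited token."""
    if token in ABBREVIATIONS:
        return "ABBREV"
    for name, pattern in TOKEN_PATTERNS:
        if pattern.match(token):
            return name
    return "WORD"


def normalize(text):
    """Map tokens to pronounceable words; unknown types pass through."""
    tokens = text.split()
    words = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        kind = identify_type(tok)
        if kind == "MONEY":
            amount = number_to_words(int(tok[1:]))
            if nxt in MAGNITUDES:            # "$100 million": absorb the magnitude
                amount += " " + nxt
                i += 1
            words.append(amount + " dollars")
        elif kind == "NUMBER":
            words.append(number_to_words(int(tok)))
        elif kind == "ABBREV":
            # Assumed heuristic: "St." before a capitalized word is "Saint".
            words.append("Saint" if nxt and nxt[0].isupper() else "Street")
        else:
            words.append(tok)                # WORD, URL, DATE: left unchanged here
        i += 1
    return " ".join(words)


print(normalize("He stole $100 million from the bank"))
print(normalize("It's 13 St. Andrews St."))
```

Under these assumptions, the two calls print "He stole one hundred million dollars from the bank" and "It's thirteen Saint Andrews Street". The chunking step and the expansion of URLs and dates are left out to keep the sketch short.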

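
The ordered application of the two Festival-style `c h` rules from the Letter-to-Sound Rules slide can be illustrated with a small matcher. This is a sketch under stated assumptions, not Festival's implementation: the tuple encoding of rules, the `context_matches` helper, the vowel set used to define the class `C`, and the fallback that emits an unmatched letter as its own phone are all hypothetical choices for the example.

```python
VOWELS = set("aeiou")  # assumed definition; everything else alphabetic counts as C

# Each rule is (left context, items, right context, new items).
# "#" means word boundary, "C" means any consonant, "" means no constraint.
RULES = [
    ("#", "ch", "C", ["k"]),    # ( # [ c h ] C = k )
    ("#", "ch", "", ["ch"]),    # ( # [ c h ] = ch )
]


def context_matches(symbol, word, pos):
    """Check one context symbol against the letter at index pos.

    pos may be -1 or len(word), which both stand for the word boundary.
    """
    if symbol == "":
        return True
    inside = 0 <= pos < len(word)
    if symbol == "#":
        return not inside
    if symbol == "C":
        return inside and word[pos].isalpha() and word[pos] not in VOWELS
    raise ValueError("unknown context symbol: " + symbol)


def letter_to_sound(word, rules=RULES):
    """Apply the first matching rule at each position, scanning left to right."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for left, items, right, output in rules:
            end = i + len(items)
            if (word.startswith(items, i)
                    and context_matches(left, word, i - 1)
                    and context_matches(right, word, end)):
                phones.extend(output)
                i = end
                break
        else:
            # Assumed fallback: no rule matched, emit the letter itself.
            phones.append(word[i])
            i += 1
    return phones


print(letter_to_sound("christmas"))  # first phone is "k"
print(letter_to_sound("choice"))     # first phone is "ch"
```

Because the more specific rule is listed first, "christmas" gets [k], while "choice" fails the consonant context and falls through to the second rule. Reversing the rule order would make the general rule fire for both words.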

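
The accent-placement tendencies on the Which Word is Accented? and Broad focus slides can be turned into a toy rule: accent the focused ('new') word if one is given, otherwise accent the first and last content words. The sketch below is only an illustration of that tendency, not a real pitch accent predictor; the `FUNCTION_WORDS` list and the `assign_accents` function are hypothetical, and accents are shown by capitalization as on the slides.

```python
# Assumed, deliberately tiny list of function words; anything else is 'content'.
FUNCTION_WORDS = {"a", "an", "the", "of", "are", "is", "to", "in", "and"}


def assign_accents(words, focus=None):
    """Return the words with accented ones upper-cased.

    focus: the word carrying narrow focus ('new' information), or None
    for broad focus, where the first and last content words are accented.
    """
    lowered = [w.lower() for w in words]
    if focus is not None:
        accented = {i for i, w in enumerate(lowered) if w == focus.lower()}
    else:
        content = [i for i, w in enumerate(lowered) if w not in FUNCTION_WORDS]
        accented = {content[0], content[-1]} if content else set()
    return [w.upper() if i in accented else w for i, w in enumerate(lowered)]


sentence = "legumes are a good source of vitamins".split()
print(" ".join(assign_accents(sentence)))                # broad focus
print(" ".join(assign_accents(sentence, focus="good")))  # narrow focus on "good"
```

With broad focus this prints "LEGUMES are a good source of VITAMINS"; with narrow focus on "good" it prints "legumes are a GOOD source of vitamins", matching answers A2 and the broad-focus slide. Deciding which word is 'new' in a real discourse is the hard part and is not attempted here.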