Word Pronunciation
Julia Hirschberg
CS 4706
2/17/2010

Today
• Motivation
• Challenges for automatic word pronunciation
• Standard methods
• Innovative solutions

• TTS demos:
  – ScanSoft/Nuance
  – AT&T
  – IBM
  – Cepstral
• SNL Robot Repair

Motivation
• Intelligibility
• Naturalness
• Applications to language learning
  – Unlimited vocabulary
  – Type a word or phrase and hear it spoken in your target language
    • To imitate
    • To learn to recognize
• Speech therapy

Word Pronunciation
• What determines how a word is pronounced?
  – History/Language Origin/Dictionaries:
    • shoe (ME shoo), phoenix (Gr)
    • mole, attaches, resume
  – Part-of-speech:
    • use, close, dove, multiply, coax
  – Morphology:
    • ferryboat, ferryboats
    • Popemobile (pope+mobile)

Letter-to-Sound Rules
• Define correspondences between orthography and phonemic representation, e.g.
  – i _{C}e$ -> /ai/ (rise)
  – Else i -> /ih/ (rip)
• Deals with any input

Problems
• Must be built by hand
• Many exceptions, e.g.
  – i _{C}e$ -> /ai/ matches ripen/risen/riser/river/ripper
  – Proper names: Nice, Ramirez, Ribeiro, Rise, Infiniti
  – Symbols and abbreviations: &c, evalu8, cu, tsp
• Assigning lexical stress
• Solutions
  – More complex rules
  – Exceptions dictionary
    • Consulted first
    • But how do we handle morphological variation? E.g.
      – Rise’s hat

Dictionary-based Approaches
• Rely on very large dictionary with orthography and pronunciation for each word
• Typically created by hand or by expansion of online pronouncing dictionary

Problems
• Redundancy of representation
  – Cat, cats, cat’s, cats’
• Out-of-vocabulary (OOV) items
  – Proper names: covering all U.K. surnames would require >5,000,000 entries
  – New words: …
    • Technical terms: liposuction, anova, bernaise
    • Foreign borrowings: frappe, ciao, louche
• Solutions
  – Larger dictionary
  – Morphological preprocessing before dictionary look-up
  – Fall back to L2Sound rules if no dictionary ‘hit’

Major Challenges for TTS
• Disambiguating homographs
  – bass/bass
• Pronouncing new words
  – New names in the news:
  – New words: iPad, Kindle
• Expanding abbreviations and acronyms correctly

Homograph Disambiguation by Decision List Classifiers (Yarowsky ‘97)
• E.g., bass/bass, nice/Nice, live/live, desert/desert, lead/lead
• Rank by
  $$\left| \log \frac{P(\mathrm{Sense}_1 \mid f_i = v_j)}{P(\mathrm{Sense}_2 \mid f_i = v_j)} \right|$$

Pronouncing OOV Words
• Techniques for handling OOVs
  – Inferring country of origin:
    • Takashita, Leroy, Kirov, Lima, Infiniti
  – Pronunciation by analogy
    • Analog/dialog
    • Risible/visible
    • Proper names: Alifano/Califano

Bootstrapping Phonetic Lexicons (Maskey et al ’04)
• For some languages, online pronouncing lexicons exist – but for others… e.g. Nepali
  – How to minimize effort in creating lexicons?
• Approach
  – Given a native speaker and a large amount of online text in the language…
    • Native speaker builds small lexicon by hand for seed set of N most common words in text, e.g.
      – is: /izh/
      – the: /dhax/
    • Derive L2S rules from lexicon automatically, e.g.
      – is -> ih{zh}
      – the -> {dh}ax …
    • Loop: Choose the next set of N most common words from the text and use the lexicon + L2S rules to predict pronunciations, e.g.
      – telephone -> /telaxfown/
      – He -> /hax/?
      – Rise -> /rihzhax/?
    • Assign a confidence score to each prediction by comparing each word to all words in lexicon
      – If is -> /ihzh/ in lexicon and no other orthographically similar words are pronounced differently, new rule his -> /hihzh/ scores high
    • For low confidence pronunciations, Active Learning step:
      – Inspect and calculate error rate
      – Hand correct errors and add all to lexicon
      – Iterate from Loop until performance stabilizes
    • Build a new set of L2S rules from augmented lexicon
• Results
  – English:
    • 94% success on test set after 23 iterations, 16K entry lexicon
    • Performance comparable to CMUDict and 1/7 the size
  – German:
    • 90% accuracy after 13 iterations, 28K lexicon
  – Nepali:
    • 94.6% accuracy after 16 iterations, 5K lexicon

Improving Pronunciation Dictionary Coverage (Fackrell and Skut ’04)
• Idea: Many proper names have more than one spelling (e.g. More/Moore; Smith/Smythe)
  – Homophones
  – Find a ‘fuzzy’ mapping between OOV (Out of Vocabulary) words and words already in the lexicon
  – Identify spelling alternations that are ‘pronunciation-neutral’ in an existing lexicon to produce rewrite rules for OOVs
• Pros?
• Cons?

Deriving Pronunciations from the Web (Ghoshal et al ’09)
• Extract candidate orthography/pronunciation pairs (ad-hoc and IPA)
  – E.g. bruschetta (pronounced broo-SKET-uh)
• Validate the candidates: how likely are these pairs to represent a word and its pronunciation
• Normalize ad-hoc and IPA pronunciations
• Pros?
• Cons?

Pronunciation Evaluation
• How would you evaluate the pronunciation module of a TTS system?

Next Class
• Readings
• Download the ToBI cardinal examples (see http://www1.cs.columbia.edu/~agus/tobi/)
  – You will first need to download WaveSurfer
    • http://www.speech.kth.se/wavesurfer/
  – Then download the cardinal examples
    • http://www1.cs.columbia.edu/~agus/tobi/cardinals/manual.php
• Listen to each of the cardinal examples
  – Try to imitate each one and to decide what it
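
Sketch: dictionary look-up with letter-to-sound fallback
The look-up order from the Letter-to-Sound Rules, Problems, and Solutions slides (dictionary consulted first, then morphological preprocessing, then fall back to L2S rules) might be sketched in Python roughly as follows. This is a toy illustration under stated assumptions, not the method of any system named in the lecture. Only the two i-rules come from the slides. The names LEXICON, s_suffix, l2s and pronounce, the three-word lexicon, the phone symbols other than /ai/ and /ih/, the silent final e, and the letter-maps-to-itself default are all hypothetical.

```python
# Toy pronouncing dictionary (hypothetical entries and phone symbols).
LEXICON = {
    "rise": "r ai z",
    "river": "r ih v er",
    "cat": "k ae t",
}

VOWELS = set("aeiou")
SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}
VOICELESS = {"p", "t", "k", "f", "th"}


def s_suffix(stem_phones):
    """Pick the English -s/-'s allomorph from the stem's final phone."""
    last = stem_phones.split()[-1]
    if last in SIBILANTS:
        return "ih z"
    return "s" if last in VOICELESS else "z"


def l2s(word):
    """Letter-to-sound fallback: the two i-rules from the slides, toy defaults otherwise."""
    phones = []
    for pos, letter in enumerate(word):
        if letter == "i":
            rest = word[pos + 1:]
            # i _{C}e$ -> /ai/ : i, then one consonant, then word-final e
            if len(rest) == 2 and rest[0] not in VOWELS and rest[1] == "e":
                phones.append("ai")
            else:
                phones.append("ih")  # Else i -> /ih/
        elif letter == "e" and pos == len(word) - 1:
            continue  # assumption: word-final e is silent
        else:
            phones.append(letter)  # assumption: every other letter maps to itself
    return " ".join(phones)


def pronounce(word):
    """Return (pronunciation, source), trying the dictionary before the rules."""
    w = word.lower().replace("’", "'")
    if w in LEXICON:  # 1. dictionary, consulted first
        return LEXICON[w], "lexicon"
    # 2. morphological preprocessing: strip possessive or plural, retry the dictionary
    for suffix in ("'s", "s"):
        stem = w[: -len(suffix)]
        if w.endswith(suffix) and stem in LEXICON:
            return LEXICON[stem] + " " + s_suffix(LEXICON[stem]), "lexicon+morphology"
    return l2s(w), "l2s"  # 3. no dictionary hit: fall back to L2S rules


if __name__ == "__main__":
    for w in ["rise", "cats", "Rise’s", "rip"]:
        print(w, pronounce(w))
```

With this toy lexicon, pronounce("Rise’s") strips the possessive, finds rise in the dictionary and returns r ai z ih z, while pronounce("rip") has no dictionary hit and goes through the rules to give r ih p. Storing only stems and handling suffixes at look-up time is one way to avoid the cat/cats/cat’s/cats’ redundancy noted on the Problems slide.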