Columbia COMS W4706 - Word Pronunciation - D222350

School: Columbia University

Course: COMS W4706 - Spoken Language Processing

Pages: 22

Unformatted text preview:

Word Pronunciation
Julia Hirschberg
CS 4706
2/17/2010

Today
• Motivation
• Challenges for automatic word pronunciation
• Standard methods
• Innovative solutions

• TTS demos:
  – ScanSoft/Nuance
  – AT&T
  – IBM
  – Cepstral
• SNL Robot Repair

Motivation
• Intelligibility
• Naturalness
• Applications to language learning
  – Unlimited vocabulary
  – Type a word or phrase and hear it spoken in your target language
    • To imitate
    • To learn to recognize
• Speech therapy

Word Pronunciation
• What determines how a word is pronounced?
  – History/Language Origin/Dictionaries:
    • shoe (ME shoo), phoenix (Gr)
    • mole, attaches, resume
  – Part-of-speech:
    • use, close, dove, multiply, coax
  – Morphology:
    • ferryboat, ferryboats
    • Popemobile (pope+mobile)

Letter-to-Sound Rules
• Define correspondences between orthography and phonemic representation, e.g.
  – i _{C}e$ -> /ai/ rise
  – Else i -> /ih/ rip
• Deals with any input

Problems
• Must be built by hand
• Many exceptions, e.g.
  – i _{C}e$ -> /ai/ matches ripen/risen/riser/river/ripper
• Proper names: Nice, Ramirez, Ribeiro, Rise, Infiniti
• Symbols and abbreviations: &c, evalu8, cu, tsp
• Assigning lexical stress
• Solutions
  – More complex rules
  – Exceptions dictionary
    • Consulted first
    • But how do we handle morphological variation? E.g.
      – Rise’s hat

Dictionary-based Approaches
• Rely on a very large dictionary with orthography and pronunciation for each word
• Typically created by hand or by expansion of an online pronouncing dictionary

Problems
• Redundancy of representation
  – Cat, cats, cat’s, cats’
• Out-of-vocabulary (OOV) items
  – Proper names: covering all U.K. surnames would require >5,000,000 entries
  – New words: …
    • Technical terms: liposuction, anova, bernaise
    • Foreign borrowings: frappe, ciao, louche
• Solutions
  – Larger dictionary
  – Morphological preprocessing before dictionary look-up
  – Fall back to letter-to-sound (L2S) rules if no dictionary ‘hit’

Major Challenges for TTS
• Disambiguating homographs
  – bass/bass
• Pronouncing new words
  – New names in the news:
  – New words: iPad, Kindle
• Expanding abbreviations and acronyms correctly

Homograph Disambiguation by Decision List Classifiers (Yarowsky ’97)
• E.g., bass/bass, nice/Nice, live/live, desert/desert, lead/lead
• Rank by $\left|\log\frac{P(\mathrm{Sense}_1 \mid f_i = v_j)}{P(\mathrm{Sense}_2 \mid f_i = v_j)}\right|$

Pronouncing OOV Words
• Techniques for handling OOVs
  – Inferring country of origin:
    • Takashita, Leroy, Kirov, Lima, Infiniti
  – Pronunciation by analogy
    • Analog/dialog
    • Risible/visible
    • Proper names: Alifano/Califano

Bootstrapping Phonetic Lexicons (Maskey et al ’04)
• For some languages, online pronouncing lexicons exist, but not for others, e.g. Nepali
  – How to minimize effort in creating lexicons?
• Approach
  – Given a native speaker and a large amount of online text in the language…
    • Native speaker builds a small lexicon by hand for a seed set of the N most common words in the text, e.g.
      – is: /ihzh/
      – the: /dhax/
    • Derive L2S rules from the lexicon automatically, e.g.
      – is -> ih{zh}
      – the -> {dh}ax …
    • Loop: Choose the next N most common words from the text and use the lexicon + L2S rules to predict pronunciations, e.g.
      – telephone -> /telaxfown/
      – He -> /hax/?
      – Rise -> /rihzhax/?
    • Assign a confidence score to each prediction by comparing each word to all words in the lexicon
      – If is -> /ihzh/ is in the lexicon and no other orthographically similar words are pronounced differently, the new rule his -> /hihzh/ scores high
    • For low-confidence pronunciations, Active Learning step:
      – Inspect and calculate error rate
      – Hand-correct errors and add all to lexicon
      – Iterate from Loop until performance stabilizes
    • Build a new set of L2S rules from the augmented lexicon
• Results
  – English:
    • 94% success on test set after 23 iterations, 16K-entry lexicon
    • Performance comparable to CMUDict at 1/7 the size
  – German:
    • 90% accuracy after 13 iterations, 28K lexicon
  – Nepali:
    • 94.6% accuracy after 16 iterations, 5K lexicon

Improving Pronunciation Dictionary Coverage (Fackrell and Skut ’04)
• Idea: Many proper names have more than one spelling (e.g. More/Moore; Smith/Smythe)
  – Homophones
  – Find a ‘fuzzy’ mapping between OOV (out-of-vocabulary) words and words already in the lexicon
  – Identify spelling alternations that are ‘pronunciation-neutral’ in an existing lexicon to produce rewrite rules for OOVs
• Pros?
• Cons?

Deriving Pronunciations from the Web (Ghoshal et al ’09)
• Extract candidate orthography/pronunciation pairs (ad-hoc and IPA)
  – E.g. bruschetta (pronounced broo-SKET-uh)
• Validate the candidates: how likely are these pairs to represent a word and its pronunciation?
• Normalize ad-hoc and IPA pronunciations
• Pros?
• Cons?

Pronunciation Evaluation
• How would you evaluate the pronunciation module of a TTS system?

Next Class
• Readings
• Download the ToBI cardinal examples (see http://www1.cs.columbia.edu/~agus/tobi/)
  – You will first need to download WaveSurfer
    • http://www.speech.kth.se/wavesurfer/
  – Then download the cardinal examples
    • http://www1.cs.columbia.edu/~agus/tobi/cardinals/manual.php
• Listen to each of the cardinal examples
  – Try to imitate each one and to decide what it
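
Supplementary sketch: dictionary look-up with letter-to-sound fallback

The slides describe consulting a dictionary first, applying morphological preprocessing, and falling back to letter-to-sound rules when there is no dictionary hit. The following Python sketch shows one way that order of operations might look. The tiny lexicon, its phoneme strings, and all function names are hypothetical. Only the two rules for the letter i come from the slides; every other letter is copied through unchanged as a placeholder, so the rule output is not a real phoneme string.

```python
# Hypothetical toy lexicon; the phoneme strings are illustrative only.
LEXICON = {
    "cat": "k ae t",
    "rise": "r ai z",
    "river": "r ih v er",
}

VOWELS = "aeiou"


def letter_to_sound(word):
    """Apply the two `i` rules from the slides; copy all other letters as-is."""
    phones = []
    for pos, ch in enumerate(word):
        if ch == "i":
            rest = word[pos + 1:]
            # i _{C}e$ -> /ai/ : one consonant, then a word-final e
            if len(rest) == 2 and rest[0] not in VOWELS and rest[1] == "e":
                phones.append("ai")
            else:
                phones.append("ih")  # Else i -> /ih/
        else:
            phones.append(ch)  # placeholder, not a real phoneme mapping
    return " ".join(phones)


def strip_suffix(word):
    """Rough morphological preprocessing: split off 's, s', or a final s."""
    for suffix in ("'s", "s'", "s"):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def suffix_phones(last_phone):
    """English plural/possessive ending, chosen by the stem's final phone."""
    if last_phone in {"s", "z", "sh", "zh", "ch", "jh"}:
        return "ih z"
    if last_phone in {"p", "t", "k", "f", "th"}:
        return "s"
    return "z"


def pronounce(word):
    """Return (phoneme string, source), trying the lexicon before the rules."""
    word = word.lower().replace("’", "'")
    if word in LEXICON:  # 1. direct dictionary hit
        return LEXICON[word], "lexicon"
    stem, suffix = strip_suffix(word)
    if suffix and stem in LEXICON:  # 2. hit after stripping the suffix
        stem_phones = LEXICON[stem]
        ending = suffix_phones(stem_phones.split()[-1])
        return stem_phones + " " + ending, "lexicon+morphology"
    return letter_to_sound(word), "l2s"  # 3. fall back to the rules


print(pronounce("cats"))
print(pronounce("Rise’s"))
print(pronounce("rip"))
```

Stripping the suffix before look-up is what lets a single entry for cat cover cats, cat’s, and cats’, and it gives one possible answer to the "Rise’s hat" question raised on the Problems slide.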

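Supplementary sketch: ranking rules in a decision list

The homograph slide ranks evidence by the absolute log-likelihood ratio of the two senses given a feature value. The Python sketch below illustrates that ranking on made-up data. The counts, the feature names, and the sense labels are hypothetical, and the add-alpha smoothing is an assumption introduced here only to avoid dividing by zero; it is not taken from the slides.

```python
import math

# Hypothetical co-occurrence counts for the homograph "bass".
COUNTS = {
    "word_within_k=fish": {"fish": 40, "music": 1},
    "word_within_k=guitar": {"fish": 0, "music": 30},
    "word_to_left=the": {"fish": 12, "music": 10},
}


def build_decision_list(counts, sense1, sense2, alpha=0.1):
    """Return (score, feature, predicted sense) tuples, strongest rule first."""
    rules = []
    for feature, c in counts.items():
        n1 = c.get(sense1, 0) + alpha  # smoothed count for sense 1
        n2 = c.get(sense2, 0) + alpha  # smoothed count for sense 2
        # P(sense1 | f) / P(sense2 | f) reduces to n1 / n2 because both
        # probabilities share the same denominator (n1 + n2).
        llr = math.log(n1 / n2)
        # A ratio of exactly 1 (llr == 0) falls to sense2 with score 0.
        predicted = sense1 if llr > 0 else sense2
        rules.append((abs(llr), feature, predicted))
    rules.sort(reverse=True)  # rank by |log-likelihood ratio|
    return rules


def classify(rules, features, default):
    """Use the single highest-ranked rule whose feature is present."""
    for _score, feature, sense in rules:
        if feature in features:
            return sense
    return default


rules = build_decision_list(COUNTS, "fish", "music")
for score, feature, sense in rules:
    print(f"{score:.2f}  {feature} -> {sense}")
print(classify(rules, {"word_to_left=the", "word_within_k=guitar"}, "fish"))
```

Only the strongest matching rule decides the sense, so a weak cue such as a preceding "the" never outvotes a strong one such as a nearby "guitar".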