
Improving Pronunciation Dictionary Coverage of Names by Modelling Spelling Variation
Justin Fackrell and Wojciech Skut
Presented by Han

The Problem
The pronunciation of out-of-vocabulary (OOV) words is a major problem in TTS.
Many OOV words are names.
For English, the orthography of names is highly irregular.
Current methods of approaching this problem have low accuracy. They use hand-written or automatically learned rules to replace a sequence of graphemes by a sequence of phonemes.

The Challenge

Their Method
Scope: English surnames, forenames, street names and place names.
Based on the observation that some of the words in the above categories have the same pronunciation but slightly different spellings.
Approach: learn the rules of these variations from existing data (data-driven), so that the next time we see an OOV word we can try to apply these rules and see whether they transform it into an in-vocabulary (IOV) word.

Different Orthographical Expressions for the Same Pronunciation

Hypothesis
Given a name that is not in the dictionary, there is about a 10% chance that it DOES have a valid pronunciation in the dictionary. We have to somehow map it to a valid in-dictionary word.

A Hierarchical Approach
Dictionary, Filter 1, Filter 2, etc.

Two Ways of Using This Method and Their Results
Online: the suggested pronunciations are good in 80% of cases.
Offline: for surnames, a model trained on a 23,000-entry dictionary was able to add 5,000 new entries, increasing the coverage by about 1%.

The Algorithm Part I: Training
1. Reverse the dictionary (pron -> ortho).
2. Delete one-to-one mappings.
3. Each pair of spellings that share a common pronunciation generates a set of rewrite rules r_i, where i = 0 to n, of the form A -> B / L _ R.
Each rule r_i is then evaluated on the rest of the dictionary to see how useful it is (MISS, OOV, DIFF, GOOD), and gets four scores: n_i^MISS, n_i^OOV, n_i^DIFF and n_i^GOOD.
From each set of rules generated by a pair, only one rule is chosen: the shortest one with n_i^DIFF = 0.

The Algorithm Part II: Prediction
Sort all rules by score. When given an OOV word, use the rule with the highest score that can map it into an IOV word.

Some Examples of Resulting Rewrite Rules

Some Results

Accuracy Test Results
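
The training and prediction steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not stated in the slides: each rule rewrites the differing middle of two spellings with 0 to max_context characters of context on each side; MISS means the rule does not apply to a word, OOV means the rewritten word is not in the dictionary, DIFF means it is in the dictionary with a different pronunciation, and GOOD means it is there with the same pronunciation; the score used for sorting is the GOOD count; rules are evaluated on the whole toy dictionary rather than only "the rest" of it. All function names, the toy lexicon and its pronunciation strings are hypothetical.

```python
from collections import defaultdict, namedtuple
from itertools import permutations

Rule = namedtuple("Rule", "a b left right")  # A -> B / L _ R

def apply_rule(rule, word):
    """Rewrite the first occurrence of L+A+R as L+B+R, or return None."""
    target = rule.left + rule.a + rule.right
    pos = word.find(target) if target else -1
    if pos < 0:
        return None
    return word[:pos] + rule.left + rule.b + rule.right + word[pos + len(target):]

def candidate_rules(src, dst, max_context=2):
    """Rules r_0..r_n turning src into dst, shortest context first (assumed scheme)."""
    limit = min(len(src), len(dst))
    p = 0  # length of the common prefix
    while p < limit and src[p] == dst[p]:
        p += 1
    s = 0  # length of the common suffix that does not overlap the prefix
    while s < limit - p and src[-1 - s] == dst[-1 - s]:
        s += 1
    a, b = src[p:len(src) - s], dst[p:len(dst) - s]
    end = len(src) - s
    return [Rule(a, b, src[max(0, p - k):p], src[end:end + k])
            for k in range(max_context + 1)]

def evaluate(rule, lexicon):
    """Count MISS / OOV / DIFF / GOOD outcomes (assumed definitions)."""
    counts = {"MISS": 0, "OOV": 0, "DIFF": 0, "GOOD": 0}
    for word, pron in lexicon.items():
        out = apply_rule(rule, word)
        if out is None:
            counts["MISS"] += 1
        elif out not in lexicon:
            counts["OOV"] += 1
        elif lexicon[out] != pron:
            counts["DIFF"] += 1
        else:
            counts["GOOD"] += 1
    return counts

def train(lexicon, max_context=2):
    # Step 1: reverse the dictionary (pron -> spellings).
    by_pron = defaultdict(list)
    for word, pron in lexicon.items():
        by_pron[pron].append(word)
    scored = {}
    for spellings in by_pron.values():
        if len(spellings) < 2:  # Step 2: drop one-to-one mappings.
            continue
        # Step 3: every ordered pair of spellings proposes a set of rules.
        for src, dst in permutations(spellings, 2):
            for rule in candidate_rules(src, dst, max_context):
                counts = evaluate(rule, lexicon)
                # Keep the shortest rule with no DIFF; GOOD > 0 is an extra guard.
                if counts["DIFF"] == 0 and counts["GOOD"] > 0:
                    scored[rule] = counts["GOOD"]
                    break
    return sorted(scored.items(), key=lambda kv: -kv[1])

def predict(word, rules, lexicon):
    """Use the best-scoring rule that maps an OOV word onto an IOV word."""
    for rule, _score in rules:
        out = apply_rule(rule, word)
        if out is not None and out in lexicon:
            return lexicon[out]
    return None

# Toy dictionary with made-up pronunciation strings.
lexicon = {"smith": "s m I T", "smyth": "s m I T",
           "griffith": "g r I f I T", "jones": "dZ oU n z"}
rules = train(lexicon)
print(predict("griffyth", rules, lexicon))
```

In this toy run the pair smith/smyth yields the context-free rules i -> y and y -> i, and the second one maps the unseen spelling "griffyth" onto the dictionary entry "griffith". A real system would need word-boundary handling and a larger dictionary before the DIFF filter becomes meaningful.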



Columbia CS 4706 - Improving Pronunciation
