Speech Processing 15-492/18-492
Multilinguality

Dealing with *all* Languages
- Over 6000 languages
  - Maybe not all commercially interesting … now
- Major languages (economic)
  - Cell phone manufacturers list 46 languages
  - But even those are not all covered

What you need
- ASR
  - Acoustic model (lots of speakers)
  - Pronunciation lexicon
  - Language model
- TTS
  - Acoustic model (one speaker)
  - Pronunciation lexicon
  - Text analysis

Writing Systems
- Romanized writing systems
  - Latin-1 (ISO-8859-1)
  - Covers many Western European languages
- Cyrillic
  - Covers many Eastern European languages
- Arabic scripts
  - Arabic(s), Farsi, Urdu, etc.
- Devanagari
  - Covers many Northern Indian languages
- Chinese Hanzi
  - Covers some Chinese dialects, but in different versions
- Many other scripts, some non-standard

Writing Systems
- Letter based
  - Latin, Cyrillic
- Consonant based
  - Arabic, Hebrew
- Mora based
  - Half syllable or syllable
  - Indian scripts, Japanese native scripts
- Syllable based
  - Hangul, Chinese

Standards
- Writing standards
  - Taught at schools, used in newspapers, supported on computers
  - Typically standardized spelling
- May be mostly spoken
  - Occasionally written

Language Specific Issues
- No explicit markings
  - Stress, accent, tones
- No word boundaries
  - Chinese, Thai
- No (short) vowels
  - Arabic, Hebrew
- Rich morphology
  - Many different words in the language
  - Finnish, Turkish, Greenlandic

Genre Specific Issues
- No capitals or punctuation
  - Unpunctuated
- Plain vs. polite form
- Speech vs. text form
- Many foreign phrases
  - (technology-directed genres)
- Many new abbreviations
  - E.g., SMS messages

Character Encoding
- Unicode vs. UTF-8 vs. Latin-1
  - Documents mix them
- Sometimes accents are omitted
  - For ease of typing
- Lots of standards
  - Unicode, EUC, BIG5, TIS42, …
  - Everyone has their own standard
  - Some create their own standards
- Mixed character sets

Phoneme Sets
- Hard to find consensus for new languages
  - Typically lots of different dialects
- What level of distinction?
  - Some sets are good for speech but not really phonetic
  - /t/ vs. /dx/ in “water”
- Often doesn’t include foreign phones
  - /w/ in German is common for younger people

Words
- May be hard to define
  - No word boundaries
  - Rich morphology
- Words have many variations of compounds
  - Yomenakatta -> could not read
  - Yomemasendeshita -> could not read (polite)
- Gender specific speech
  - Boku vs. atashi
- Language mixtures

Pronunciation lexicons
- “Proper” speech vs. “actual” speech
- Hard to generalize
  - Chinese
- Cross-lingual pronunciations
  - “Human” (English/German)

“Industry” way
- Collect at least 100 hours of spoken speech
  - At least 20 different speakers
  - Mixture of gender, age, etc.
  - Through the desired channel (phone/desktop)
- Collect at least 5 hours from one speaker
  - High quality recording studio
- Data should be targeted to the application
- Build pronunciation lexicon
  - Expert phonologist

Industry way
- Probably 3-6 months
  - Lead developer
  - Local language expert
  - Lots of human transcribers
- Costs?
  - Many hundreds of thousands

Or cheaper (?) …
- Find existing data
  - Linguistic Data Consortium (UPenn)
  - ELRA (European equivalent)
  - Appen, Australia
- Find local people who have collected data
- Found data might be in the wrong format
  - Data cleaning is often the most expensive part

Actual way
- Often a mixture
  - Found data for initial model
  - Collect data with actual/initial application

Multilingual
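The “No word boundaries” point (Chinese, Thai) means text analysis must segment text into words before lexicon lookup. A common baseline, not necessarily the approach used in this course, is greedy longest-match (“maximal matching”) against a word list. The sketch below assumes a toy lexicon and a hypothetical `max_len` limit on word length; real systems use much larger lexicons and usually statistical models, since greedy matching can mis-segment ambiguous strings.

```python
def max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right longest-match word segmentation (a sketch).

    At each position, take the longest lexicon entry of up to max_len
    characters; if nothing matches, emit a single character so the
    loop always advances.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Hypothetical toy lexicon: "we", "is/are", "student".
toy_lexicon = {"我们", "是", "学生"}
print(max_match("我们是学生", toy_lexicon))
```

Unknown characters fall back to single-character “words”, which keeps the segmenter total but pushes the out-of-vocabulary problem to the pronunciation lexicon.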
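The Character Encoding slide notes that found documents mix many encodings. One simple, hedged way to read such data is to try a list of candidate codecs in order and keep the first strict decode that succeeds. The candidate list here is an assumption: Python's `utf-8`, `big5`, `euc_jp`, `tis_620` (the Thai TIS-620 standard) and `latin-1` codecs. A successful decode only proves the bytes are *valid* in that encoding, not that the guess is right, so ordering matters and results still need checking.

```python
# Assumed candidate order: UTF-8 first because its byte structure is strict,
# Latin-1 last because it accepts every byte sequence (a catch-all fallback).
CANDIDATE_ENCODINGS = ("utf-8", "big5", "euc_jp", "tis_620", "latin-1")


def guess_decode(data: bytes,
                 candidates: tuple[str, ...] = CANDIDATE_ENCODINGS) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes strictly."""
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # invalid in this encoding, try the next one
    raise ValueError("no candidate encoding could decode the input")


text, enc = guess_decode("中文".encode("big5"))
print(enc, text)
```

For example, the Latin-1 bytes for “café” fail as UTF-8, Big5 and EUC-JP but happen to be valid TIS-620, so this order would return Thai text: the kind of silent error that makes data cleaning expensive.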
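Because accents are sometimes omitted for ease of typing, a lexicon lookup may need to match “cafe” against “café”. A minimal sketch, assuming Latin-script text where diacritics are combining marks after Unicode NFD decomposition: decompose, drop the combining characters, and compare the stripped forms. The function name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining diacritics after canonical (NFD) decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as U+0301.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Match accentless input against an accented lexicon by stripping both sides.
lexicon = {"café": "k a f e"}
index = {strip_accents(word): word for word in lexicon}
print(index[strip_accents("cafe")])
```

Letters such as “ß” or “ø” have no decomposition and pass through unchanged, and in some languages the marks carry meaning (for example tones), so stripping should only be a fallback when an exact match fails.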