Automatic Dialect/Accent Recognition
Fadi Biadsy
April 12th, 2010
PhD Proposal – Fadi Biadsy

Outline
- Problem
- Motivation
- Corpora
- Framework for Language Recognition
- Experiments in Dialect Recognition: Phonotactic Modeling, Prosodic Modeling, Acoustic Modeling, Discriminative Phonotactics
- Contributions
- Future Work
- Research Plan

Problem: Dialect Recognition
Given a speech segment of a predetermined language, identify its dialect from a set Dialect = {D1, D2, …, DN}.
There has been a great deal of work on language recognition; dialect and accent recognition have begun to receive attention only more recently.
Dialect recognition is a more difficult problem than language recognition.

Motivation: Why Study Dialect Recognition?
- To discover differences between dialects
- To improve Automatic Speech Recognition (ASR) through model adaptation: pronunciation, acoustic, morphological, and language models
- To infer a speaker's regional origin, for:
  - Speech-to-speech translation
  - Annotations for broadcast news monitoring
  - Spoken dialogue systems (adapting TTS systems)
  - Charismatic speech
  - Call centers (crucial in emergency situations)

Motivation: Cues that May Distinguish Dialects/Accents
- Phonetic cues:
  - Differences in phonemic inventory
  - Phonemic differences
  - Allophonic differences (context-dependent phones). Example: /r/ is an approximant [ɹ] in American English and modifies preceding vowels; in Scottish English it is trilled in [Consonant] – /r/ – [Vowel] and in other contexts.
- Phonotactics: the rules/distribution that govern phonemes and their sequences in a dialect
(Al-Tamimi & Ferragne, 2005)

Example: "She will meet him"
MSA: /s/ /a/ /t/ /u/ /q/ /A/ /b/ /i/ /l/ /u/ /h/ /u/
Egy: /H/ /a/ /t/ /?/ /a/ /b/ /l/ /u/
Lev: /r/ /a/ /H/ /t/ /g/ /A/ /b/ /l/ /u/
The three renderings show differences in morphology and differences in phonetic inventory and vowel usage.

Motivation: Cues that May Distinguish Dialects/Accents (continued)
- Prosodic differences: intonational patterns; timing and rhythm
- Spectral distribution (acoustic frame-based features)
- Morphological, lexical, and syntactic differences
Subjects rely on intonational cues to distinguish two German dialects (Hamburg urban dialects vs. Northern Standard German) (Peters et al., 2002).

Case Study: Arabic Dialects
- Iraqi Arabic: Baghdadi, Northern, and Southern
- Gulf Arabic: Omani, UAE, and Saudi Arabic
- Levantine Arabic: Jordanian, Lebanese, Palestinian, and Syrian Arabic
- Egyptian Arabic: primarily Cairene Arabic

Corpora – Four Dialects – DATA I
Recordings of spontaneous telephone conversations produced by native speakers of the four dialects, available from the LDC.

Dialect     # Speakers  Total duration  Test speakers  Corpus
Gulf        965         41 h            150            Gulf Arabic conversational telephone Speech database (Appen Pty Ltd, 2006a)
Iraqi       475         26 h            150            Iraqi Arabic conversational telephone Speech database (Appen Pty Ltd, 2006b)
Egyptian    398         76 h            150            CallHome Egyptian and its Supplement (Canavan et al., 1997); CallFriend Egyptian (Canavan and Zipperlen, 1996)
Levantine   1258        79 h            150            Arabic CTS Levantine Fisher Training Data Set 1-3 (Maamouri, 2006)

Probabilistic Framework for Language ID
Task: find the most likely language (here, dialect) given the speech signal.
Hazen and Zue's (1993) contribution: decompose this probability into an acoustic model, a prosodic model, a phonotactic model, and a prior.

Phonotactic Approach
Hypothesis: dialects differ in their phonotactic distribution.
Early work: Phone Recognition followed by Language Modeling (PRLM) (Zissman, 1996).
Training: for each dialect Di, run a phone recognizer over the training speech and train an n-gram model λi on the resulting phone sequences, e.g.:
  dh uw z hh ih n d uw ey...
  f uw v ow z l iy g s m k dh...
  h iy jh sh p eh ae ey p sh...

Phonotactic Approach – Identification
Run the phone recognizer on the test utterance to obtain its phone sequence C, e.g.:
  uw hh ih n d uw w ay ey uh jh y eh k oh v hh ...
then score C with each dialect's n-gram model λi and hypothesize the dialect whose model scores it highest.

Applying Parallel PRLM (Zissman, 1996)
- Use multiple (k) phone recognizers trained on multiple languages to train k n-gram phonotactic models for each language of interest.
- Experiments on our data: 9 phone recognizers, trigram models.
[Figure: Parallel PRLM architecture. Acoustic preprocessing feeds phone recognizers (Arabic, English, and Japanese are shown); each recognizer's phone output is scored by Iraqi, Gulf, Egyptian, Levantine, and MSA LMs, and the resulting perplexities are passed to a back-end classifier that outputs the hypothesized dialect.]

Our Parallel PRLM Results – 10-Fold Cross Validation
[Figure: results plotted against test utterance duration in seconds]

Prosodic Differences Across Dialects
Hypothesis: dialects differ in their prosodic structure. What are these differences?
Global features:
- Pitch: range and register, peak alignment, standard deviation
- Intensity
- Rhythmic features: ∆C, ∆V, %V (using pseudo-syllables)
- Speaking rate
- Vowel duration statistics
Compare dialects using descriptive statistics.

New Approach: Prosodic Modeling
- Learn a sequential model for each prosodic sequence type using an ergodic continuous HMM for each dialect
- Pseudo-syllabification
- Sequential local features at the level of pseudo-syllables:
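A minimal sketch of how per-dialect ergodic continuous HMMs could be used for classification, assuming each dialect's model has already been trained and that every pseudo-syllable is reduced to a single real-valued feature (e.g., a pitch value) emitted through a 1-D Gaussian per state. The class `GaussianHMM`, the functions `log_likelihood` and `classify`, and all parameter values are hypothetical illustrations, not the proposal's implementation; training (e.g., Baum-Welch) is omitted. "Ergodic" here means every state can reach every other state, so all transition probabilities are positive.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class GaussianHMM:
    """Hypothetical per-dialect prosodic model: an ergodic HMM whose states
    emit one real-valued feature through a 1-D Gaussian."""
    start: list[float]        # P(first state = s)
    trans: list[list[float]]  # trans[i][j] = P(next state j | state i); all > 0 (ergodic)
    means: list[float]        # Gaussian mean per state
    variances: list[float]    # Gaussian variance per state


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(hmm: GaussianHMM, obs: list[float]) -> float:
    """Forward algorithm in the log domain: returns log P(obs | hmm)."""
    n = len(hmm.start)
    alpha = [math.log(hmm.start[s]) + log_gauss(obs[0], hmm.means[s], hmm.variances[s])
             for s in range(n)]
    for x in obs[1:]:
        # alpha on the right-hand side is still the previous time step's vector
        alpha = [
            logsumexp([alpha[i] + math.log(hmm.trans[i][j]) for i in range(n)])
            + log_gauss(x, hmm.means[j], hmm.variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def classify(models: dict[str, GaussianHMM], obs: list[float]) -> str:
    """Pick the dialect whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda d: log_likelihood(models[d], obs))


# Toy usage with made-up parameters (not trained on any real data).
models = {
    "DialectA": GaussianHMM(start=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]],
                            means=[100.0, 150.0], variances=[100.0, 100.0]),
    "DialectB": GaussianHMM(start=[0.5, 0.5], trans=[[0.5, 0.5], [0.5, 0.5]],
                            means=[200.0, 250.0], variances=[100.0, 100.0]),
}
obs = [102.0, 98.0, 148.0, 151.0]  # hypothetical: one feature value per pseudo-syllable
print(classify(models, obs))  # prints DialectA
```

Scoring in the log domain keeps long pseudo-syllable sequences from underflowing; a real system would likely use multivariate emissions (several local features per pseudo-syllable) and Gaussian mixtures rather than a single 1-D Gaussian per state.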
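The rhythmic features listed under the global prosodic features (∆C, ∆V, %V) can be sketched as below, under their commonly used definitions: %V is the percentage of total duration that is vocalic, ∆V is the standard deviation of vocalic interval durations, and ∆C is the standard deviation of consonantal interval durations. The input format (a list of `("V" | "C", duration)` pairs produced by pseudo-syllabification, with adjacent same-type segments already merged) and the function name `rhythm_metrics` are assumptions for illustration; the use of the population standard deviation is also an assumption.

```python
from __future__ import annotations

from statistics import pstdev


def rhythm_metrics(intervals: list[tuple[str, float]]) -> dict[str, float]:
    """Compute %V, deltaV, and deltaC from labeled intervals.

    intervals: hypothetical (label, duration_in_seconds) pairs, where label is
    "V" for a vocalic interval and "C" for a consonantal interval.
    """
    vocalic = [d for label, d in intervals if label == "V"]
    consonantal = [d for label, d in intervals if label == "C"]
    if not vocalic or not consonantal:
        raise ValueError("need at least one vocalic and one consonantal interval")
    total = sum(vocalic) + sum(consonantal)
    return {
        "%V": 100.0 * sum(vocalic) / total,  # share of total duration that is vocalic
        "deltaV": pstdev(vocalic),           # population std. dev. of vocalic durations
        "deltaC": pstdev(consonantal),       # population std. dev. of consonantal durations
    }


# Toy usage with made-up durations (seconds).
example = [("C", 0.10), ("V", 0.05), ("C", 0.10), ("V", 0.15)]
print(rhythm_metrics(example))
```

Once computed per utterance or per speaker, these values can be compared across dialects with the descriptive statistics the slides mention.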