Columbia COMS W4706 - Automatic Dialect/Accent Recognition - D2842775

School: Columbia University

Course: COMS W4706 - Spoken Language Processing

Pages: 50

Automatic Dialect/Accent Recognition
Fadi Biadsy
April 12th, 2010
PhD Proposal – Fadi Biadsy

Outline
- Problem
- Motivation
- Corpora
- Framework for Language Recognition
- Experiments in Dialect Recognition
  - Phonotactic Modeling
  - Prosodic Modeling
  - Acoustic Modeling
  - Discriminative Phonotactics

Problem: Dialect Recognition
- Given a speech segment of a predetermined language, identify its dialect: Dialect = {D1, D2, …, DN}
- Great deal of work on language recognition
- Dialect and accent recognition have more recently begun to receive attention
- Dialect recognition is a more difficult problem than language recognition

Motivation: Why Study Dialect Recognition?
- Discover differences between dialects
- To improve Automatic Speech Recognition (ASR)
  - Model adaptation: pronunciation, acoustic, morphological, and language models
- To infer speaker's regional origin for
  - Speech-to-speech translation
  - Annotations for Broadcast News Monitoring
  - Spoken dialogue systems – adapt TTS systems
  - Charismatic speech
  - Call centers – crucial in emergency situations

Motivation: Cues that May Distinguish Dialects/Accents
- Phonetic cues:
  - Differences in phonemic inventory
  - Phonemic differences
  - Allophonic differences (context-dependent phones)
- Phonotactics: rules/distribution that govern phonemes and their sequences in a dialect
(Al-Tamimi & Ferragne, 2005)

Example: /r/
- Approximant in American English [ɹ] – modifies preceding vowels
- Trilled in Scottish English in [Consonant] – /r/ – [Vowel] and in other contexts

Example: "She will meet him"
  MSA: /s/ /a/ /t/ /u/ /q/ /A/ /b/ /i/ /l/ /u/ /h/ /u/
  Egy: /H/ /a/ /t/ /?/ /a/ /b/ /l/ /u/
  Lev: /r/ /a/ /H/ /t/ /g/ /A/ /b/ /l/ /u/
- Differences in morphology
- Differences in phonetic inventory and vowel usage

Motivation: Cues that May Distinguish Dialects/Accents
- Prosodic differences
  - Intonational patterns
  - Timing and rhythm
- Spectral distribution (acoustic frame-based features)
- Morphological, lexical, and syntactic differences
Subjects rely on intonational cues to distinguish two German dialects (Hamburg urban dialects vs. Northern Standard German) (Peters et al., 2002)

Outline
- Problem
- Motivation
- Corpora
- Framework for Language Recognition
- Experiments in Dialect Recognition
  - Phonotactic Modeling
  - Prosodic Modeling
  - Acoustic Modeling
  - Discriminative Phonotactics
- Contributions
- Future Work
- Research Plan

Case Study: Arabic Dialects
- Iraqi Arabic: Baghdadi, Northern, and Southern
- Gulf Arabic: Omani, UAE, and Saudi Arabic
- Levantine Arabic: Jordanian, Lebanese, Palestinian, and Syrian Arabic
- Egyptian Arabic: primarily Cairene Arabic

Corpora – Four Dialects – DATA I
- Recordings of spontaneous telephone conversations produced by native speakers of the four dialects, available from LDC

Dialect    # Speakers  Total Duration  Test Speakers  Corpus
Gulf       965         41h             150            Gulf Arabic Conversational Telephone Speech database (Appen Pty Ltd, 2006a)
Iraqi      475         26h             150            Iraqi Arabic Conversational Telephone Speech database (Appen Pty Ltd, 2006b)
Egyptian   398         76h             150            CallHome Egyptian and its Supplement (Canavan et al., 1997); CallFriend Egyptian (Canavan and Zipperlen, 1996)
Levantine  1258        79h             150            Arabic CTS Levantine Fisher Training Data Set 1-3 (Maamouri, 2006)

Probabilistic Framework for Language ID
- Task: [equation missing from the extracted text]
- Hazen and Zue's (1993) contribution: a decomposition whose factors are labeled acoustic model, prosodic model, phonotactic model, and prior

Phonotactic Approach
- Hypothesis: Dialects differ in their phonotactic distribution
- Early work: Phone Recognition followed by Language Modeling (PRLM) (Zissman, 1996)
- Training: For each dialect Di:
  - Run a phone recognizer:
      dh uw z hh ih n d uw ey ...
      f uw v ow z l iy g s m k dh ...
      h iy jh sh p eh ae ey p sh …
  - Train an n-gram model: λi

Phonotactic Approach – Identification
- Test utterance: run the phone recognizer to obtain the phone sequence C:
      uw hh ih n d uw w ay ey
      uh jh y eh k oh v hh ...

Applying Parallel PRLM (Zissman, 1996)
- Use multiple (k) phone recognizers trained on multiple languages to train k n-gram phonotactic models for each language of interest
- Experiments on our data: 9 phone recognizers, trigram models
[Diagram: acoustic preprocessing feeds Arabic, English, and Japanese phone recognizers; each recognizer's phone output (Arabic, English, or Japanese phones) is scored by Iraqi, Gulf, Egyptian, Levantine, and MSA LMs; the resulting perplexities go to a back-end classifier, which outputs the hypothesized dialect.]

Our Parallel PRLM Results – 10-Fold Cross Validation
[Figure: results plotted against test utterance duration in seconds]

Prosodic Differences Across Dialects
- Hypothesis: Dialects differ in their prosodic structure
- What are these differences?
- Global features
  - Pitch: range and register, peak alignment, STDV
  - Intensity
  - Rhythmic features: ∆C, ∆V, %V (using pseudo-syllables)
  - Speaking rate
  - Vowel duration statistics
- Compare dialects using descriptive statistics

New Approach: Prosodic Modeling
- Learn a sequential model for each prosodic sequence type using an ergodic continuous HMM for each dialect
- Pseudo-syllabification
- Sequential local features at the level of pseudo-syllables:
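
Sketch: PRLM training and identification

The phonotactic (PRLM) procedure described in the slides, which trains one n-gram model λi per dialect Di on phone-recognizer output and then scores the phone sequence of a test utterance under every model, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the phone strings are toy data standing in for real phone-recognizer output, the dialect labels D1 and D2 are placeholders, and the models are bigrams with add-one smoothing rather than the trigram models used in the reported experiments. The function names (`train_bigram`, `log_likelihood`, `identify`) are hypothetical.

```python
import math
from collections import Counter

def train_bigram(phone_strings):
    """Count bigram contexts and bigrams over phone strings (toy stand-in for lambda_i)."""
    context, bigram, vocab = Counter(), Counter(), set()
    for s in phone_strings:
        phones = ["<s>"] + s.split() + ["</s>"]
        vocab.update(phones)
        for a, b in zip(phones, phones[1:]):
            context[a] += 1
            bigram[(a, b)] += 1
    return context, bigram, vocab

def log_likelihood(model, phone_string, vocab_size):
    """Log-probability of a phone string under an add-one-smoothed bigram model."""
    context, bigram, _ = model
    phones = ["<s>"] + phone_string.split() + ["</s>"]
    total = 0.0
    for a, b in zip(phones, phones[1:]):
        total += math.log((bigram[(a, b)] + 1) / (context[a] + vocab_size))
    return total

def identify(models, phone_string):
    """Return the dialect whose model assigns the highest log-likelihood."""
    # Shared vocabulary across all dialect models, plus one slot for unseen phones.
    vocab_size = len(set().union(*(m[2] for m in models.values()))) + 1
    return max(models, key=lambda d: log_likelihood(models[d], phone_string, vocab_size))

# Toy training data (assumed): phone-recognizer output per dialect.
training = {
    "D1": ["a b a b a b", "a b a b"],
    "D2": ["b a a b b a", "a a b b"],
}
models = {dialect: train_bigram(utts) for dialect, utts in training.items()}

print(identify(models, "a b a b"))
print(identify(models, "a a b b"))
```

Choosing the model with the highest log-likelihood is equivalent to choosing the one with the lowest perplexity on the test sequence. In the parallel PRLM setup, this scoring would be repeated for each of the k phone recognizers and the resulting scores passed to a back-end classifier instead of a simple argmax.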