Issues for Processing Speech
Mary Harper
12/4/08

Spontaneous Speech Challenges Language Processing Approaches

so we need but how do we get them out I say we have we set a string of charges that will root them out the back so t- the charges start at the front and just explode and blow a little something up but are really really loud and and marsupials have really good ears so that’ll be real that’ll really frighten them

Speech Recognition

The Challenges of Spontaneous Speech for Language Processing
• Difficult for the recognizer
    • ASR errors (insertions, deletions, and substitutions)
    • Phenomena atypical of textual sources (e.g., filled pauses, speech repairs)
    • Acoustic challenges (fragments, filled pauses, coarticulation)
    • Language models do not currently model disfluencies adequately
• Recognition output is difficult for humans to read
• Recognition output is difficult for NLP
    • Sentence boundaries are NOT provided and ASR segments are often inappropriate
    • Utterances are different (planned on the fly) from written text
    • Much of spoken language is used for organizing the communication (e.g., “And so”).
    • Speech repairs are challenging.

Assorted Spontaneous Speech Phenomena
• Filled pauses: I think it's uh refreshing to see the uh support . . .
• Parentheticals: but you know I was reading the other day . . .
• Speech repairs: why didn't he why didn't she stay at home
• Partial Words: cut between these t- these trees
• “Ungrammatical” constructions: my friends is visiting me

Enrich Word Stream with Structural Metadata
• [so we need] * but how do we get them out /?
• <I say> [we have] * we set a string of charges that will root them out the back /.
• <so> [t-] * the charges start at the front and just explode and blow a little something up but are really really loud /.
• [and] * and marsupials have really good ears /.
• <so> [that’ll be real] * that’ll really frighten them /.

Metadata Extraction and Transcription
[Diagram labels: Speech-to-Text (STT): essential core capability. Metadata Extraction (MDE): reduce STT errors, clean up & enrich output. Rich Transcript: words, times, confidences; speakers, boundaries, disfluencies, …]

Structural Metadata Extraction Tasks
• Sentence Unit (SU) detection: find the sentence-like units and their subtypes
• Filler word detection: filled pauses, discourse markers (e.g., <you know>), explicit editing terms
• Interruption point (IP) detection (e.g., we have * we set a string of charges)
• Edit word detection: reparandum region of a speech repair (e.g., [ we have ] * we set a string of charges)

Motivation for Rich Transcriptions
• Adding additional information to a transcription should:
    • Aid downstream language processing (provide sentence boundaries, indicate structure of disfluencies)
    • Improve readability to humans (adding punctuation, removing disfluencies) [e.g., MITLL readability experiments]
    • Improve ASR performance (e.g., feedback metadata information to recognizer to aid language models) [e.g., Work by Sebastien Coquoz, visiting ICSI from EPFL]

Feedback Structural Information to ASR
• Motivation:
    • Linguistic segments are more appropriate for LMs than acoustically segmented (speech vs. non-speech) chunks
    • Error analysis of BN recognition reveals a higher error rate at ASR segment boundaries
• Approach: use automatically-detected sentence boundaries to re-segment speech and then re-recognize
• Results on BN corpus RT-03 eval set:
    • Recognizer automatic ASR segments: 14.0% WER
    • Use reference boundary information: 13.0%
    • Using system boundary information: 13.3% so far
• This segmentation helps STT!
[Work by Sebastien Coquoz, visiting ICSI from EPFL]

The Challenge of Parsing Speech
• There is a mismatch between ASR systems and statistical parsers:
    • Segments processed by an ASR system do not typically correspond to segments that statistical parsers normally work with.
• ASR systems:
    • Produce long word strings without punctuation,
    • Word strings often contain errors (insertions, deletions, and substitutions),
    • Word strings contain phenomena that do not typically occur in textual sources (e.g., filled pauses, speech repairs).
• Traditional parsers are text-based:
    • Don’t use acoustic cues,
    • Process sentences not segments,
    • Process input without word errors,
    • Process textual input without spontaneous speech phenomena.

How to Enable Effective Downstream Processing of Speech
• Metadata extraction
    • Providing sentence boundaries and disfluency annotations
    • Challenging: speech is difficult
• Parsing
    • Structure enables other downstream processing
    • Challenging: parsing has been traditionally text-centered
    • Need to deal with speech related phenomena
    • Performance metrics exist for parsing text that need to be adapted to speech

Data Resources
• The RT’04 conversational telephone speech data, annotated with structural metadata, was used in the RT’04 MDE benchmark tests.
• Gold standard parses from the LDC treebanking team for dev, dev2, and eval sets.
• Recognition output from state-of-the-art recognizers for the EARS RT’04 data.
• Using this new data allowed us to evaluate the synergy between parsing and MDE system performance.
        conversations   # SUs   # words
dev     72              11K     71K
dev2    36              5K      35K
eval    36              5K      34K

Resource
JHU Speech Parsing Corpus (LDC2005E15): A unified conversational speech resource with consistent metadata markups and parse trees
• Metadata markup
    • Sentence boundaries
    • Speech disfluencies
• Treebank parse trees
    • Syntactic structure
    • Restarts

An Example
<FL_ST> well <FL_END> <EDIT_ST> i- <EDIT_END> i know , it’s cold outside there now , huh
(S1 (S (INTJ (UH well)) (EDITED (S (NP (PRP i-))) (DISFL-IP +)) (NP (PRP i)) (VP (VBP know) (, ,) (SBAR (S (NP (PRP it)) (VP (BES 's) (ADJP (JJ cold)) (ADVP (RB outside)) (ADVP (RB there) (RB now))) (, ,) (INTJ (UH huh)))) (. ?)))

An Example
<FL_ST> well <FL_END> <EDIT_ST> i-
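
The metadata markup in the example can be used to "clean up" a transcript for downstream processing. The following is a minimal sketch, not the corpus tooling: the tag names (`<FL_ST>`, `<FL_END>`, `<EDIT_ST>`, `<EDIT_END>`) are copied from the slide, while the function name `split_metadata` and the policy of simply deleting filler and edit regions are assumptions made for illustration.

```python
import re

# Tag names are taken from the slide's example; everything else here
# (function name, cleanup policy) is a hypothetical illustration.
FILLER = re.compile(r"<FL_ST>\s*(.*?)\s*<FL_END>")
EDIT = re.compile(r"<EDIT_ST>\s*(.*?)\s*<EDIT_END>")


def split_metadata(tagged):
    """Return (cleaned_text, fillers, edits) for one tagged utterance."""
    fillers = FILLER.findall(tagged)
    edits = EDIT.findall(tagged)
    # Remove both kinds of region, then collapse the leftover whitespace.
    cleaned = EDIT.sub(" ", FILLER.sub(" ", tagged))
    return " ".join(cleaned.split()), fillers, edits


example = ("<FL_ST> well <FL_END> <EDIT_ST> i- <EDIT_END> "
           "i know , it’s cold outside there now , huh")
cleaned, fillers, edits = split_metadata(example)
print(cleaned)          # the utterance with filler and edit regions removed
print(fillers, edits)   # the removed regions, kept for inspection
```

Keeping the removed regions alongside the cleaned text, rather than discarding them, leaves room for a parser to use them later (for instance as EDITED or INTJ material), which is one of the design questions the slides raise.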


UMD CMSC 723 - Issues for Processing Speech
