Columbia COMS W4706 - ASR Evaluation

ASR Evaluation
Julia Hirschberg
CS 4706
3/26/2011 · Speech and Language Processing, Jurafsky and Martin

Outline
• Intrinsic Methods
  – Transcription Accuracy
    • Word Error Rate
    • Automatic methods, toolkits
    • Limitations
  – Concept Accuracy
    • Limitations
• Extrinsic Methods

Evaluation
• How do we evaluate the 'goodness' of a word string output by a speech recognizer?
• Terms:
  – ASR hypothesis: the ASR output
  – Reference transcription: the ground truth, i.e. what was actually said

Transcription Accuracy
• Word Error Rate (WER)
  – Minimum Edit Distance: the distance in words between the ASR hypothesis and the reference transcription
    • Edit Distance = (Substitutions + Insertions + Deletions) / N
    • For ASR, all edit types are usually weighted equally, but different weights can be used to penalize some error types more than others
  – WER = Edit Distance * 100

WER Calculation
• Word Error Rate = 100 * (Insertions + Substitutions + Deletions) / Total Words in Reference Transcript

Alignment example:
  REF: portable   ****  PHONE  UPSTAIRS  last  night  so
  HYP: portable   FORM  OF     STORES    last  night  so
  Eval:           I     S      S
  WER = 100 * (1 + 2 + 0) / 6 = 50%

A worse hypothesis:
  REF: portable   ****  phone  upstairs  last  night  so  ***
  HYP: preferable form  of     stores    next  light  so  far
  Eval: S         I     S      S         S     S          I
  WER = 100 * (2 + 5 + 0) / 6 = 117%

Computing WER with sclite
• NIST sctk-1.3 scoring software: http://www.nist.gov/speech/tools/
• sclite aligns a hypothesized text (HYP, from the recognizer) with a correct or reference text (REF, human-transcribed)

  id: (2347-b-013)
  Scores: (#C #S #D #I) 9 3 1 2
  REF: was an engineer SO  I    i was always with  ****  ****  MEN  UM    and they
  HYP: was an engineer **  AND  i was always with  THEM  THEY  ALL  THAT  and they
  Eval:                D   S                       I     I     S    S

Sclite output for error analysis

CONFUSION PAIRS  Total (972)
With >= 1 occurrences (972)

   1:  6 -> (%hesitation) ==> on
   2:  6 -> the ==> that
   3:  5 -> but ==> that
   4:  4 -> a ==> the
   5:  4 -> four ==> for
   6:  4 -> in ==> and
   7:  4 -> there ==> that
   8:  3 -> (%hesitation) ==> and
   9:  3 -> (%hesitation) ==> the
  10:  3 -> (a-) ==> i
  11:  3 -> and ==> i
  12:  3 -> and ==> in
  13:  3 -> are ==> there
  14:  3 -> as ==> is
  15:  3 -> have ==> that
  16:  3 -> is ==> this
  17:  3 -> it ==> that
  18:  3 -> mouse ==> most
  19:  3 -> was ==> is
  20:  3 -> was ==> this
  21:  3 -> you ==> we
  22:  2 -> (%hesitation) ==> it
  23:  2 -> (%hesitation) ==> that
  24:  2 -> (%hesitation) ==> to
  25:  2 -> (%hesitation) ==> yeah
  26:  2 -> a ==> all
  27:  2 -> a ==> know
  28:  2 -> a ==> you
  29:  2 -> along ==> well
  30:  2 -> and ==> it
  31:  2 -> and ==> we
  32:  2 -> and ==> you
  33:  2 -> are ==> i
  34:  2 -> are ==> were

Other Types of Error Analysis
• Which speakers are most often misrecognized? (Doddington '98)
  – Sheep: speakers who are easily recognized
  – Goats: speakers who are really hard to recognize
  – Lambs: speakers who are easily impersonated
  – Wolves: speakers who are good at impersonating others
• Which (context-dependent) phones are least well recognized?
  – Can we predict this?
• Which words are most confusable (confusability matrix)?
  – Can we predict this?

Are There Better Metrics than WER?
• WER is useful for measuring transcription accuracy
• But should we be more concerned with meaning ("semantic error rate")?
  – A good idea, but hard to agree on an approach
  – Applied mostly in spoken dialogue systems, where the desired semantics are clear
  – Which ASR applications would differ?
    • Speech-to-speech translation?
    • Medical dictation systems?

Concept Accuracy
• Spoken Dialogue Systems are often based on recognition of Domain Concepts
• Input: I want to go to Boston from Baltimore on September 29.
• Goal: maximize concept accuracy (the number of domain concepts in the reference transcription of the user input that are correctly recognized)

  Concept      Value
  Source City  Baltimore
  Target City  Boston
  Travel Date  Sept. 29

• CA Score: how many domain concepts were correctly recognized, out of the total N mentioned in the reference transcription

  Reference:  I want to go from Boston to Baltimore on September 29
  Hypothesis: Go from Boston to Baltimore on December 29
  – 2 concepts correctly recognized / 3 concepts in the reference transcription * 100 = 66% Concept Accuracy
  – What is the WER?
    • (3 Del + 2 Subst + 0 Ins) / 11 * 100 = 45% WER (55% Word Accuracy)

Sentence Error Rate
• Percentage of sentences with at least one error
  – Transcription error
  – Concept error

Which Metric is Better?
• Transcription accuracy?
• Semantic accuracy?

Next Class
• Human speech
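The minimum-edit-distance WER computation described in these slides can be sketched in a few lines of Python. This is a minimal illustration, not the NIST scoring code; the `wer` function name is my own, and all three edit types are weighted 1 as the slides assume:

```python
def wer(ref, hyp):
    """Word error rate: 100 * minimum word-level edit distance
    (substitutions, insertions, deletions, all weighted 1)
    divided by the number of words in the reference."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                               # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                               # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return 100.0 * d[len(r)][len(h)] / len(r)

# The two alignment examples from the slides:
ref = "portable phone upstairs last night so"
print(wer(ref, "portable form of stores last night so"))        # 50.0
print(wer(ref, "preferable form of stores next light so far"))  # about 117
```

Note that WER can exceed 100% when the hypothesis contains enough insertions, as the second example shows.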
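A miniature version of sclite's CONFUSION PAIRS report can be built by backtracing the same edit-distance table to recover the alignment and then counting ref ==> hyp substitutions. This is a sketch under that assumption; the `align` and `confusion_pairs` helpers are my own, not part of sctk:

```python
from collections import Counter

def align(ref_words, hyp_words):
    """Minimum-edit-distance alignment of two word lists.
    Returns (ref, hyp) pairs; '*' marks an insertion/deletion slot."""
    R, H = len(ref_words), len(hyp_words)
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        d[i][0] = i
    for j in range(H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + sub)
    # Backtrace from the bottom-right corner of the table.
    pairs, i, j = [], R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (
                0 if ref_words[i - 1] == hyp_words[j - 1] else 1):
            pairs.append((ref_words[i - 1], hyp_words[j - 1]))
            i, j = i - 1, j - 1                    # match or substitution
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            pairs.append(("*", hyp_words[j - 1]))  # insertion
            j -= 1
        else:
            pairs.append((ref_words[i - 1], "*"))  # deletion
            i -= 1
    return pairs[::-1]

def confusion_pairs(utterances):
    """Count ref ==> hyp substitution pairs over (ref, hyp) transcript
    pairs, most frequent first, like sclite's CONFUSION PAIRS section."""
    counts = Counter()
    for ref, hyp in utterances:
        for r, h in align(ref.split(), hyp.split()):
            if "*" not in (r, h) and r != h:
                counts[(r, h)] += 1
    return counts.most_common()
```

For instance, `confusion_pairs([("the cat sat", "that cat sat")])` reports the single substitution the ==> that. Such counts feed the confusability-matrix analysis mentioned above.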
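The CA score described above reduces to comparing slot values between the reference and hypothesis interpretations. A minimal sketch, assuming the interpretations are already available as dictionaries; the slot names and the `concept_accuracy` helper are hypothetical, not from any particular dialogue system:

```python
def concept_accuracy(ref_slots, hyp_slots):
    """CA score: percentage of reference domain concepts (slots)
    whose values were correctly recognized."""
    correct = sum(1 for slot, value in ref_slots.items()
                  if hyp_slots.get(slot) == value)
    return 100.0 * correct / len(ref_slots)

# Reference:  "I want to go from Boston to Baltimore on September 29"
ref = {"source_city": "Boston", "target_city": "Baltimore",
       "travel_date": "September 29"}
# Hypothesis: "Go from Boston to Baltimore on December 29"
hyp = {"source_city": "Boston", "target_city": "Baltimore",
       "travel_date": "December 29"}
print(concept_accuracy(ref, hyp))  # 2 of 3 concepts correct: about 66.7
```

This example mirrors the slides' point: the hypothesis has a 45% WER yet still recovers two of the three domain concepts, so concept accuracy can be far more forgiving than transcription accuracy.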

