CMU CS 15-492 - Speech Recognition Language Modeling

Speech Processing 15-492/18-492
Speech Recognition: Language Modeling

But not just acoustics
• But not all phones are equi-probable
• Find the word sequence W that maximizes P(W | A), given the acoustics A
• Using Bayes' law: P(W | A) is proportional to P(A | W) P(W)
• Combine models
– Use HMMs to provide P(A | W)
– Use the language model to provide P(W)

Language Predictions
• What are the most likely words?
– "the" is more common than "loom"
• Different domains, different distributions
– Bus, timetable, 4:15, late
– LCD, storage card, USB
• Context helps prediction
– Carnegie …
– President …
– As quiet as a …

Markov Modeling
• Look at n-gram models
– Unigram: P(W_n)
– Bigram: P(W_n | W_n-1)
– Trigram: P(W_n | W_n-1, W_n-2)
– N-gram: P(W_n | W_n-1, …, W_n-N+1)
• But we need lots of data to train

What is the word distribution?
• Wall Street Journal (1995)
– Total: 22.5M word tokens
– Total: 508K different word types
– 15K types appear more than 100 times
– 45% of types appear only once
– Top: the, of, to, a, in, and, that, for, is, on
– said (16), Mr (17), million (24), company (39)

New tokens per day

Need lots of data to train
• As we increase the N of the N-gram, we need much more data
• Vocabulary of 50K words gives 125T possible trigrams
• At least 40T words of training data (if equi-probable)
• About 5000 years of WSJ

Simplifying Assumptions
• Limit the vocabulary
– < 64K words
– Make them all UPPER CASE
• Remove punctuation
– People don't say punctuation
– Maybe break into phrases at punctuation
• Have an "unknown word" token
– Replace all low-frequency words with UNK
• Collapse similar words
– All numbers to NUM
– All cities to CITY …

Still not enough data
• Backoff (a code sketch appears after the summary):
– If there is no trigram data, use bigram data
– If there is no bigram data, use unigram data
• Smoothing:
– Assume there is at least 1 occurrence (no zero counts)
– Allow non-integer frequencies
– "Good-Turing" smoothing
– If Numof(n-1 gram) < threshold: F(n-gram) = Numof(n-1 gram) * P(n-1 gram)

How good is a model?
• You build a language model; how good is it?
– Test it in the ASR system (takes time)
– Or use an abstract measure

Entropy and Perplexity
• Entropy
– Related to predictability
– H = -(1/Q) Σ_i log2 P(w_i | w_i-1, …, w_i-N+1), where Q is the number of words scored and N is the order of the n-gram
– For sufficiently large Q this approaches the per-word entropy of the source
• Perplexity = 2^H
– Larger number, harder problem
– Roughly an average branching factor
– If 20, about 20 choices per word; if 300, about 300 choices per word
– 20 is typically an "easy" task; 300 is typically a "hard" task
• Sometimes it is only sometimes hard
– "I want to go to X."
• Lower perplexity gives better recognition?
– Not strictly true, but there is a correlation
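To make the entropy and perplexity definitions above concrete, here is a minimal Python sketch (not from the lecture; the toy corpus, the add-one smoothing, and the function names are illustrative assumptions). It estimates a bigram model from a few words of training text and reports the perplexity of a test sentence as 2 raised to the average negative log2 probability per word.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams):
    """P(w | w_prev) with add-one smoothing so unseen bigrams keep non-zero mass."""
    vocab_size = len(unigrams)
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(test_tokens, unigrams, bigrams):
    """Perplexity = 2 ** H, where H is the average negative log2 probability per word."""
    log_prob = 0.0
    q = 0
    for w_prev, w in zip(test_tokens[:-1], test_tokens[1:]):
        log_prob += math.log2(bigram_prob(w_prev, w, unigrams, bigrams))
        q += 1
    entropy = -log_prob / q       # bits per word
    return 2 ** entropy

train = "i want to go to boston i want to leave at noon".split()
test = "i want to go to pittsburgh".split()
uni, bi = train_bigram(train)
print(perplexity(test, uni, bi))
```

A lower number means the model finds the test text more predictable; a real evaluation would use held-out text from the target domain and a proper smoothing scheme.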
But surely we can do better
• Just using the last two words?
• Syntax, semantics, …
• Writing grammars is hard
– Beyond simple tasks
• Training grammars is even harder
• Semantics is harder still

Some LM improvements
• Look at more than the previous two words
• Replace words with types
– "I want to go from CITY to CITY"
• Trigger-based models
– If you see a word, you will likely see related ones
– "president" triggers "vice-president"

Model Combination
• Use a background model
– General (for the domain)
• Use a specific model to adapt
• Combination by
– Simple linear weights (sketched at the end of these notes)
– Maximum Entropy
– CART

Context-dependent models
• Switch the LM within a dialog system
• Build separate models for different states
– State 1: "Where do you want to go to?"
– State 2: "When do you want to leave?"
– State 3: "When do you want to arrive?"

What about OOVs?
• OOV = "out of vocabulary"
– Words not in the lexicon
• Ignore them
– They might be irrelevant
• Try to recognize them
– They might be names
• Avoid them
– Design your system so there aren't any important ones

Summary
• Language models
• Bayes' equation
• N-grams
• Smoothing, backoff, adaptation
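To make the backoff idea from the "Still not enough data" slide concrete, here is a short Python sketch (an illustration, not the lecture's implementation; the class name, the toy sentence, and the add-one fallback for unseen unigrams are assumptions). It answers trigram queries from trigram counts when they exist, otherwise falls back to bigram and then unigram counts; the discounting that Katz-style backoff with Good-Turing estimates uses to keep the distribution normalized is omitted for clarity.

```python
from collections import Counter

class BackoffLM:
    """Trigram model that backs off to bigram, then unigram counts."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
        self.total = len(tokens)

    def prob(self, w, w1, w2):
        """P(w | w2, w1): use trigram counts if seen, else bigram, else unigram."""
        if self.trigrams[(w2, w1, w)] > 0:
            return self.trigrams[(w2, w1, w)] / self.bigrams[(w2, w1)]
        if self.bigrams[(w1, w)] > 0:
            return self.bigrams[(w1, w)] / self.unigrams[w1]
        # add-one fallback so unseen words still get some probability mass
        return (self.unigrams[w] + 1) / (self.total + len(self.unigrams))

lm = BackoffLM("where do you want to go to today".split())
print(lm.prob("go", "to", "want"))    # trigram "want to go" was seen
print(lm.prob("today", "you", "do"))  # falls back to bigram/unigram counts
```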


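Similarly, the "simple linear weights" combination from the Model Combination slide amounts to interpolating a domain-specific model with a general background model. In this sketch both models are assumed to be functions returning P(word | history), and the weight of 0.7 is an arbitrary illustrative value; in practice it would be tuned on held-out data from the target domain.

```python
def interpolate(p_specific, p_background, lam=0.7):
    """Linear interpolation of two language models:
    P(w | h) = lam * P_specific(w | h) + (1 - lam) * P_background(w | h)."""
    def p(word, history):
        return lam * p_specific(word, history) + (1 - lam) * p_background(word, history)
    return p

# Usage: wrap any two models that expose P(word | history), e.g. a general
# newswire model and a small in-domain model; a dialog system could pick a
# different p_specific for each dialog state (as in the context-dependent
# models above) while keeping the same background model.
```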