Speech and Language Processing: An introduction to speech recognition, computational linguistics and natural language processing. Daniel Jurafsky & James H. Martin. Copyright © 2006, All rights reserved. Draft of January 7, 2007. Do not cite without permission.

4 N-GRAMS

    But it must be recognized that the notion "probability of a sentence" is an entirely useless one, under any known interpretation of this term.
    Noam Chomsky (1969, p. 57)

    Anytime a linguist leaves the group the recognition rate goes up.
    Fred Jelinek (then of the IBM speech group) (1988) [1]

[1] This wording from his address is as recalled by Jelinek himself; the quote didn't appear in the proceedings (Palmer and Finin, 1990). Some remember a snappier version: "Every time I fire a linguist the performance of the recognizer improves."

Radar O'Reilly, the mild-mannered clerk of the 4077th M*A*S*H unit, had an uncanny ability to guess the next word someone was going to say. In this chapter we take up this idea of word prediction; what word, for example, is likely to follow:

    I'd like to make a collect...

Hopefully most of you concluded that a very likely word is "call", or "international" or "phone", but probably not "the". We formalize this idea of word prediction with probabilistic models called N-grams, which predict the next word from the previous N − 1 words. Such statistical models of word sequences are also called language models or LMs. Computing the probability of the next word will turn out to be closely related to computing the probability of a sequence of words. The following sequence, for example, has a non-zero probability of appearing in a text:

    ...all of a sudden I notice three guys standing on the sidewalk...

while this same set of words in a different order has a very low probability:

    on guys all I of notice sidewalk three a sudden standing the

As we will see, estimators like N-grams that assign a conditional probability to possible next words can be used to assign a joint probability to an entire sentence. Whether estimating probabilities of next words or of whole sequences, the N-gram model is one of the most important tools in speech and language processing.

N-grams are essential in any task in which we have to identify words in noisy, ambiguous input. In speech recognition, for example, the input speech sounds are very confusable and many words sound extremely similar. Russell and Norvig (1995) give an intuition from handwriting recognition for how probabilities of word sequences can help. In the movie Take the Money and Run, Woody Allen tries to rob a bank with a sloppily written hold-up note that the teller incorrectly reads as "I have a gub". Any speech and language processing system could avoid making this mistake by using the knowledge that the sequence "I have a gun" is far more probable than the non-word "I have a gub" or even "I have a gull".

N-gram models are also essential in statistical machine translation. Suppose we are translating a Chinese source sentence and, as part of the process, have a set of potential rough English translations:

    he briefed to reporters on the chief contents of the statement
    he briefed reporters on the chief contents of the statement
    he briefed to reporters on the main contents of the statement
    he briefed reporters on the main contents of the statement

An N-gram grammar might tell us that, even after controlling for length, "briefed reporters" is more likely than "briefed to reporters", and "main contents" is more likely than "chief contents". This lets us select the last of the four sentences above as the most fluent translation, i.e. the one that has the highest probability.
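To make this ranking concrete, here is a minimal sketch in Python, not code from this book, of how an add-one-smoothed bigram model could score the candidate translations. The two-sentence training corpus is a hypothetical toy stand-in for a real corpus, so only the relative scores matter.

    import math
    from collections import Counter

    def train_bigram(corpus):
        """Count histories and bigrams, padding each sentence with
        <s> and </s> boundary markers."""
        histories, bigrams = Counter(), Counter()
        for sent in corpus:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            histories.update(tokens[:-1])           # every token serving as a history
            bigrams.update(zip(tokens, tokens[1:]))
        return histories, bigrams

    def logprob(sentence, histories, bigrams, vocab_size):
        """Add-one smoothed log P(sentence) under the bigram model.
        Unseen words simply receive the smoothing mass here; a real
        model would map them to an <UNK> token."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams[(h, w)] + 1) / (histories[h] + vocab_size))
                   for h, w in zip(tokens, tokens[1:]))

    corpus = [
        "he briefed reporters on the main contents of the statement",
        "the spokesman briefed reporters on the agreement",
    ]
    histories, bigrams = train_bigram(corpus)
    vocab_size = len({w for s in corpus for w in s.split()} | {"</s>"})

    for cand in ["he briefed to reporters on the chief contents of the statement",
                 "he briefed reporters on the main contents of the statement"]:
        print(f"{logprob(cand, histories, bigrams, vocab_size):8.2f}  {cand}")

Because every bigram in the second candidate occurs in the toy corpus while "briefed to" and "chief contents" do not, the second candidate receives the higher log probability, matching the judgment described above.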
In spelling correction, we need to find and correct spelling errors like the following (from Kukich (1992)) that accidentally result in real English words:

    They are leaving in about fifteen minuets to go to her house.
    The design an construction of the system will take more than a year.

Since these errors consist of real words, we can't find them by just flagging words that are not in the dictionary. But note that "in about fifteen minuets" is a much less probable sequence than "in about fifteen minutes". A spellchecker can use a probability estimator both to detect these errors and to suggest higher-probability corrections.

Word prediction is also important for augmentative communication systems (Newell et al., 1998) that help the disabled. People who are unable to use speech or sign language to communicate, like the physicist Stephen Hawking, can communicate by using simple body movements to select words from a menu that are then spoken by the system. Word prediction can be used to suggest likely words for the menu.

Besides these sample areas, N-grams are also crucial in NLP foundations like part-of-speech tagging, natural language generation, and word similarity, as well as in applications from authorship identification and sentiment extraction to predictive text input systems for cell phones.

4.1 COUNTING WORDS IN CORPORA

    [upon being asked if there weren't enough words in the English language for him]: "Yes, there are enough, but they aren't the right ones."
    James Joyce, reported in Bates (1997)

Probabilities are based on counting things. Before we talk about probabilities, we need to decide what we are going to count. Counting of things in natural language is based on a corpus (plural corpora), an on-line collection of text or speech. Let's look at two popular corpora, Brown and Switchboard. The Brown Corpus is a one-million-word collection of samples from 500 written texts from different genres (newspaper, novels, non-fiction, academic, etc.), assembled at Brown University in 1963-64 (Kučera and Francis, 1967; Francis, 1979; Francis and Kučera, 1982). How many words are in the following Brown sentence?

(4.1) He stepped out into the hall, was delighted to encounter a water brother.

Example (4.1) has 13 words if we don't count punctuation marks as words, 15 if we count punctuation. Whether we treat period ("."), comma (","), and so on as words depends on the task. Punctuation is critical for finding boundaries of things (commas, periods, colons) and for identifying some aspects of meaning (question marks, exclamation marks, quotation marks). For some tasks, such as part-of-speech tagging or parsing, or sometimes speech synthesis, we thus treat punctuation marks as if they were separate words.
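As a concrete illustration of the two counts, here is a small Python sketch, not from this book, that tokenizes example (4.1) with one simple regular expression; treating punctuation marks as separate tokens yields 15 tokens, while keeping only words yields 13.

    import re

    # Example (4.1) from the text.
    sentence = ("He stepped out into the hall, was delighted to "
                "encounter a water brother.")

    # One simple tokenizer among many: runs of word characters, or any
    # single punctuation character, each as its own token.
    with_punct = re.findall(r"\w+|[^\w\s]", sentence)
    words_only = re.findall(r"\w+", sentence)

    print(len(with_punct), with_punct)   # 15 tokens, including "," and "."
    print(len(words_only), words_only)   # 13 tokens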

