MIT 6.863J - Lecture 3: Language models

Lecture 3: Language models – given the text so far, what comes…
Professor Robert C. Berwick
[email protected]
6.863J/9.611J SP11

The Menu Bar
• Administrivia
• Where do parts of speech come from? A linear model and its computational and learning implications
• Lab 2: What's a language model? (in this restricted case)
• Language models & n-grams: what is the task?
• How to calculate?
• What are the problems?
• Never trust a sample under 30
• Smoothing – the Dark Arts & the Robin Hood trick: the two key questions for Robin Hood that determine the space of possible "smoothing methods" for n-grams

Simplest example of language 'modeling': beads on a string
[Figure: a finite-state chain q0 → q1 → q2 → q3 → q4 → qf, emitting one word per transition: "revolutionary new ideas appear infrequently"]
In general: we want to find, or better, estimate from the corpus, the probability distribution for strings of tokens (letters, words, …)
Question to keep in mind: when is the 'beads on a string' model inappropriate? Example: unlockable (which has two structures: un+lockable vs. unlock+able)

The Task
• Find a probability distribution p(w) for sentences w, which are strings of words
• In particular, find the probability of a word in a text (utterance, etc.), given what the last n words have been (n = 0, 1, 2, 3, …)
• Why this is reasonable
• What the problems are: parameter estimation & conceptual issues
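
A note on the menu's "How to calculate?": the standard calculation for n-grams is relative-frequency (maximum-likelihood) estimation, p(w | previous word) = count(previous word, w) / count(previous word). Below is a minimal Python sketch of that idea; the toy corpus, the function names, and the <s>/</s> boundary markers are illustrative assumptions, not material from the lecture.

    # A minimal sketch of maximum-likelihood bigram estimation.
    # The toy corpus, the names, and the <s>/</s> boundary markers are
    # illustrative assumptions, not code from the lecture.
    from collections import Counter

    def train_bigrams(sentences):
        """Estimate p(w | prev) = count(prev, w) / count(prev) from raw text."""
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            unigrams.update(tokens[:-1])             # counts of contexts
            bigrams.update(zip(tokens, tokens[1:]))  # counts of adjacent pairs
        def p(w, prev):
            return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
        return p

    corpus = ["revolutionary new ideas appear infrequently",
              "new ideas appear everywhere"]
    p = train_bigrams(corpus)
    print(p("ideas", "new"))            # 1.0: "new" is always followed by "ideas"
    print(p("infrequently", "appear"))  # 0.5: appear -> infrequently in 1 of 2 cases
    print(p("bucket", "the"))           # 0.0: unseen pair

The same pair of counters generalizes to trigrams by conditioning on the previous two words. The 0.0 returned for unseen pairs is exactly the problem flagged by "Never trust a sample under 30": maximum-likelihood estimates assign zero probability to anything absent from the sample, which smoothing must repair.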

Some issues
• What patterns in language can this describe?
• What patterns in language can't this describe? Why?
• Can we formalize this notion of 'linear' relations? (Yes)
• Are these patterns easily learnable? (Yes, with some additional constraints)

Where do 'parts of speech' come from?
• An early answer: "Words are assigned to classes on the basis of environments in which they occur. Each environment determines one and only one class. A word A belongs to the class determined by the environment ___X if AX is either an utterance or occurs as part of some utterance" – Rulon Wells, 1947

An Alien language?
1. Palin won the election
2. Palin will win the election
3. Palin would win the election
4. Palin should win the election
5. Palin did win the election
6. Palin could have won the election
7. Palin should have won the election
8. Palin will have won the election
9. Palin should have been winning the election
10. Palin will have been winning the election
11. Palin has won the election
12. Palin has been winning the election
13. Obama won the election
14. Obama will win the election
15. Obama would win the election
16. Obama should win the election
17. Obama did win the election
18. Obama could have won the election
19. Obama should have won the election
20. Obama will have won the election
21. Obama should have been winning the election
22. Obama will have been winning the election
23. Obama has won the election
24. Obama has been winning the election

Why this is reasonable
• The last few words may tell us a lot about the next word:
– Collocations ("kick the bucket")
– Prediction of the current category: "the" is followed by nouns or adjectives
– Semantic domain

What is this good for?
• Spam vs. Ham
• Text categorization: like language ID
– Topic 1 sample: In the beginning God created …
– Topic 2 sample: The history of all hitherto existing society is the history of class struggles. …
– Input text: Matt's Communist Homepage. Capitalism is unfair and has been ruining the lives of millions of people around the world. The profits from the workers' labor …
– Input text: And they have beat their swords to ploughshares, And their spears to pruning-hooks. Nation doth not lift up sword unto nation, neither do they learn war any more…
• Speech recognition. Only trigrams seem to work (*)
• Part of speech tagging (the big book – Article Adjective ___)

What is this good for?
• Text categorization: break a big document or media stream into indexable chunks
• From NPR's All Things Considered:
The U.N. says its observers will stay in Liberia only as long as West African peacekeepers do, but West African states are threatening to pull out of the force unless Liberia's militia leaders stop violating last year's peace accord after 7 weeks of chaos in the capital, Monrovia … Human rights groups cite peace troops as among those smuggling the arms. I'm Jennifer Ludden, reporting. Whitewater prosecution witness David Hale began serving a 28-month prison sentence today. The Arkansas judge and banker pleaded guilty two years ago to defrauding the Small Business Administration. Hale was the main witness in the Whitewater-related trial that led to the convictions …

Speech Recognition
Listen carefully: what am I saying?
• How do you recognize speech?
• How do you wreck a nice beach?
• Put the file in the folder
• Put the file and the folder

Language generation
• Choose randomly among outputs:
– Visitant which came into the place where it will be Japanese has admired that there was Mount Fuji.
• Top 10 outputs according to bigram probabilities:
– Visitors who came in Japan admire Mount Fuji.
– Visitors who came in Japan admires Mount Fuji.
– Visitors who arrived in Japan admire Mount Fuji.
– Visitors who arrived in Japan admires Mount Fuji.
– Visitors who came to Japan admire Mount Fuji.
– A visitor who came in Japan admire Mount Fuji.
– The visitor who came in Japan admire Mount Fuji.
– Visitors who came in Japan admire Mount Fuji.
– The visitor who came in Japan admires Mount Fuji.
– Mount Fuji is admired by a visitor who came in Japan.

What can trigram language models do for us?
SUBJECT: Tonight
Hi everybody , A few of us are meeting up at the &NAME , &NAME , around &NUM : &NUM for dinner and drinks . All welcome - no excuses now - deadlines a month away
SUBJECT: Re : HI
Hi , I just got a 15 year fixed mortgage . I found this website where Lenders compete for your business . I thought you may want to look at it . Thanks !
Can distinguish 'spam' from 'ham'

What can trigram language models do for us?
Ivan je čitao knjigu dok je Marija gledala neki film
("Ivan was reading a book while Marija was watching a film")
Answer: Croatian
Compare the probabilities of generating sentence s given that it came from language L, and pick the 'most likely' one (Is this right? We'll put that to one side for now…)
That is, for a given s, find the maximum of:
p(sent = s | language = english)
p(s | language = polish)
…
Is this what we really want to do?

Bayes' Theorem & language models
Recall: p(A|B) = p(A,B)/p(B), so p(A|B) p(B) = p(A,B) = p(B|A) p(A)
Therefore, dividing both sides by p(B):
• p(A|B) = p(B|A) p(A) / p(B)
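
Putting the last two slides together, language ID is a direct application of Bayes' theorem: pick the language L maximizing p(s | L) p(L), where each p(s | L) comes from a per-language model. A rough sketch follows, assuming toy training strings, character-bigram models, add-one smoothing, and a uniform prior; these are all illustrative choices, not the course's Lab 2 code.

    # Sketch: language ID as argmax over L of p(s | L) p(L), per the Bayes slide.
    # The training strings, the character-bigram models, add-one smoothing, and
    # the uniform prior are all illustrative assumptions, not the course's code.
    import math
    from collections import Counter

    def train_char_bigrams(text):
        """Return a function computing log p(s | L) under an add-one-smoothed
        character bigram model estimated from `text`."""
        pairs = Counter(zip(text, text[1:]))
        contexts = Counter(text[:-1])
        alphabet = len(set(text)) + 1  # +1 reserves mass for unseen characters
        def logp(s):
            return sum(math.log((pairs[(a, b)] + 1) / (contexts[a] + alphabet))
                       for a, b in zip(s, s[1:]))
        return logp

    models = {
        "english":  train_char_bigrams("the history of all hitherto existing "
                                       "society is the history of class struggles"),
        "croatian": train_char_bigrams("ivan je čitao knjigu dok je marija "
                                       "gledala neki film"),
    }
    log_prior = {lang: -math.log(len(models)) for lang in models}  # uniform p(L)

    def identify(sentence):
        s = sentence.lower()
        # Bayes: log p(L | s) = log p(s | L) + log p(L) - log p(s),
        # and log p(s) is the same for every L, so it can be dropped.
        return max(models, key=lambda L: models[L](s) + log_prior[L])

    print(identify("Marija je gledala film"))  # croatian, on this toy data
    print(identify("the history of society"))  # english, on this toy data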

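Two design notes on the sketch above: summing log probabilities replaces a product of many small conditionals that would underflow floating point, and add-one smoothing is only the crudest way to reserve probability mass for unseen events. How that mass should be taken from the rich (seen n-grams) and given to the poor (unseen ones) is exactly the "Robin Hood" question promised on the menu.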
