UMD LBSC 796 - Language Models

LBSC 796/INFM 718R: Week 4
Language Models
Jimmy Lin
College of Information Studies, University of Maryland
Monday, February 20, 2006

Last Time...
- Boolean model
  - Based on the notion of sets
  - Documents are retrieved only if they satisfy the Boolean conditions specified in the query
  - Does not impose a ranking on retrieved documents
  - Exact match
- Vector space model
  - Based on geometry: the notion of vectors in a high-dimensional space
  - Documents are ranked based on their similarity to the query (ranked retrieval)
  - Best/partial match

Today
- Language models
  - Based on the notion of probabilities and processes for generating text
  - Documents are ranked based on the probability that they generated the query
  - Best/partial match
- First we start with probabilities...

Probability
- What is probability?
  - Statistical: relative frequency as n → ∞
  - Subjective: degree of belief
- Thinking probabilistically
  - Imagine a finite amount of "stuff" (= probability mass)
  - The total amount of "stuff" is one
  - The event space is "all the things that could happen"
  - Distribute that "mass" over the possible events
  - The probabilities of all the events have to add up to one

Key Concepts
- Defining probability with frequency
- Statistical independence
- Conditional probability
- Bayes' Theorem

Statistical Independence
- A and B are independent if and only if:
  - P(A and B) = P(A) × P(B)
- Simplest example: a series of coin flips
- Independence formalizes "unrelated"
  - P("being brown eyed") = 6/10
  - P("being a doctor") = 1/1000
  - P("being a brown-eyed doctor") = P("being brown eyed") × P("being a doctor") = 6/10,000

Dependent Events
- Suppose:
  - P("having a B.S. degree") = 4/10
  - P("being a doctor") = 1/1000
- Would you expect:
  - P("having a B.S. degree and being a doctor") = P("having a B.S. degree") × P("being a doctor") = 4/10,000?
- Another example:
  - P("being a doctor") = 1/1000
  - P("having studied anatomy") = 12/1000
  - P("having studied anatomy" | "being a doctor") = ??

Conditional Probability
P(A | B) ≡ P(A and B) / P(B)
(Venn diagram: events A and B overlap inside the event space)
- P(A) = probability of A relative to the entire event space
- P(A | B) = probability of A given that we know B is true

Doctors and Anatomy
P(A | B) ≡ P(A and B) / P(B)
- A = having studied anatomy
- B = being a doctor
- What is P("having studied anatomy" | "being a doctor")?
  - P("being a doctor") = 1/1000
  - P("having studied anatomy") = 12/1000
  - P("being a doctor who studied anatomy") = 1/1000
  - P("having studied anatomy" | "being a doctor") = 1

More on Conditional Probability
- What if P(A|B) = P(A)? Then A and B must be statistically independent!
- Is P(A|B) = P(B|A)? No:
  - A = having studied anatomy, B = being a doctor
  - P("being a doctor") = 1/1000, P("having studied anatomy") = 12/1000, P("being a doctor who studied anatomy") = 1/1000
  - P("having studied anatomy" | "being a doctor") = 1
  - P("being a doctor" | "having studied anatomy") = 1/12
  - If you're a doctor, you must have studied anatomy...
  - If you've studied anatomy, you're more likely to be a doctor, but you could also be a biologist, for example
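To make the arithmetic above concrete, here is a minimal Python sketch (my own illustration, not part of the original slides) that computes both conditional probabilities from the joint and marginal values in the doctors-and-anatomy example:

```python
# Conditional probability: P(A | B) = P(A and B) / P(B)
p_doctor = 1 / 1000               # P("being a doctor")
p_anatomy = 12 / 1000             # P("having studied anatomy")
p_doctor_and_anatomy = 1 / 1000   # P("being a doctor who studied anatomy")

p_anatomy_given_doctor = p_doctor_and_anatomy / p_doctor   # = 1.0
p_doctor_given_anatomy = p_doctor_and_anatomy / p_anatomy  # = 1/12 ≈ 0.083

print(p_anatomy_given_doctor, p_doctor_given_anatomy)
```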
Probabilistic Inference
- Suppose there's a horrible, but very rare, disease
- But there's a very accurate test for it
- Unfortunately, you tested positive...
  - The probability that you contracted it is 0.01%
  - The test is 99% accurate
- Should you panic?

Bayes' Theorem
- You want to find P("have disease" | "test positive")
- But you only know
  - How rare the disease is
  - How accurate the test is
- Use Bayes' Theorem (hence "Bayesian inference"):
  P(A | B) = P(B | A) × P(A) / P(B)
  where P(A) is the prior probability and P(A | B) is the posterior probability

Applying Bayes' Theorem
- P("have disease") = 0.0001 (0.01%)
- P("test positive" | "have disease") = 0.99 (99%)
- P("test positive") = 0.010098
- Two cases:
  1. You have the disease, and you tested positive: (0.0001)(0.99) = 0.000099
  2. You don't have the disease, but you tested positive (error): (0.9999)(0.01) = 0.009999
  - Case 1 + Case 2 = 0.010098
- P("have disease" | "test positive") = (0.99)(0.0001) / 0.010098 = 0.009804 = 0.9804%
- Don't worry!

Another View
- In a population of one million people, 100 are infected and 999,900 are not
  - Of the 100 infected: 99 test positive, 1 tests negative
  - Of the 999,900 not infected: 9,999 test positive, 989,901 test negative
- 10,098 people will test positive... and of those, only 99 really have the disease!

Competing Hypotheses
- Consider
  - A set of hypotheses: H1, H2, H3
  - Some observable evidence: O
- If you observed O, what likely caused it?
  - P1 = P(H1|O), P2 = P(H2|O), P3 = P(H3|O)
  - Which explanation is most likely?
- Example:
  - You know that three things can cause the grass to be wet: rain, sprinkler, flood
  - You observed that the grass is wet
  - What caused it?

An Example
- Let
  - O = "Joe earns more than $80,000/year"
  - H1 = "Joe is an NBA referee"
  - H2 = "Joe is a college professor"
  - H3 = "Joe works in food services"
- Suppose we know that Joe earns more than $80,000 a year...
- What should be our guess about Joe's profession?

What's his job?
- Suppose we do a survey and we find out:
  - P(O|H1) = 0.6,   P(H1) = 0.0001   (referee)
  - P(O|H2) = 0.07,  P(H2) = 0.001    (professor)
  - P(O|H3) = 0.001, P(H3) = 0.02     (food services)
- We can calculate:
  - P(H1|O) = 0.00006 / P("earning > $80K/year")
  - P(H2|O) = 0.00007 / P("earning > $80K/year")
  - P(H3|O) = 0.00002 / P("earning > $80K/year")
- What do we guess?
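As an illustration of how these posteriors are computed (this sketch is mine, not from the slides), the following Python snippet reproduces the disease-test calculation and ranks the three hypotheses about Joe's job; since P(O) is the same denominator for every hypothesis, comparing the unnormalized products P(O|Hi) × P(Hi) is enough to decide which explanation is most likely:

```python
# Bayes' theorem: P(H | O) = P(O | H) * P(H) / P(O)

# Disease-test example from the slides.
p_disease = 0.0001            # prior: P("have disease")
p_pos_given_disease = 0.99    # P("test positive" | "have disease")
p_pos_given_healthy = 0.01    # false-positive rate (the test is 99% accurate)

p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))      # = 0.010098
p_disease_given_positive = p_pos_given_disease * p_disease / p_positive
print(p_disease_given_positive)   # ≈ 0.0098, i.e., about 0.98%

# Competing hypotheses: Joe's profession.
# P(O) is the same for every hypothesis, so ranking the unnormalized
# products P(O | H) * P(H) is enough to pick the most likely one.
hypotheses = {
    "NBA referee":       (0.6,   0.0001),   # (P(O|H), P(H))
    "college professor": (0.07,  0.001),
    "food services":     (0.001, 0.02),
}
scores = {job: likelihood * prior
          for job, (likelihood, prior) in hypotheses.items()}
print(scores)                       # roughly: referee 6e-05, professor 7e-05, food services 2e-05
print(max(scores, key=scores.get))  # 'college professor'
```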
Recap: Key Concepts
- Defining probability with frequency
- Statistical independence
- Conditional probability
- Bayes' Theorem

What is a Language Model?
- A probability distribution over strings of text
  - How likely is a string in a given "language"?
- Probabilities depend on what language we're modeling
  - p1 = P("a quick brown dog")
  - p2 = P("dog quick a brown")
  - p3 = P("быстрая brown dog")
  - p4 = P("быстрая собака")   (Russian for "quick dog")
  - In a language model for English: p1 > p2 > p3 > p4
  - In a language model for Russian: p1 < p2 < p3 < p4

How do we model a language?
- Brute force counts?
  - Think of all the things that have ever been said or will ever be said, of any length
  - Count how often each one occurs
- Is understanding the path to enlightenment?
  - Figure out how meaning and thoughts are expressed
  - Build a model based on this
- Throw up our hands and admit defeat?

Unigram Language Model
- Assume each word is generated independently
  - Obviously, this is not true...
  - But it seems to work well in practice!
- The probability of a string, given a model M (see the sketch below):
  P(q1 ... qk | M) = ∏_{i=1..k} P(qi | M)
- The probability of a sequence of words decomposes into a product of the probabilities of the individual words

A Physical Metaphor
- Colored
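As a concrete illustration of the product formula above, here is a small Python sketch (my own, not from the slides) that estimates word probabilities from a document's term counts by simple maximum likelihood and scores a query against them; the example document and query are made up, and a real system would add smoothing for unseen words:

```python
from collections import Counter

def unigram_query_likelihood(query_terms, doc_terms):
    """P(q1 ... qk | M) = product over i of P(qi | M), where the model M is
    estimated from the document by maximum likelihood (count / doc length).
    Returns 0.0 if any query term is absent (no smoothing in this sketch)."""
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    prob = 1.0
    for term in query_terms:
        prob *= counts[term] / doc_len
    return prob

# Hypothetical toy document and query, just to show the computation.
doc = "the quick brown dog chased the quick brown fox".split()
query = "quick brown dog".split()

# P("quick"|M) = 2/9, P("brown"|M) = 2/9, P("dog"|M) = 1/9
print(unigram_query_likelihood(query, doc))  # ≈ 0.00549
```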

