CS 188: Artificial Intelligence
Fall 2006
Lecture 14: Probability
10/17/2006
Dan Klein – UC Berkeley

Announcements

- Grades: check midterm, p1.1, and p1.2 grades in glookup
  - Let us know if there are problems, so we can calculate useful preliminary grade estimates
  - If we missed you, tell us your partner's login
- Readers' requests: list partners and logins at the top of your readme files, and turn off your debug output
- Project 3.1 is up: written probability problems, start now!
- Extra office hours: Thursday 2-3pm (if people use them)

Today

- Probability
- Random variables
- Joint and conditional distributions
- Bayes' rule
- Independence
- You'll need all of this for the next few weeks, so make sure you go over it!

Uncertainty

- General situation:
  - The agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
  - The agent needs to reason about other aspects (e.g., where an object is or what disease is present)
  - The agent knows something about how the known variables relate to the unknown variables
- Probabilistic reasoning gives us a framework for managing our beliefs and knowledge

Random Variables

- A random variable is some aspect of the world about which we have uncertainty:
  - R = Is it raining?
  - D = How long will it take to drive to work?
  - L = Where am I?
- We denote random variables with capital letters
- As in a CSP, each random variable has a domain:
  - R in {true, false}
  - D in [0, ∞)
  - L in the set of possible locations

Probabilities

- We generally calculate conditional probabilities:
  - P(on time | no reported accidents) = 0.90
- Probabilities change with new evidence:
  - P(on time | no reported accidents, 5 a.m.) = 0.95
  - P(on time | no reported accidents, 5 a.m., raining) = 0.80
- That is, observing evidence causes beliefs to be updated

Probabilistic Models

- CSPs:
  - Variables with domains
  - Constraints: a map from assignments to true/false
  - Ideally: only certain variables directly interact

      A     B     constraint
      warm  sun   T
      warm  rain  F
      cold  sun   F
      cold  rain  T

- Probabilistic models:
  - (Random) variables with domains
  - Joint distributions: a map from assignments (or outcomes) to positive numbers
  - Normalized: sums to 1.0
  - Ideally: only certain variables directly interact
  - Assignments are called outcomes

      A     B     P
      warm  sun   0.4
      warm  rain  0.1
      cold  sun   0.2
      cold  rain  0.3

Distributions on Random Variables

- A joint distribution over a set of random variables $X_1, \ldots, X_n$ is a map from assignments (or outcomes, or atomic events) to reals: $P(X_1 = x_1, \ldots, X_n = x_n)$
- Size of the distribution if there are $n$ variables with domain size $d$? $d^n$ entries
- Must obey: $P(x_1, \ldots, x_n) \ge 0$ and $\sum_{x_1, \ldots, x_n} P(x_1, \ldots, x_n) = 1$
- For all but the smallest distributions, it is impractical to write the table out

      T     S     P
      warm  sun   0.4
      warm  rain  0.1
      cold  sun   0.2
      cold  rain  0.3
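To make the representation concrete, here is a minimal Python sketch (not from the lecture; the dictionary representation and names are our own illustration) of the joint distribution $P(T, S)$ above as a map from outcomes to probabilities:

```python
# Joint distribution P(T, S) from the table above, as a map from
# outcomes (assignments) to probabilities. Names are illustrative only.
P = {
    ('warm', 'sun'):  0.4,
    ('warm', 'rain'): 0.1,
    ('cold', 'sun'):  0.2,
    ('cold', 'rain'): 0.3,
}

# A joint distribution must be non-negative and normalized (sum to 1.0).
assert all(p >= 0 for p in P.values())
assert abs(sum(P.values()) - 1.0) < 1e-9

# With n variables of domain size d, the table has d**n entries
# (here 2**2 = 4), which is why explicit tables quickly become impractical.
print(len(P))  # 4
```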
Examples

- An event is a set $E$ of assignments (or outcomes)
- From a joint distribution, we can calculate the probability of any event: $P(E) = \sum_{(x_1, \ldots, x_n) \in E} P(x_1, \ldots, x_n)$
  - Probability that it's warm AND sunny? (= 0.4)
  - Probability that it's warm? (= 0.4 + 0.1 = 0.5)
  - Probability that it's warm OR sunny? (= 0.4 + 0.1 + 0.2 = 0.7)

      T     S     P
      warm  sun   0.4
      warm  rain  0.1
      cold  sun   0.2
      cold  rain  0.3

Marginalization

- Marginalization (or summing out) projects a joint distribution down to a sub-distribution over a subset of the variables: $P(x_1) = \sum_{x_2} P(x_1, x_2)$
- Marginalizing the joint table above:

      P(T):              P(S):
      warm  0.5          sun   0.6
      cold  0.5          rain  0.4

Conditional Probabilities

- A conditional probability is the probability of an event given another event (usually evidence): $P(a \mid b)$
- Conditional or posterior probabilities:
  - E.g., P(cavity | toothache) = 0.8
  - "Given that toothache is all I know…"
- Notation for conditional distributions:
  - P(cavity | toothache): a single number
  - P(Cavity, Toothache): a 2×2 table summing to 1
  - P(Cavity | Toothache): two 2-element vectors, each summing to 1
- If we know more:
  - P(cavity | toothache, catch) = 0.9
  - P(cavity | toothache, cavity) = 1
- Note: the less specific belief remains valid after more evidence arrives, but it is not always useful
- New evidence may be irrelevant, allowing simplification:
  - P(cavity | toothache, traffic) = P(cavity | toothache) = 0.8
- This kind of inference, guided by domain knowledge, is crucial

Conditioning

- Conditional probabilities are the ratio of two probabilities: $P(a \mid b) = \dfrac{P(a, b)}{P(b)}$

Normalization Trick

- A trick to get a whole conditional distribution at once:
  - Select: get the joint probabilities for each value of the query variable
  - Normalize: renormalize the resulting vector
- E.g., conditioning the joint table above on S = rain:

      Select (S = rain):       Normalize → P(T | S = rain):
      warm  0.1                warm  0.25
      cold  0.3                cold  0.75

The Product Rule

- Sometimes the joint P(X, Y) is easy to get; sometimes the conditional P(X | Y) is easier
- $P(x, y) = P(x \mid y)\,P(y)$
- Example: P(sun, dry)? From the tables below, P(sun, dry) = P(dry | sun) P(sun) = 0.9 × 0.8 = 0.72

      P(R):
      sun   0.8
      rain  0.2

      P(D | R):
      dry | sun   0.9
      wet | sun   0.1
      dry | rain  0.3
      wet | rain  0.7

      P(D, R):
      dry, sun    0.72
      wet, sun    0.08
      dry, rain   0.06
      wet, rain   0.14

Lewis Carroll's Sack Problem

- A sack contains a red or blue token, 50/50
- We add a red token
- If we draw a red token, what's the chance of drawing a second red token?
- Variables:
  - F ∈ {r, b}: the original token
  - D ∈ {r, b}: the first token we draw
- Query: P(F = r | D = r)

      P(F):            P(D | F):
      r  0.5           r | r  1.0
      b  0.5           b | r  0.0
                       r | b  0.5
                       b | b  0.5

- Now we have the joint P(F, D) = P(D | F) P(F):

      P(F, D):
      F = r, D = r   0.50
      F = r, D = b   0.00
      F = b, D = r   0.25
      F = b, D = b   0.25

- We want P(F = r | D = r) = 0.5 / (0.5 + 0.25) = 2/3

Bayes' Rule

- Two ways to factor a joint distribution over two variables: $P(x, y) = P(x \mid y)\,P(y) = P(y \mid x)\,P(x)$
- Dividing, we get: $P(x \mid y) = \dfrac{P(y \mid x)\,P(x)}{P(y)}$
- Why is this at all helpful?
  - It lets us invert a conditional distribution
  - Often one conditional is tricky to assess, but the other is simple
  - It is the foundation of many systems we'll see later (e.g., ASR, MT)
- In the running for most important AI equation! (Accompanied on the slide by a portrait of Bayes: "That's my rule!")

More Bayes' Rule

- Diagnostic probability from causal probability: $P(\text{cause} \mid \text{effect}) = \dfrac{P(\text{effect} \mid \text{cause})\,P(\text{cause})}{P(\text{effect})}$
- Example: m is meningitis, s is stiff neck: $P(m \mid s) = \dfrac{P(s \mid m)\,P(m)}{P(s)}$
- Note: the posterior probability of meningitis is still very small
- Note: you should still get stiff necks checked out! Why?

Battleship

- Let's say we have two distributions:
  - A prior distribution over ship locations: P(L); say this is uniform
  - A sensor reading model: P(R | L), given by some known black box, e.g., P(R = yellow | L = (1,1)) = 0.1
- For now, assume the reading is always for the lower left corner
- We can calculate the posterior distribution over ship locations using (conditionalized) Bayes' rule: $P(l \mid r) = \dfrac{P(r \mid l)\,P(l)}{P(r)}$

Inference by Enumeration

- P(sun)? P(sun | winter)? P(sun | winter, warm)?

      S       T     R     P
      summer  warm  sun   0.30
      summer  warm  rain  0.05
      summer  cold  sun   0.10
      summer  cold  rain  0.05
      winter  warm  sun   0.10
      winter  warm  rain  0.05
      winter  cold  sun   0.15
      winter  cold  rain  0.20

- General case:
  - Evidence variables: $E_1, \ldots, E_k = e_1, \ldots, e_k$
  - Query variables: $Q$
  - Hidden variables: $H_1, \ldots, H_r$
- We want: $P(Q \mid e_1, \ldots, e_k)$
- First, select the entries consistent with the evidence; then sum out the hidden variables and normalize what remains (see the sketch below)
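A minimal Python sketch of this select/sum-out/normalize procedure (our own illustration, not the course's code), run on the (S, T, R) table above:

```python
# Joint distribution P(S, T, R) from the table above.
P = {
    ('summer', 'warm', 'sun'):  0.30,
    ('summer', 'warm', 'rain'): 0.05,
    ('summer', 'cold', 'sun'):  0.10,
    ('summer', 'cold', 'rain'): 0.05,
    ('winter', 'warm', 'sun'):  0.10,
    ('winter', 'warm', 'rain'): 0.05,
    ('winter', 'cold', 'sun'):  0.15,
    ('winter', 'cold', 'rain'): 0.20,
}
VARS = ('S', 'T', 'R')  # season, temperature, weather

def enumerate_query(joint, query_var, evidence):
    """Compute P(query_var | evidence) by enumeration."""
    idx = {v: i for i, v in enumerate(VARS)}
    totals = {}
    for outcome, p in joint.items():
        # Step 1: select the entries consistent with the evidence.
        if any(outcome[idx[v]] != val for v, val in evidence.items()):
            continue
        # Step 2: sum out the hidden variables by grouping on the query value.
        q = outcome[idx[query_var]]
        totals[q] = totals.get(q, 0.0) + p
    # Step 3: normalize the resulting vector.
    z = sum(totals.values())
    return {value: p / z for value, p in totals.items()}

print(enumerate_query(P, 'R', {}))                            # P(sun) = 0.65
print(enumerate_query(P, 'R', {'S': 'winter'}))               # P(sun | winter) = 0.5
print(enumerate_query(P, 'R', {'S': 'winter', 'T': 'warm'}))  # P(sun | winter, warm) = 2/3
```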
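As a cross-check on the sack problem above, Bayes' rule recovers the same answer from the causal direction, using only the numbers in the P(F) and P(D | F) tables:

$$
P(F{=}r \mid D{=}r)
= \frac{P(D{=}r \mid F{=}r)\,P(F{=}r)}{P(D{=}r)}
= \frac{1.0 \times 0.5}{1.0 \times 0.5 + 0.5 \times 0.5}
= \frac{0.5}{0.75} = \frac{2}{3}
$$

where the denominator marginalizes over the original token: $P(D{=}r) = \sum_f P(D{=}r \mid f)\,P(f)$.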