CS 188: Artificial Intelligence
Fall 2008
Lecture 16: Bayes Nets III
10/23/2008
Dan Klein – UC Berkeley

Announcements
- Midterms graded, up on glookup, back Tuesday
- W4 also graded, back in sections / box
- Past homeworks in return box in 2nd floor lab

Causality?
- When Bayes' nets reflect the true causal patterns:
  - Often simpler (nodes have fewer parents)
  - Often easier to think about
  - Often easier to elicit from experts
- BNs need not actually be causal
  - Sometimes no causal net exists over the domain
  - E.g. consider the variables Traffic and Drips
  - End up with arrows that reflect correlation, not causation
- What do the arrows really mean?
  - Topology may happen to encode causal structure
  - Topology only guaranteed to encode conditional independencies

Example: Traffic
- Basic traffic net: R → T
- Let's multiply out the joint

    P(R):       r 1/4     ¬r 3/4
    P(T | r):   t 3/4     ¬t 1/4
    P(T | ¬r):  t 1/2     ¬t 1/2

    Joint:  r,t 3/16   r,¬t 1/16   ¬r,t 6/16   ¬r,¬t 6/16

Example: Reverse Traffic
- Reverse causality? T → R

    P(T):       t 9/16    ¬t 7/16
    P(R | t):   r 1/3     ¬r 2/3
    P(R | ¬t):  r 1/7     ¬r 6/7

    Joint:  r,t 3/16   r,¬t 1/16   ¬r,t 6/16   ¬r,¬t 6/16  (same as before)

Topology Limits Distributions
[Figure: the set of all distributions over X and Y, and the subsets expressible by each topology over X and Y]

Non-Guaranteed Independence
- Adding an arc doesn't guarantee dependence, it just makes it possible
- Example: two coin flips X1, X2 with an arc X1 → X2

    P(X1):      h 0.5    t 0.5
    P(X2 | h):  h 0.5    t 0.5
    P(X2 | t):  h 0.5    t 0.5

  The arc is present, yet X2 is still independent of X1.

Alternate BNs
[Figure: alternate network structures]

Summary
- Bayes nets compactly encode joint distributions
- Guaranteed independencies of distributions can be deduced from BN graph structure
- A Bayes' net may have other independencies that are not detectable until you inspect its specific distribution
- The Bayes' ball algorithm (a.k.a. d-separation) tells us when an observation of one variable can change belief about another variable

Inference
- Inference: calculating some statistic from a joint probability distribution
- Examples:
  - Posterior probability
  - Most likely explanation

Reminder: Alarm Network
[Figure: the alarm network]

Inference by Enumeration
- Given unlimited time, inference in BNs is easy
- Recipe:
  - State the marginal probabilities you need
  - Figure out ALL the atomic probabilities you need
  - Calculate and combine them
- Example:

Example
- Where did we use the BN structure? We didn't!

Example
- In this simple method, we only need the BN to synthesize the joint entries

Normalization Trick
- Compute the unnormalized posterior entries, then normalize so they sum to 1

Inference by Enumeration?
[Figure: the full enumeration computation]

Variable Elimination
- Why is inference by enumeration so slow?
  - You join up the whole joint distribution before you sum out the hidden variables
  - You end up repeating a lot of work!
- Idea: interleave joining and marginalizing!
  - Called "Variable Elimination"
  - Still NP-hard, but usually much faster than inference by enumeration
- We'll need some new notation to define VE

Factor Zoo I
- Joint distribution: P(X,Y)
  - Entries P(x,y) for all x, y
  - Sums to 1

    T     W     P
    hot   sun   0.4
    hot   rain  0.1
    cold  sun   0.2
    cold  rain  0.3

- Selected joint: P(x,Y)
  - A slice of the joint distribution
  - Entries P(x,y) for fixed x, all y
  - Sums to P(x)

    T     W     P
    cold  sun   0.2
    cold  rain  0.3

Factor Zoo II
- Family of conditionals: P(X | Y)
  - Multiple conditionals
  - Entries P(x | y) for all x, y
  - Sums to |Y|

    T     W     P
    hot   sun   0.8
    hot   rain  0.2
    cold  sun   0.4
    cold  rain  0.6

- Single conditional: P(Y | x)
  - Entries P(y | x) for fixed x, all y
  - Sums to 1

    T     W     P
    cold  sun   0.4
    cold  rain  0.6

Factor Zoo III
- Specified family: P(y | X)
  - Entries P(y | x) for fixed y, all x
  - Sums to … who knows!

    T     W     P
    hot   rain  0.2
    cold  rain  0.6

- In general, when we write P(Y1 … YN | X1 … XM)
  - It is a "factor," a multi-dimensional array
  - Its values are all P(y1 … yN | x1 … xM)
  - Any unassigned X or Y is a dimension missing (selected) from the array

Basic Objects
- Track objects called factors
- Initial factors are local CPTs
  - One per node in the BN
- Any known values are specified
  - E.g. if we know J = j and E = ¬e, the initial factors are the CPTs with those values instantiated
- VE: Alternately join and marginalize factors
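The inference-by-enumeration recipe and normalization trick above can be sketched on the lecture's two-node traffic net (R → T). This is an illustrative sketch, not course code; the function names are my own.

```python
# Inference by enumeration on the lecture's traffic net (R -> T):
# state the query, enumerate every atomic joint probability needed,
# then combine and normalize.

P_R = {True: 0.25, False: 0.75}                  # P(R): r 1/4, ¬r 3/4
P_T_given_R = {True: {True: 0.75, False: 0.25},  # P(T | r)
               False: {True: 0.5, False: 0.5}}   # P(T | ¬r)

def joint(r, t):
    """Atomic joint entry P(r, t), synthesized from the BN's CPTs."""
    return P_R[r] * P_T_given_R[r][t]

def posterior_R_given_t(t=True):
    """P(R | T=t): enumerate the needed joint entries, then normalize."""
    unnorm = {r: joint(r, t) for r in (True, False)}
    z = sum(unnorm.values())          # the normalization trick
    return {r: p / z for r, p in unnorm.items()}

post = posterior_R_given_t(True)
print(post)  # P(r | t) = (3/16) / (9/16) = 1/3
```

Note that the BN structure is used only to synthesize the joint entries, exactly as the slides point out.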
Basic Operation: Join
- First basic operation: join factors
- Combining two factors:
  - Just like a database join
  - Build a factor over the union of the variables involved
- Example:
- Computation for each entry: pointwise products

Basic Operation: Join
- In general, we join on a variable
  - Take all factors mentioning that variable
  - Join them all together
- Example: join on A — pick up all factors mentioning A, and join them to form a single factor

Basic Operation: Eliminate
- Second basic operation: marginalization
  - Take a factor and sum out a variable
  - Shrinks a factor to a smaller one
  - A projection operation
- Example:
- Definition: the new factor's entry for each remaining assignment is the sum of the old factor's entries over all values of the eliminated variable

General Variable Elimination
- Query: P(Q | E1 = e1, …, Ek = ek)
- Start with initial factors:
  - Local CPTs (but instantiated by evidence)
- While there are still hidden variables (not Q or evidence):
  - Pick a hidden variable H
  - Join all factors mentioning H
  - Project out H
- Join all remaining factors and normalize

Example
- Choose A

Example
- Choose E
- Finish with B
- Normalize

Variable Elimination
- What you need to know:
  - Should be able to run it on small examples, understand the factor creation / reduction flow
  - Better than enumeration: VE caches intermediate computations
    - Saves time by marginalizing variables as soon as possible rather than at the end
  - Polynomial time for tree-structured graphs – sound familiar?
- We will see special cases of VE later
  - You'll have to implement the special cases

Approximations
- Exact inference is slow, especially with a lot of hidden nodes
- Approximate methods give you a (close, wrong?) answer, faster
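The two VE primitives above, join and eliminate, can be sketched on the traffic net's factors. Factors here are dicts mapping assignment tuples to probabilities; this representation is illustrative, not from the lecture.

```python
# Minimal sketch of the two variable-elimination primitives:
# "join" builds a factor over the union of the variables (pointwise products),
# "eliminate" sums a variable out (the projection operation).

def join(f1, vars1, f2, vars2):
    """Join two factors into one over the union of their variables."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for a1, p1 in f1.items():
        for a2, p2 in f2.items():
            asg = dict(zip(vars1, a1))
            asg2 = dict(zip(vars2, a2))
            # keep only rows that agree on shared variables
            if all(asg.get(v, asg2[v]) == asg2[v] for v in vars2):
                asg.update(asg2)
                out[tuple(asg[v] for v in out_vars)] = p1 * p2
    return out, out_vars

def eliminate(f, fvars, var):
    """Sum out `var`, shrinking the factor to the remaining variables."""
    keep = [v for v in fvars if v != var]
    out = {}
    for asg, p in f.items():
        key = tuple(a for v, a in zip(fvars, asg) if v != var)
        out[key] = out.get(key, 0.0) + p
    return out, keep

# P(R) and P(T | R) from the slides
P_R = {('r',): 0.25, ('-r',): 0.75}
P_T_given_R = {('r', 't'): 0.75, ('r', '-t'): 0.25,
               ('-r', 't'): 0.5, ('-r', '-t'): 0.5}

joint, jvars = join(P_R, ['R'], P_T_given_R, ['R', 'T'])  # P(R, T)
P_T, _ = eliminate(joint, jvars, 'R')                     # sum out R
print(P_T)  # {('t',): 9/16, ('-t',): 7/16}
```

Joining P(R) with P(T | R) and then eliminating R reproduces the marginal P(T) = (9/16, 7/16) computed on the reverse-traffic slide.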
Sampling
- Basic idea:
  - Draw N samples from a sampling distribution S
  - Compute an approximate posterior probability
  - Show this converges to the true probability P
- Outline:
  - Sampling from an empty network
  - Rejection sampling: reject samples disagreeing with evidence
  - Likelihood weighting: use evidence to weight samples

Prior Sampling
[Figure: the Cloudy → Sprinkler, Rain → WetGrass network, sampled one variable at a time]

Prior Sampling
- This process generates samples with probability ∏i P(xi | Parents(Xi)), i.e. the BN's joint probability
- Let the number of samples of an event be N(x1, …, xn); then the fraction N(x1, …, xn)/N converges to P(x1, …, xn)
- I.e., the sampling procedure is consistent

Example
- We'll get a …
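The prior-sampling process can be sketched as follows: sample each variable in topological order from its CPT, conditioned on the already-sampled parents. The CPT numbers below are the standard textbook (AIMA) values for this network, filled in here as an assumption for illustration.

```python
import random

# Prior sampling on the Cloudy / Sprinkler / Rain / WetGrass network.
# CPT values are the standard textbook (AIMA) numbers, assumed here.

def bernoulli(p):
    return random.random() < p

def prior_sample():
    c = bernoulli(0.5)                # P(+c) = 0.5
    s = bernoulli(0.1 if c else 0.5)  # P(+s | C)
    r = bernoulli(0.8 if c else 0.2)  # P(+r | C)
    w = bernoulli({(True, True): 0.99, (True, False): 0.90,
                   (False, True): 0.90, (False, False): 0.0}[(s, r)])
    return c, s, r, w

# Consistency: the empirical frequency of an event approaches its joint
# probability as N grows, e.g. P(+c, +r) = 0.5 * 0.8 = 0.4.
random.seed(0)
N = 100_000
samples = [prior_sample() for _ in range(N)]
frac = sum(1 for c, s, r, w in samples if c and r) / N
print(frac)  # ≈ 0.4
```

Rejection sampling and likelihood weighting, covered next in the outline, are small modifications of this loop: discard or down-weight samples according to the evidence.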