Bayes Networks
6.872/HST.950

What Probabilistic Models Should We Use?
• Full joint distribution
  • Completely expressive
  • Hugely data-hungry
  • Exponential computational complexity
• Naive Bayes (full conditional independence)
  • Relatively concise
  • Needs data ~ (#hypotheses) × (#features) × (#feature-vals)
  • Fast ~ (#features)
  • Cannot express dependencies among features or among hypotheses
  • Cannot consider the possibility of multiple hypotheses co-occurring

Bayesian Networks (aka Belief Networks)
• Graphical representation of dependencies among a set of random variables
  • Nodes: variables
  • Directed links to a node from its parents: direct probabilistic dependencies
• Each Xi has a conditional probability distribution, P(Xi | Parents(Xi)), showing the effects of the parents on the node
• The graph is directed and acyclic (a DAG); hence, no cycles
• This is a language that can express dependency structures between Naive Bayes and the full joint distribution, more concisely
• Given some new evidence, how does this affect the probability of some other node(s)? P(X|E): belief propagation/updating
• Given some evidence, what are the most likely values of other variables? MAP explanation

Burglary Network (due to J. Pearl)
• Structure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and of MaryCalls

P(B) = 0.001    P(E) = 0.002

B  E | P(A|B,E)
t  t | 0.95
t  f | 0.94
f  t | 0.29
f  f | 0.001

A | P(J|A)      A | P(M|A)
t | 0.90        t | 0.70
f | 0.05        f | 0.01

If Everything Depends on Everything
• If the five nodes were fully connected (each variable conditioned on all of its predecessors), the tables would need 2^4 + 2^3 + 2^2 + 2^1 + 2^0 = 31 parameters
• This model requires just as many parameters as the full joint distribution!

Computing the Joint Distribution from a Bayes Network
• As usual, we abuse notation:
  P(x1, ..., xn) = ∏i P(xi | Parents(Xi))
• E.g., what's the probability that an alarm has sounded, there was neither an earthquake nor a burglary, and both John and Mary called? (See the Python sketch below.)
  P(a, ¬e, ¬b, j, m) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
                     = 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.000628

Requirements for Constructing a BN
• Recall that the definition of conditional probability was
  P(a|b) = P(a,b) / P(b)
• and thus we get the chain rule,
  P(a,b) = P(a|b) P(b)
• Generalizing to n variables,
  P(x1, ..., xn) = P(xn | xn-1, ..., x1) P(xn-1, ..., x1)
• and repeatedly applying this idea,
  P(x1, ..., xn) = P(xn | xn-1, ..., x1) × P(xn-1 | xn-2, ..., x1) × ... × P(x1)
• This "works" just in case we can define a partial order so that each variable's parents screen it off from its other predecessors:
  P(xi | xi-1, ..., x1) = P(xi | Parents(Xi)), with Parents(Xi) ⊆ {x1, ..., xi-1}

Topological Interpretations
• A node, X, is conditionally independent of its non-descendants, Zi, given its parents, Ui
• A node, X, is conditionally independent of all other nodes in the network given its Markov blanket: its parents, Ui, its children, Yi, and its children's parents, Zi

BNs Can Be Compact
• For a network of 40 binary variables, the full joint distribution has 2^40 entries (> 1,000,000,000,000)
• If |Par(xi)| ≤ 5, however, then the 40 (conditional) probability tables each have ≤ 32 entries, so the total number of parameters is ≤ 1,280
• The largest medical BN I know of (Pathfinder) had 109 variables! 2^109 ≈ 10^33

How Not to Build BNs
• With the wrong ordering of nodes, the network becomes more complicated and requires more (and more difficult) conditional probability assessments
• E.g., the orderings M, J, A, B, E and M, J, E, B, A both yield denser networks than the causal ordering used above

Simplifying Conditional Probability Tables
• Do we know any structure in the way that Par(x) "cause" x?
• If each destroyer can sink the ship with probability P(s|di), what is the probability that the ship will sink if it's attacked by both? Treating the causes as acting independently (a noisy-OR):
  P(s|d1,d2) = 1 − (1 − P(s|d1))(1 − P(s|d2))
• For |Par(x)| = n, this requires O(n) parameters, not O(k^n)
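
To make the parameter saving concrete, here is a minimal Python sketch of a noisy-OR combination under the independence assumption stated above; the function name and the example probabilities are illustrative, not from the slides.

    # Noisy-OR: each active cause independently fails to produce the
    # effect with probability (1 - p_i); the effect is absent only if
    # every active cause fails.
    def noisy_or(cause_probs, active):
        """P(effect | causes): cause_probs[i] = P(s|d_i); active[i]
        says whether cause i is present."""
        p_all_fail = 1.0
        for p, present in zip(cause_probs, active):
            if present:
                p_all_fail *= (1.0 - p)
        return 1.0 - p_all_fail

    # Two destroyers with illustrative P(s|d1) = 0.6, P(s|d2) = 0.7:
    print(noisy_or([0.6, 0.7], [True, True]))   # 1 - 0.4*0.3 = 0.88

Only the n per-cause probabilities need to be assessed, which is where the O(n) parameter count comes from.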
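
The joint-distribution example above can also be checked directly. Below is a small Python sketch using the CPTs of the burglary network; the dictionary encoding and function name are mine, not the lecture's.

    # Burglary network CPTs, transcribed from the tables above.
    P_B = {True: 0.001, False: 0.999}                    # P(B)
    P_E = {True: 0.002, False: 0.998}                    # P(E)
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(A=t | B, E)
    P_J = {True: 0.90, False: 0.05}                      # P(J=t | A)
    P_M = {True: 0.70, False: 0.01}                      # P(M=t | A)

    def joint(b, e, a, j, m):
        """P(b,e,a,j,m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
        pa = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
        pj = P_J[a] if j else 1.0 - P_J[a]
        pm = P_M[a] if m else 1.0 - P_M[a]
        return P_B[b] * P_E[e] * pa * pj * pm

    # Alarm sounded, neither burglary nor earthquake, both called:
    print(joint(False, False, True, True, True))  # ≈ 0.000628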
Inference
• Recall the two basic inference problems: belief propagation and MAP explanation
• Trivially, we can enumerate all "matching" rows of the joint probability distribution
• For poly-trees (no undirected loops, i.e., only one connection between any pair of nodes, like our Burglary example), there are efficient linear algorithms, similar to constraint propagation
• For arbitrary BNs, all inference is NP-hard!
  • Exact solutions
  • Approximation

Exact Solution of BNs (Burglary example)
  P(b | j, m) ∝ ∑e ∑a P(b) P(e) P(a|b,e) P(j|a) P(m|a)
              = P(b) ∑e P(e) ∑a P(a|b,e) P(j|a) P(m|a)
• Notes:
  • Sum over all "don't care" variables
  • Factor common terms out of the summations
  • The calculation becomes a sum of products of sums of products ...

Poly-trees Are Easy
• Singly-connected structures allow propagation of observations via single paths
• "Down" is just use of conditional probability
• "Up" is just Bayes' rule
• Formulated as message-propagation rules
• Linear time (in network diameter)
• Fails on general networks!
(Figures: the singly-connected Burglary network vs. a multiply-connected network over TVshow, Drunk, Bottles, Escape)

Exact Solution of BNs (non-poly-trees)
(Figure: a five-node network over A, B, C, D, E containing an undirected loop)
• What is the probability of a specific state, say A=t, B=f, C=t, D=t, E=f?
• What is the probability that E=t given B=t? Consider the term P(e,b)
• Alas, optimal factoring is NP-hard
• A good factoring needs 12 instead of 32 multiplications (even in this small example)

Other Exact Methods
• Join-tree: merge variables into (small!) sets of variables to make the graph into a poly-tree. Most commonly used (aka Clustering, Junction-tree, Potential methods). In the example above, merging B and C yields the chain A, (B,C), D, E
• Cutset-conditioning: instantiate a (small!) set of variables, then solve each residual problem, and add the solutions weighted by the probabilities of the instantiated variables having those values
• ...
• All these methods are essentially equivalent, with some time-space tradeoffs

Approximate Inference in BNs
• Direct Sampling: samples the joint distribution
• Rejection Sampling: computes P(X|e), uses ancestor evidence nodes in sampling
• Likelihood Weighting: like Rejection Sampling, but weights each sample by the probability of the descendant evidence nodes
• Markov chain Monte Carlo: Gibbs and other similar sampling methods

Direct Sampling
function Prior-Sample(bn) returns an event sampled from bn
  inputs: bn, a Bayes net specifying the joint distribution P(X1, ..., Xn)
  x := an event with n elements
  for i = 1 to n do
    xi := a random sample from P(Xi | Par(Xi))
  return x
• From a large number of samples, we can estimate all joint probabilities
• The probability of an event is the fraction of all complete events generated by Prior-Sample that match the partially specified event
• Hence we can compute all conditionals, etc.

Rejection Sampling
function Rejection-Sample(X, e, bn, N) returns an estimate of P(X|e)
  inputs: X, the query variable
          e, observed values for the evidence variables E
          bn, a Bayes net
          N, the total number of samples to be generated
  local variables: C, a vector of counts for each value of X, initially zero
  for j = 1 to N do
    x := Prior-Sample(bn)
    if x is consistent with e then
      C[v] := C[v] + 1, where v is the value of X in x
  return Normalize(C)
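
As a rough illustration of the two samplers, here is a Python sketch specialized to the burglary network. It reuses the CPT dictionaries P_B, P_E, P_A, P_J, P_M from the joint-distribution sketch earlier; the variable ordering and helper names are mine.

    import random

    def prior_sample():
        """Sample each variable in topological order from its CPT,
        conditioned on the already-sampled parents."""
        b = random.random() < P_B[True]
        e = random.random() < P_E[True]
        a = random.random() < P_A[(b, e)]
        j = random.random() < P_J[a]
        m = random.random() < P_M[a]
        return {'B': b, 'E': e, 'A': a, 'J': j, 'M': m}

    def rejection_sample(query, evidence, n):
        """Estimate P(query = True | evidence) by discarding samples
        inconsistent with the evidence."""
        counts = {True: 0, False: 0}
        for _ in range(n):
            s = prior_sample()
            if all(s[var] == val for var, val in evidence.items()):
                counts[s[query]] += 1
        accepted = counts[True] + counts[False]
        return counts[True] / accepted if accepted else float('nan')

    # P(Burglary | JohnCalls = t, MaryCalls = t): with these CPTs the
    # exact answer is about 0.284, but P(j, m) is only about 0.002, so
    # most samples are rejected and many are needed for a stable estimate.
    print(rejection_sample('B', {'J': True, 'M': True}, 1_000_000))

The low acceptance rate in this run is exactly the weakness of rejection sampling that likelihood weighting is designed to address.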

