MIT 6.825 Techniques in Artificial Intelligence
Lecture 16: Inference in Bayesian Networks

Lecture 16 • 1: Inference in Bayesian Networks

Now that we know what the semantics of Bayes nets are, that is, what it means when we have one, we need to understand how to use it. Typically, we'll be in a situation in which we have some evidence (some of the variables are instantiated), and we want to infer something about the probability distribution of some other variables.

Lecture 16 • 2: Inference in Bayesian Networks
• Exact inference

In exact inference, we analytically compute the conditional probability distribution over the variables of interest.

Lecture 16 • 3: Inference in Bayesian Networks
• Exact inference
• Approximate inference

But sometimes that's too hard to do, in which case we can use approximation techniques based on statistical sampling.

Lecture 16 • 4: Query Types

Given a Bayesian network, what kinds of questions might we want to ask?

Lecture 16 • 5: Query Types
• Conditional probability query: Pr(x | e)

The most usual is a conditional probability query. Given instantiations for some of the variables (we'll use e here to stand for the values of all the instantiated variables; it doesn't have to be just one), what is the probability that node X has a particular value x?

Lecture 16 • 6: Query Types
• Conditional probability query: Pr(x | e)
• Maximum a posteriori probability: what value of x maximizes Pr(x | e)?

Another interesting question you might ask is: what is the most likely explanation for some evidence? We can think of that as the value of node X (or of some group of nodes) that maximizes the probability that you would have seen the evidence you did. This is called the maximum a posteriori probability, or MAP, query.
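To make the two query types concrete, here is a minimal Python sketch. It assumes the posterior Pr(X | e) over a single discrete variable has already been computed somehow; the variable, its values, and the numbers are all invented for illustration.

    # Hypothetical posterior Pr(X | e) for a single discrete variable X;
    # the values and probabilities are invented placeholders.
    posterior = {"flu": 0.50, "cold": 0.35, "allergy": 0.15}

    # Conditional probability query: Pr(X = cold | e) is a table lookup.
    print(posterior["cold"])            # 0.35

    # MAP query: the value of x that maximizes Pr(x | e). As the lecture
    # notes, in the discrete case we just check every value of x.
    x_map = max(posterior, key=posterior.get)
    print(x_map)                        # flu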
Lecture 16 • 7: Query Types
• General question: what is the whole probability distribution over variable X given evidence e, Pr(X | e)?

In our discrete probability situation, the only way to answer a MAP query is to compute the probability of x given e for all possible values of x and see which one is greatest.

Lecture 16 • 8: Query Types

So, in general, we'd like to be able to compute a whole probability distribution over some variable or variables X, given instantiations of a set of variables e.

Lecture 16 • 9: Using the joint distribution

To answer any query involving a conjunction of variables, sum over the variables not involved in the query. Given the joint distribution over the variables, we can easily answer any question about the value of a single variable by summing (or marginalizing) over the other variables.

Lecture 16 • 10: Using the joint distribution

Pr(d) = ∑_{A,B,C} Pr(a, b, c, d)
      = ∑_{a ∈ dom(A)} ∑_{b ∈ dom(B)} ∑_{c ∈ dom(C)} Pr(A = a ∧ B = b ∧ C = c ∧ D = d)

So, in a domain with four variables, A, B, C, and D, the probability that variable D has value d is the sum, over all possible combinations of values of the other three variables, of the joint probability of all four values. This is exactly the same as the procedure we went through in the last lecture, where, to compute the probability of cavity, we added up the probability of cavity and toothache and the probability of cavity and not toothache.
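As a concrete illustration, here is a small Python sketch that answers such queries directly from a joint table. The four binary variables and the joint distribution are invented, and the helper pr is hypothetical, not code from the lecture; it answers a conjunctive query by summing out every variable the query doesn't mention, and a conditional query as a ratio of two such sums.

    from itertools import product
    import random

    # A made-up joint distribution over four binary variables A, B, C, D,
    # stored as a table from value tuples (a, b, c, d) to probabilities.
    # The numbers are random placeholders; any table summing to 1 would do.
    random.seed(0)
    weights = {v: random.random() for v in product([False, True], repeat=4)}
    total = sum(weights.values())
    joint = {v: w / total for v, w in weights.items()}

    def pr(**fixed):
        """Probability of the conjunctive query in `fixed`, computed by
        summing the joint over every variable not mentioned in the query."""
        names = ("a", "b", "c", "d")
        return sum(p for v, p in joint.items()
                   if all(v[names.index(k)] == val for k, val in fixed.items()))

    # Pr(d): sum over A, B, and C with d fixed.
    print(pr(d=True))

    # Pr(d | b) = Pr(b, d) / Pr(b), a ratio of two conjunctive queries
    # (this is exactly the reduction the next slide writes out).
    print(pr(b=True, d=True) / pr(b=True))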
Lecture 16 • 11: Using the joint distribution

In general, we'll use the first notation, with a single summation indexed by a list of variable names and a joint probability expression that mentions values of those variables. But here we can see the completely written-out definition, just so we all know what the shorthand is supposed to mean.

Lecture 16 • 12: Using the joint distribution

Pr(d | b) = Pr(b, d) / Pr(b)
          = ∑_{A,C} Pr(a, b, c, d) / ∑_{A,C,D} Pr(a, b, c, d)

To compute a conditional probability, we reduce it to a ratio of conjunctive queries using the definition of conditional probability, and then answer each of those queries by marginalizing out the variables not mentioned.

Lecture 16 • 13: Using the joint distribution

In the numerator here, you can see that we're only summing over variables A and C, because b and d are instantiated in the query.

Lecture 16 • 14: Simple Case

We're going to learn a general-purpose algorithm for answering these joint queries fairly efficiently. We'll start by looking at a very simple case to build up our intuitions, then we'll write down the algorithm, then we'll apply it to a more complex case.

Lecture 16 • 15: Simple Case

A → B → C → D

Okay, here's our very simple case. It's a Bayes net with four nodes, arranged in a chain.

Lecture 16 • 16: Simple Case

Pr(d) = ∑_{A,B,C} Pr(a, b, c, d)

So, we know from before that the probability that variable D has some value little d is the sum over A, B, and C of the joint distribution, with d fixed.

Lecture 16 • 17: Simple Case

Pr(d) = ∑_{A,B,C} Pr(a, b, c, d)
      = ∑_{A,B,C} Pr(d | c) Pr(c | b) Pr(b | a) Pr(a)

Now, using the chain rule of Bayesian networks, we can write down the joint probability as a product, over the nodes, of the probability of each node's value given the values of its parents. So, in this case, we get Pr(d | c) times Pr(c | b) times Pr(b | a) times Pr(a).
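Here is a short Python sketch of exactly this computation, with invented CPT numbers for the chain A → B → C → D; the helper pr_d is hypothetical, not code from the lecture.

    from itertools import product

    # Invented conditional probability tables (CPTs) for the chain
    # A -> B -> C -> D, all variables binary. p_b[a][b] stands for
    # Pr(B = b | A = a), and similarly for the others.
    p_a = {True: 0.6, False: 0.4}
    p_b = {True: {True: 0.7, False: 0.3}, False: {True: 0.2, False: 0.8}}
    p_c = {True: {True: 0.9, False: 0.1}, False: {True: 0.5, False: 0.5}}
    p_d = {True: {True: 0.4, False: 0.6}, False: {True: 0.1, False: 0.9}}

    def pr_d(d):
        """Pr(D = d) = sum over a, b, c of Pr(a) Pr(b|a) Pr(c|b) Pr(d|c)."""
        return sum(p_a[a] * p_b[a][b] * p_c[b][c] * p_d[c][d]
                   for a, b, c in product([False, True], repeat=3))

    print(pr_d(True))                 # Pr(D = true)
    print(pr_d(True) + pr_d(False))   # sanity check: sums to 1.0

Note that the sum still ranges over every joint assignment of A, B, and C, which is exponential in the number of hidden variables; the efficient algorithm the lecture is building toward reorganizes this computation to avoid that blowup.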

