Graphical models and Bayesian networks
10-601 Machine Learning

Independence
• In our density estimation class (and in the Naïve Bayes classifier class) we discussed at length the usefulness of the independence assumption.
• However, we also mentioned its drawbacks.

Independence
• Independence allows for simpler models, learning, and inference.
• For example, with 3 binary variables we only need 3 parameters rather than 7.
• The saving is even greater if we have many more variables …
• In many cases it would be useful to assume independence, even if it does not quite hold.
• Is there any middle ground?

Bayesian networks
• Bayesian networks are directed graphs with nodes representing random variables and edges representing dependency assumptions.
• Let's use our movie example: we would like to determine the joint probability for Long (Lo), Liked (Li), and Slept (S) in a movie.
• Network structure: Lo → Li and Lo → S.

Bayesian networks: Notation
• Nodes are random variables; a directed edge encodes a conditional dependency.
• Each node stores a conditional probability table (CPT):
  P(Lo) = 0.5
  P(Li | Lo) = 0.4   P(Li | ¬Lo) = 0.7
  P(S | Lo) = 0.6    P(S | ¬Lo) = 0.2
• Bayesian networks are directed acyclic graphs (DAGs).
• The network above represents the following joint probability distribution:
  p(Lo, Li, S) = P(Lo) P(Li | Lo) P(S | Lo)
• More generally, a Bayesian network represents the joint probability distribution
  p(x_1, …, x_n) = ∏_i p(x_i | Pa(x_i)),
  where Pa(x_i) is the set of parents of x_i in the graph.

Constructing a Bayesian network
• How do we go about constructing a network for a specific problem?
• Step 1: Identify the random variables.
• Step 2: Determine the conditional dependencies.
• Step 3: Populate the CPTs (these can be learned from observation data!).

An example problem
• An alarm system:
  B – Did a burglary occur?
  E – Did an earthquake occur?
  A – Did the alarm sound off?
  M – Mary calls
  J – John calls
• How do we reconstruct the network for this problem?

Factoring joint distributions
• Using the chain rule we can always factor a joint distribution as follows:
  P(A,B,E,J,M)
  = P(A | B,E,J,M) P(B,E,J,M)
  = P(A | B,E,J,M) P(B | E,J,M) P(E,J,M)
  = P(A | B,E,J,M) P(B | E,J,M) P(E | J,M) P(J,M)
  = P(A | B,E,J,M) P(B | E,J,M) P(E | J,M) P(J | M) P(M)
• This type of conditional dependency can also be represented graphically.

A Bayesian network
• The factorization above corresponds to a fully connected network over A, B, E, J, and M.
• Number of parameters:
  A: 2^4 = 16
  B: 2^3 = 8
  E: 2^2 = 4
  J: 2
  M: 1
  A total of 31 parameters.

A better approach
• For the alarm system above, let's use our knowledge of the domain!

Reconstructing a network
• Domain knowledge suggests the structure B → A, E → A, A → J, A → M: the alarm is triggered by burglaries and earthquakes, and John and Mary call only in response to the alarm.
• Number of parameters:
  A: 4
  B: 1
  E: 1
  J: 2
  M: 2
  A total of 10 parameters.
• By relying on domain knowledge we saved 21 parameters!

Constructing a Bayesian network: Revisited
• Step 1: Identify the random variables.
• Step 2: Determine the conditional dependencies:
  - Select an ordering of the variables.
  - Add them one at a time.
  - For each new variable X added, select the minimal subset of existing nodes as parents such that X is independent of all other nodes in the current network given its parents.
• Step 3: Populate the CPTs:
  - From examples, using density estimation.

Reconstructing a network
• Suppose we wanted to add a new variable to the network:
  R – Did the radio announce that there was an earthquake?
• How should we insert it? Since the radio announcement depends only on whether an earthquake occurred, R is added as a child of E.
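To make the factorization p(x_1, …, x_n) = ∏_i p(x_i | Pa(x_i)) and the parameter counts above concrete, here is a minimal Python sketch. It is not from the lecture: the dictionary-based network representation and the names joint and num_parameters are illustrative choices, but the numbers it reproduces come from the slides.

```python
# Minimal sketch (not from the lecture): a Bayesian network over binary
# variables as {node: (parents, cpt)}, where cpt maps each assignment of
# the parents to P(node = True | parents).
from itertools import product

def joint(network, assignment):
    """P(assignment) = product over i of P(x_i | Pa(x_i))."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def num_parameters(structure):
    """A binary node with k binary parents needs 2**k CPT entries."""
    return sum(2 ** len(parents) for parents in structure.values())

# The movie network from the slides: Lo -> Li, Lo -> S.
movie = {
    "Lo": ((), {(): 0.5}),                           # P(Lo) = 0.5
    "Li": (("Lo",), {(True,): 0.4, (False,): 0.7}),  # P(Li|Lo), P(Li|~Lo)
    "S":  (("Lo",), {(True,): 0.6, (False,): 0.2}),  # P(S|Lo),  P(S|~Lo)
}

# p(Lo, Li, S) = P(Lo) P(Li|Lo) P(S|Lo) = 0.5 * 0.4 * 0.6 = 0.12
print(joint(movie, {"Lo": True, "Li": True, "S": True}))

# Sanity check: the joint sums to 1 over all 2**3 assignments.
assert abs(sum(joint(movie, dict(zip(movie, vals)))
               for vals in product([True, False], repeat=3)) - 1) < 1e-9

# Parameter counts for the two alarm-network structures (parents only):
chain_rule = {"A": ("B", "E", "J", "M"), "B": ("E", "J", "M"),
              "E": ("J", "M"), "J": ("M",), "M": ()}
domain     = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
print(num_parameters(chain_rule), num_parameters(domain))  # 31 10
```

Running it prints 0.12 for p(Lo, Li, S) with all variables true, and parameter counts of 31 and 10 for the fully connected and the domain-knowledge structures, matching the slides.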
Example: Bayesian networks for cancer detection

Example: Gene expression network
(figure not preserved in this preview)

Conditional independence
• Two variables x and y are said to be conditionally independent given a third variable z if p(x, y | z) = p(x | z) p(y | z).
• In a Bayesian network, a variable is conditionally independent of all other variables given its Markov blanket.
• Markov blanket: all of a node's parents, children, and co-parents (the other parents of its children).

Markov blankets: Examples
• In the alarm network, the Markov blanket for B is {E, A}.
• The Markov blanket for A is {B, E, J, M}.

Bayesian networks: Inference
• Once the network is constructed, we can use algorithms for inferring the values of unobserved variables.
• For example, in our previous network the only observed variables are the phone calls and the radio announcement. However, what we are really interested in is whether there was a burglary or not.
• How can we determine that?

Inference
• Let's start with a simpler question:
  - How can we compute a joint distribution from the network?
  - For example, P(B, ¬E, A, J, ¬M)?
• Answer:
  - That's easy, let's use the network.

Computing: P(B, ¬E, A, J, ¬M)
• CPTs for the alarm network:
  P(B) = .05
  P(E) = .1
  P(A | B, E) = .95    P(A | B, ¬E) = .85
  P(A | ¬B, E) = .5    P(A | ¬B, ¬E) = .05
  P(J | A) = .7        P(J | ¬A) = .05
  P(M | A) = .8        P(M | ¬A) = .15
• P(B, ¬E, A, J, ¬M) = P(B) P(¬E) P(A | B, ¬E) P(J | A) P(¬M | A)
  = 0.05 × 0.9 × 0.85 × 0.7 × 0.2 = 0.005355
• We can easily compute a complete joint distribution. What about partial distributions? Conditional distributions?

Inference
• We are interested in queries of the form: P(B | J, ¬M).
• This can also be written in terms of joints:
  P(B | J, ¬M) = P(B, J, ¬M) / (P(B, J, ¬M) + P(¬B, J, ¬M))
• How do we compute the new joint?

Inference in Bayesian networks
• We will discuss three methods:
  1. Enumeration
  2. Variable elimination
  3. Stochastic inference

Computing partial joints
• P(B | J, ¬M) = P(B, J, ¬M) / (P(B, J, ¬M) + P(¬B, J, ¬M))
• Sum all instances with these settings (the sum is over the possible assignments to the other two variables, E and A).

Computing: P(B, J, ¬M)
• P(B, J, ¬M) = P(B, J, ¬M, A, E) + P(B, J, ¬M, ¬A, E) + P(B, J, ¬M, A, ¬E) + P(B, J, ¬M, ¬A, ¬E)
  ≈ 0.0007 + 0.00001 + 0.005 + 0.0003 = 0.00601
• This enumeration method can be improved by re-using calculations (similar to dynamic programming).
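The preview cuts off as enumeration is introduced, so here is a minimal sketch of the method as just described: my own illustrative code (the names prob_evidence and alarm are not from the lecture), reusing the joint() helper from the earlier sketch and the CPT values from the slides.

```python
# Minimal sketch (not from the lecture) of inference by enumeration
# on the alarm network.
from itertools import product

def joint(network, assignment):
    """P(assignment) = product over i of P(x_i | Pa(x_i))."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

# CPTs from the slides; each table maps a parent assignment to P(var = True).
alarm = {
    "B": ((), {(): 0.05}),
    "E": ((), {(): 0.10}),
    "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.85,
                       (False, True): 0.50, (False, False): 0.05}),
    "J": (("A",), {(True,): 0.70, (False,): 0.05}),
    "M": (("A",), {(True,): 0.80, (False,): 0.15}),
}

def prob_evidence(network, evidence):
    """Partial joint P(evidence): sum the full joint over all
    assignments to the hidden (unobserved) variables."""
    hidden = [v for v in network if v not in evidence]
    return sum(joint(network, {**evidence, **dict(zip(hidden, values))})
               for values in product([True, False], repeat=len(hidden)))

# Full joint entry from the slide:
# P(B, ~E, A, J, ~M) = .05 * .9 * .85 * .7 * .2 = 0.005355
print(joint(alarm, {"B": True, "E": False, "A": True,
                    "J": True, "M": False}))

# The query P(B | J, ~M) = P(B,J,~M) / (P(B,J,~M) + P(~B,J,~M)),
# summing over the hidden variables E and A.
num = prob_evidence(alarm, {"B": True, "J": True, "M": False})
den = num + prob_evidence(alarm, {"B": False, "J": True, "M": False})
print(num)        # ~0.00632 (the slide's 0.00601 sums already-rounded terms)
print(num / den)  # posterior probability of a burglary given the evidence
```

Variable elimination improves on this by caching repeated sub-sums, which is the dynamic-programming idea mentioned in the final bullet above.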

