CS 559: Machine Learning Fundamentals and Applications
11th Set of Notes
Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Overview
• Graphical Models
  – Tutorial by Christopher Bishop: http://research.microsoft.com/en-us/um/people/cmbishop/PRML/
  – Same tutorial modified by Marshall Tappen: http://www.cs.ucf.edu/~mtappen/cap6412/lecs/graphical_models.pdf
  – Also see Mark Paskin's tutorial: http://ai.stanford.edu/~paskin/gm-short-course/index.html
• Not in DHS

Graphical Models
• Consider the distribution p(a,b,c)
• It can be rewritten as p(a,b,c) = p(c|a,b) p(a,b)
• Which can be rewritten as p(a,b,c) = p(c|a,b) p(b|a) p(a)

Making a Graphical Model
• Introduce one node per random variable
• Use the distribution p(c|a,b) p(a,b)
• Add one edge per conditional distribution

Bayesian Networks
• Directed Acyclic Graph (DAG)
• General factorization: p(x_1, ..., x_K) = Π_k p(x_k | pa_k), where pa_k denotes the parents of x_k in the graph

Ancestral Sampling
• Start at the low-numbered nodes and draw samples
• Work your way through the graph, sampling from the conditional distributions
• (A code sketch of this procedure appears at the end of these notes)

Joint Probability Distribution
• The joint distribution of the variables in a DAG factorizes into one conditional per node, each conditioned on that node's parents

Generative Models
• The graph, along with the ancestral sampling method, is a model of how the data are generated
• The model expresses a causal process for generating the data
• These models are often called generative models
  – They show how to generate the data
• Example: a causal process for generating images

Discrete Variables (1)
• General joint distribution of two K-state variables: K^2 - 1 parameters
• Independent joint distribution: 2(K - 1) parameters

Discrete Variables (2)
• General joint distribution over M variables: K^M - 1 parameters
• M-node Markov chain: K - 1 + (M - 1) K (K - 1) parameters

Discrete Variables: Bayesian Parameters (1)

Conditional Independence
• a is independent of b given c: p(a|b,c) = p(a|c)
• Equivalently: p(a,b|c) = p(a|b,c) p(b|c) = p(a|c) p(b|c)
• Notation: a ⊥ b | c

Conditional Independence: Example 1

Conditional Independence: Example 2

Conditional Independence: Example 3
• Note: this is the opposite of Example 1, with c unobserved
• Note: this is the opposite of Example 1, with c observed

"Am I out of fuel?"
• B = Battery (0 = flat, 1 = fully charged)
• F = Fuel Tank (0 = empty, 1 = full)
• G = Fuel Gauge Reading (0 = empty, 1 = full)
• From p(B), p(F), and p(G|B,F) we can compute the posterior probability that the tank is empty
• The probability of an empty tank is increased by observing G = 0
• The probability of an empty tank is reduced by also observing B = 0; this is referred to as "explaining away" (see the numerical sketch after the d-separation slide below)

D-separation
• A, B, and C are non-intersecting subsets of nodes in a directed graph
• A path from A to B is blocked if it contains a node such that either
  a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
  b) the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C
• If all paths from A to B are blocked, A is said to be d-separated from B by C
• If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥ B | C
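The head-to-head rule in case (b) above, and the "explaining away" behavior in the fuel example, can be checked numerically by brute force. The sketch below, with made-up probability tables that are not from the slides, builds the joint p(a,b,c) = p(a) p(b) p(c|a,b) for the head-to-head graph a → c ← b and verifies that a and b are marginally independent but become dependent once the collider c is observed.

```python
# Brute-force check of the head-to-head (collider) case a -> c <- b.
# The probability tables below are invented for illustration only.
import itertools
import numpy as np

p_a = np.array([0.7, 0.3])                  # p(a)
p_b = np.array([0.9, 0.1])                  # p(b)
p_c_ab = np.array([[[0.8, 0.2],             # p(c | a=0, b=0)
                    [0.3, 0.7]],            # p(c | a=0, b=1)
                   [[0.4, 0.6],             # p(c | a=1, b=0)
                    [0.1, 0.9]]])           # p(c | a=1, b=1)

# Joint distribution p(a, b, c) = p(a) p(b) p(c | a, b)
joint = np.zeros((2, 2, 2))
for a, b, c in itertools.product(range(2), repeat=3):
    joint[a, b, c] = p_a[a] * p_b[b] * p_c_ab[a, b, c]

# With c unobserved, a and b are independent: p(a, b) = p(a) p(b)
p_ab = joint.sum(axis=2)
print(np.allclose(p_ab, np.outer(p_a, p_b)))            # True

# With c observed, they become dependent: p(a, b | c) != p(a | c) p(b | c)
c_obs = 0
p_ab_given_c = joint[:, :, c_obs] / joint[:, :, c_obs].sum()
p_a_given_c = p_ab_given_c.sum(axis=1)
p_b_given_c = p_ab_given_c.sum(axis=0)
print(np.allclose(p_ab_given_c, np.outer(p_a_given_c, p_b_given_c)))  # False
```

The same kind of enumeration, given tables for p(B), p(F) and p(G|B,F) such as those in Bishop's example, reproduces the fuel-gauge calculation from the slides above.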
D-separation: Example

The Markov Blanket
• p(x_i | x_{j≠i}) is the joint distribution divided by its sum over x_i; factors independent of x_i cancel between numerator and denominator
• The factors that remain involve only the parents, children, and co-parents of x_i: its Markov blanket

Markov Random Fields
• In an undirected graph, the Markov blanket of a node is simply its set of neighbors

Cliques and Maximal Cliques
• To understand undirected models, we need to introduce the notion of a clique
  – A subset of nodes
  – With links between all nodes in the subset
• And the maximal clique
  – If you add any node to a maximal clique, it is no longer a clique

Joint Distribution
• p(x) = (1/Z) Π_C ψ_C(x_C), where ψ_C(x_C) is the potential over clique C and Z = Σ_x Π_C ψ_C(x_C) is the normalization coefficient
• Note: with M K-state variables there are K^M terms in Z

Converting Directed to Undirected Graphs (1)

Converting Directed to Undirected Graphs (2)
• Additional links

Directed vs. Undirected Graphs (2)

Comparing Directed and Undirected Graphical Models (M. Paskin)
• Specifying an undirected graphical model is easy (a normalized product of potentials), but the factors don't have probabilistic interpretations
• Specifying a directed graphical model is harder (we must choose an ordering of the variables), but the factors are marginal and conditional densities
• Determining independence in undirected models is easy (graph separation); in directed models it is hard (d-separation)

Moralization (M. Paskin)
• We can transform a Bayesian network into a Markov random field by placing a clique over each family of the Bayesian network and dropping the directed edges
• This process is called moralization because we marry (connect) each variable's parents and then drop the edge directions

Inference

Inference in Graphical Models
• Let's say that I have analyzed the problem
• And have designed a graphical model that relates the observations to the quantities that I would like to estimate
• My observations are available
• How do I estimate the hidden, or latent, quantities?

Inference and Conditional Independence
• Conditional independence is important for MRF models because it makes inference much easier
• Consider a chain of nodes x_1, ..., x_N
• Task: find the marginal distribution of some variable x_n

Naïve, Slow Way
• Use the sum rule and compute the sums over all the other variables
• Implication: if you have K states per node and N nodes, this will take on the order of K^N operations

Taking Advantage of the Structure
• The distribution of this chain is p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) ··· ψ_{N-1,N}(x_{N-1}, x_N)

Four-Node Example
• Start re-arranging the sums, pushing each summation as far inward as possible

Re-arranging Sums
• Make a substitution for the innermost sum, which depends only on its neighboring variable
• Do it again for the next sum
• Obtain the marginal with far fewer operations

Inference on a Chain
• Pushing the sums inward defines a forward message μ_α(x_n) = Σ_{x_{n-1}} ψ_{n-1,n}(x_{n-1}, x_n) μ_α(x_{n-1}) and a backward message μ_β(x_n) = Σ_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) μ_β(x_{n+1})
• To compute local marginals:
  – Compute and store all forward messages μ_α(x_n)
  – Compute and store all backward messages μ_β(x_n)
  – Compute Z at any node x_m
  – Compute p(x_n) = (1/Z) μ_α(x_n) μ_β(x_n) for all variables required
• (A code sketch of this procedure appears at the end of these notes)

Trees
• Undirected tree, directed tree, polytree

Factor Graphs
• One more type of graphical model
• We can write the distribution as a product of factors: p(x) = Π_s f_s(x_s), where each factor f_s depends on a subset of the variables x_s

Factor Graphs from Directed Graphs

Factor Graphs from Undirected Graphs

Markov Random Fields
• Resources:
  – Y. Boykov, P. Torr and R. Zabih, tutorial at ECCV 2004
  – Numerous others exist, but require a lot of background knowledge
• Undirected graphs that encode point-wise and neighborhood relationships
• Inference is NP-hard due to loops
  – If only two labels are possible, the exact solution is equivalent to max flow or min cut
• Have been used in a wide range of problems: segmentation, stereo matching, optical flow
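The forward/backward procedure summarized under "Inference on a Chain" above takes only a few lines of code. The sketch below is a minimal illustration, not code from the course: the pairwise potentials psi are random, mu_alpha and mu_beta mirror the μ_α and μ_β messages, and the resulting marginals are checked against brute-force summation over all K^N configurations.

```python
# Sum-product message passing on a chain MRF with pairwise potentials.
# Potentials are random here, purely for illustration.
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 3                                        # N nodes with K states each
psi = [rng.random((K, K)) for _ in range(N - 1)]   # psi[n][x_n, x_{n+1}]

# Forward messages mu_alpha[n] and backward messages mu_beta[n]
mu_alpha = [np.ones(K) for _ in range(N)]
mu_beta = [np.ones(K) for _ in range(N)]
for n in range(1, N):                              # left-to-right pass
    mu_alpha[n] = psi[n - 1].T @ mu_alpha[n - 1]
for n in range(N - 2, -1, -1):                     # right-to-left pass
    mu_beta[n] = psi[n] @ mu_beta[n + 1]

# Z can be computed at any node; p(x_n) = mu_alpha(x_n) mu_beta(x_n) / Z
Z = float(np.sum(mu_alpha[0] * mu_beta[0]))
marginals = [mu_alpha[n] * mu_beta[n] / Z for n in range(N)]

# Naive check: sum the product of potentials over all K**N configurations
joint = np.zeros((K,) * N)
for x in itertools.product(range(K), repeat=N):
    joint[x] = np.prod([psi[n][x[n], x[n + 1]] for n in range(N - 1)])
joint /= joint.sum()
for n in range(N):
    other_axes = tuple(i for i in range(N) if i != n)
    assert np.allclose(marginals[n], joint.sum(axis=other_axes))
print("message-passing marginals match the brute-force marginals")
```

Each message pass costs on the order of N K^2 operations, compared with the K^N terms of the naive summation.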

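Finally, a small sketch of the ancestral sampling procedure described near the beginning of these notes, for the three-variable factorization p(a,b,c) = p(c|a,b) p(b|a) p(a). The conditional probability tables are invented for illustration; each node is sampled after its parents, and the empirical frequency of a configuration approaches the product of the corresponding table entries.

```python
# Ancestral sampling for the network p(a, b, c) = p(a) p(b|a) p(c|a, b).
# The probability tables are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
p_a = np.array([0.6, 0.4])                     # p(a)
p_b_a = np.array([[0.7, 0.3],                  # p(b | a=0)
                  [0.2, 0.8]])                 # p(b | a=1)
p_c_ab = np.array([[[0.9, 0.1], [0.5, 0.5]],   # p(c | a=0, b=0) and (a=0, b=1)
                   [[0.4, 0.6], [0.1, 0.9]]])  # p(c | a=1, b=0) and (a=1, b=1)

def sample_once():
    """Sample each node given its parents, in the topological order a, b, c."""
    a = rng.choice(2, p=p_a)
    b = rng.choice(2, p=p_b_a[a])
    c = rng.choice(2, p=p_c_ab[a, b])
    return int(a), int(b), int(c)

samples = [sample_once() for _ in range(50_000)]
# The empirical frequency of (a=1, b=1, c=1) should approach
# p(a=1) p(b=1|a=1) p(c=1|a=1, b=1) = 0.4 * 0.8 * 0.9 = 0.288
freq = sum(s == (1, 1, 1) for s in samples) / len(samples)
print(round(freq, 3))
```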
