Graphical Models
ML 701, Anna Goldenberg

Outline
- Dynamic Models
  - Gaussian Linear Models
  - Kalman Filter
  - DBNs
- Undirected Models
- Unification
- Summary

HMMs
q_t: hidden states; O_t: observations.
(Figure: chain q_0 -> q_1 -> ... -> q_T, with each q_t emitting O_t.)

P(Q, O) = p(q_0) \prod_{t=0}^{T-1} p(q_{t+1} | q_t) \prod_{t=0}^{T} p(O_t | q_t)

HMM in short
- It is a Bayes Net.
- It satisfies the Markov property (future states are independent of the past given the present).
- It has discrete states (and discrete time steps).

What about continuous HMMs?
Gaussian Linear State Space models.

Example of use
SLAM - Simultaneous Localization and Mapping.
Drawback: the belief state and the running time grow quadratically in the number of landmarks.
http://www.stanford.edu/~paskin/slam/

State Space Models
q_t: hidden states; O_t: observations.
- q_t is a real-valued K-dimensional hidden state variable.
- O_t is a D-dimensional real-valued observation vector.

q_t = f(q_{t-1}) + w_t, where f determines the mean of q_t given q_{t-1}, and w_t is a zero-mean random noise vector.
O_t = g(q_t) + v_t, similarly.

Gaussian Linear State Space Models
- O_t and q_t are Gaussian.
- f and g are linear and time-invariant:
  q_t = A q_{t-1} + w_t,  w_t ~ N(0, R)   (A: transition matrix)
  O_t = B q_t + v_t,      v_t ~ N(0, S)   (B: observation matrix)
  q_0 ~ N(0, \Sigma_0)
(Correction from the lecture: in an earlier version R and S were reversed.)

Inference
- Forward step (filtering): p(q_t | O_0, ..., O_t) — the Kalman filter.
- Backward step (smoothing): p(q_t | O_0, ..., O_T).

Kalman Filter (1960)
1. Time update: P(q_{t-1} | O_0, ..., O_{t-1}) -> P(q_t | O_0, ..., O_{t-1})
   E(q_{t|t-1}) = A E(q_{t-1|t-1})
   V(q_{t|t-1}) = A V(q_{t-1|t-1}) A^T + R
2. Measurement update: P(q_t | O_0, ..., O_{t-1}) -> P(q_t | O_0, ..., O_t)
   First form the joint P(q_t, O_t | O_0, ..., O_{t-1}), which is Gaussian with mean
   [ E(q_{t|t-1}) ; B E(q_{t|t-1}) ]
   and covariance
   [ V(q_{t|t-1})     V(q_{t|t-1}) B^T       ]
   [ B V(q_{t|t-1})   B V(q_{t|t-1}) B^T + S ]
   Then condition on the observed O_t. Writing that joint covariance in blocks \Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}:
   E(q_{t|t}) = E(q_{t|t-1}) + \Sigma_{12} \Sigma_{22}^{-1} (O_t - B E(q_{t|t-1}))
   V(q_{t|t}) = V(q_{t|t-1}) - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}
(Figure: two-slice networks over q_{t-1}, O_{t-1}, q_t, O_t illustrating the two updates.)

Example of use
Tracking results reported by Welch and Bishop, SIGGRAPH 2001.

Kalman Filter Usage
- Tracking motion: missiles, hand motion, lip motion from videos
- Signal processing
- Navigation
- Economics (for prediction)

Dynamic Bayes Nets
So far we have seen chain-structured models. But are there more appealing models?
(Figure from Koller and Friedman: time slices t = 0, 1, 2, each with nodes Weather, Velocity, Location, Failure, and Obs.)

- It's just a Bayes Net!
- Approach to the dynamics:
  1. Start with some prior for the initial state.
  2. Predict the next state using only the observations up to the previous time step.
  3. Incorporate the new observation and re-estimate the current state.

Most importantly: use the structure of the Bayes Net.
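The time and measurement updates above can be sketched in a few lines of linear algebra. This is a minimal illustration under the slide's notation (A, B, R, S), not code from the lecture; the function name `kalman_step` is an assumption.

```python
# Sketch of one Kalman filter step, following the slide's
# time-update / measurement-update equations. Assumes the model
# q_t = A q_{t-1} + w_t, w_t ~ N(0, R); O_t = B q_t + v_t, v_t ~ N(0, S).
import numpy as np

def kalman_step(mu, V, O_t, A, B, R, S):
    """One filtering step: p(q_{t-1}|O_{0:t-1}) -> p(q_t|O_{0:t})."""
    # Time update: push the previous posterior through the dynamics.
    mu_pred = A @ mu                # E(q_{t|t-1}) = A E(q_{t-1|t-1})
    V_pred = A @ V @ A.T + R       # V(q_{t|t-1}) = A V(q_{t-1|t-1}) A^T + R

    # Measurement update: condition the joint Gaussian of (q_t, O_t)
    # on the observed O_t using the block-covariance formulas.
    Sigma22 = B @ V_pred @ B.T + S          # B V_pred B^T + S
    Sigma12 = V_pred @ B.T                  # V_pred B^T
    K = Sigma12 @ np.linalg.inv(Sigma22)    # gain: Sigma_12 Sigma_22^{-1}
    mu_new = mu_pred + K @ (O_t - B @ mu_pred)
    V_new = V_pred - K @ Sigma12.T          # V_pred - Sigma_12 Sigma_22^{-1} Sigma_21
    return mu_new, V_new
```

For instance, in one dimension with A = B = 1, R = 0, S = 1, a N(0, 1) prior, and an observation O_t = 2 (toy numbers chosen here for illustration), the posterior mean moves halfway toward the observation: mean 1.0, variance 0.5.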
Use the independencies!

Other graphical models
But first... any questions so far?

Are all GMs directed?
No — there are Undirected Graphical Models.
(Figure: undirected graph over nodes A, B, C, D, E.)

Undirected models
p(X) = (1/Z) \prod_C \psi(X_C)
where \psi(X_C) is a non-negative potential function. What are the C?

Cliques
A clique C is a subset C \subseteq V such that for all i, j \in C, (i, j) \in E.
C is maximal if it is not contained in any other clique.

Exercise (for the graph over A, B, C, D, E):
i) Is B a clique?
ii) Is BC a maximal clique?
iii) Is ABCD a clique?
iv) Is ABC a maximal clique?
v) Is BCDE a clique?

Decomposition
Note, to resolve the confusion: the most common machine learning convention is the decomposition over maximal cliques, e.g.
p(A, B, C, D, E) = (1/Z) \psi(A, B, C) \psi(B, D) \psi(C, E) \psi(D, E)

Independence
Rule: V_1 is independent of V_2 given a cutset S; S is called the Markov Blanket (MB).
E.g. MB(B) = {A, C, D}, i.e. the set of B's neighbors.

Are undirected models useful?
Yes!
- Used a lot in physics (Ising model, Boltzmann machine)
- In vision (every pixel is a node)
- Bioinformatics

Why not more popular? The ZZZZZZ — it's the partition function Z in
p(X) = (1/Z) \prod_C \psi(X_C)

What's Z and ways to fight it
Z = \sum_{\forall x} \prod_C \psi(X_C)
Approximations:
- Sampling (MCMC sampling is common)
- Pseudo-likelihood
- Mean-field approximation

Chain Graphs
- A generalization of MRFs and Bayes Nets.
- Structured as blocks: undirected edges within a block, directed edges between blocks.
- Quite intractable and not very popular; used in biomedical engineering (text).
(Figure: hierarchy diagram — chain graphs subsume both undirected and directed models — with small example graphs over A, B, C and over A, B, C, D, asking which family each belongs to.)

Summary
- Graphical models is a huge, evolving field.
- There are many other variations that haven't been discussed.
- Used extensively in a variety of domains.
- Tractability issues remain.
- More work to be done!
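The clique decomposition and the partition function Z can be made concrete with a brute-force sketch for a five-node example with the maximal-clique factors \psi(A,B,C), \psi(B,D), \psi(C,E), \psi(D,E). The `agree` potential tables below are made-up assumptions; the point is that even computing Z requires summing over every joint configuration (2^5 here), which is exactly why Z is the bottleneck.

```python
# Toy sketch of p(X) = (1/Z) prod_C psi(X_C) over binary variables,
# with illustrative potentials that reward agreement within a clique.
import itertools

def agree(*vals):
    # Made-up potential: weight 2 if all arguments agree, else 1.
    return 2.0 if len(set(vals)) == 1 else 1.0

def unnormalized(a, b, c, d, e):
    # Product of potentials over the maximal cliques from the slide.
    return agree(a, b, c) * agree(b, d) * agree(c, e) * agree(d, e)

# Partition function: brute-force sum over all 2^5 joint configurations.
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=5))

def p(a, b, c, d, e):
    return unnormalized(a, b, c, d, e) / Z

# The normalized probabilities sum to 1 by construction.
total = sum(p(*x) for x in itertools.product([0, 1], repeat=5))
```

With 5 binary nodes the sum over Z has 32 terms; with n nodes it has 2^n, which is what the approximation methods above (MCMC, pseudo-likelihood, mean-field) are designed to avoid.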