Readings: K&F: 18.1, 18.2, 18.3, 18.4

Dynamic Bayesian Networks (Beyond 10708)
Graphical Models - 10708
Carlos Guestrin, Carnegie Mellon University, December 1st, 2006

Dynamic Bayesian network (DBN)
- An HMM is defined by:
  - Transition model P(X^(t+1) | X^(t))
  - Observation model P(O^(t) | X^(t))
  - Starting state distribution P(X^(0))
- DBN: use a Bayes net to represent each of these compactly
- The starting state distribution P(X^(0)) is a BN
- (Silly) example: performance-in-grad-school DBN
  - Variables: Happiness, Productivity, Hirability, Fame
  - Observations: Paper, Schmooze

Unrolled DBN
- Start with P(X^(0))
- For each time step, add variables as defined by the 2-TBN

Sparse DBN and fast inference?
- Does a sparse DBN imply fast inference? No!
- [Figure: sparse DBN over variables A-F at times t, t+1, t+2, t+3]

Even after one time step!
- What happens when we marginalize out time t?
- [Figure: after marginalizing out time t, the time t+1 variables A-F become fully connected]

Sparse DBN and fast inference, 2
- Almost! A structured representation of the belief often yields a good approximation
- [Figure: almost-sparse DBN over variables A-F at times t, t+1, t+2, t+3]

BK algorithm for approximate DBN inference [Boyen, Koller '98]
- Assumed density filtering:
  - Choose a factored representation P-hat for the belief state
  - Every time step, when the belief is not representable with P-hat, project it into the representation

A simple example of BK: fully factorized distribution
- Assumed density: fully factorized
- True belief P(X^(t+1)) vs. assumed density P-hat(X^(t+1))

Computing the fully factorized distribution at time t+1
- Assumed density: fully factorized
- Compute each marginal P-hat(X_i^(t+1)) separately

General case for BK: junction tree represents distribution
- Assumed density: a junction tree (not necessarily fully factorized)

Computing the factored belief state in the next time step
- Introduce the observations in the current time step
- Use the junction tree to calibrate the time t beliefs
- Compute the time t+1 belief, project
into the approximate belief state:
  - marginalize into the desired factors
  - this corresponds to a KL projection
- Equivalent to computing the marginals over the factors directly:
  - for each factor in the time t+1 belief, use variable elimination

Error accumulation
- Each time step, the projection introduces error
- Will the error add up, causing unbounded approximation error as t -> infinity?

Contraction in Markov process
- [Figure: the Markov transition contracts the distance between distributions]

BK theorem: error does not grow unboundedly
- Theorem: if the Markov chain contracts at a rate of gamma (usually very small), and the assumed density projection at each time step has error bounded by epsilon (usually large), then the expected error at every iteration is bounded by epsilon/gamma

Example: BAT network [Forbes et al.]

BK results [Boyen, Koller '98]

Thin junction tree filters [Paskin '03]
- BK assumes fixed approximation clusters
- TJTF adapts the clusters over time, attempting to minimize the projection error

Hybrid DBN (many continuous and discrete variables)
- In a DBN with a large number of discrete and continuous variables, the number of mixture-of-Gaussians components blows up in one time step
- Need many smart tricks... e.g., see Lerner's thesis
- Reverse Water Gas Shift system (RWGS) [Lerner et al. '02]

DBN summary
- DBNs: factored representation of HMMs / Kalman filters
  - the sparse representation does not lead to efficient inference
- Assumed density filtering
  - BK: a factored belief state representation is the assumed density
  - contraction guarantees that the error does not blow up (but it could still be large)
- Thin junction tree filter adapts the assumed density over time
- Extensions for hybrid DBNs

This semester...
- Bayesian networks, Markov networks, factor graphs, decomposable models, junction trees, parameter learning, structure learning, semantics, exact inference, variable elimination, context-specific independence, approximate inference, sampling, importance sampling, MCMC, Gibbs, variational inference, loopy belief propagation, generalized belief propagation, Kikuchi, Bayesian learning, missing data, EM, Chow-Liu, IPF, GIS, Gaussian and hybrid models, discrete and continuous variables, temporal and template models,
Kalman filters, linearization, switching Kalman filters, assumed density filtering, DBNs, BK, causality, ...
- Just the beginning!

Quick overview of some hot topics
- Conditional random fields
- Maximum margin Markov networks
- Relational probabilistic models (e.g., the parameter-sharing model that you learned for a recommender system in HW1)
- Hierarchical Bayesian models (e.g., Khalid's presentation on Dirichlet processes)
- Influence diagrams

Generative vs. discriminative models - intuition
- Want to learn h: X -> Y (X - features, Y - set of variables)
- Generative classifier (e.g., naive Bayes, Markov networks):
  - Assume some functional form for P(X|Y), P(Y)
  - Estimate the parameters of P(X|Y), P(Y) directly from training data
  - Use Bayes rule to calculate P(Y | X = x)
  - This is a generative model:
    - indirect computation of P(Y|X), through Bayes rule
    - but it can generate a sample of the data: P(X) = sum_y P(y) P(X|y)
- Discriminative classifiers (e.g., logistic regression, conditional random fields):
  - Assume some functional form for P(Y|X)
  - Estimate the parameters of P(Y|X) directly from training data
  - This is the discriminative model:
    - directly learns P(Y|X); can have lower sample complexity
    - but it cannot generate a sample of the data, because P(X) is not available

Conditional random fields [Lafferty et al. '01]
- Define a Markov network using a log-linear model for P(Y|X), with features, e.g., for a pairwise CRF, over pairs (y_i, y_j) and the input x
- Learning: maximize the conditional log-likelihood
  - a sum of log-likelihoods you know and love
  - learning algorithm based on gradient descent, very similar to learning MNs

Max (conditional) likelihood
- Data: D = {(x_1, t(x_1)), ..., (x_m, t(x_m))}
- Estimation, then classification: argmax_y w^T f(x, y)
- Don't need to learn the entire distribution!

OCR example
- We want: argmax_word w^T f(word) = "brace"
- Equivalently: w^T f("brace") > w^T f("aaaaa"), w^T f("brace") > w^T f("aaaab"), ..., w^T f("brace") > w^T f("zzzzz") - a lot of constraints!

Max margin estimation
- Goal: find w such that, for all x in D and all y != t(x): w^T f(x, t(x)) > w^T f(x, y), i.e., w^T [f(x, t(x)) - f(x, y)] > 0
- Writing Delta f_x(y) = f(x, t(x)) - f(x, y), we want w^T Delta f_x(y) > 0 for all y != t(x)
- Maximize the margin; the required gain over y grows with the number of mistakes in y, Delta t_x(y):
  - Delta t_x("craze") = 2, so require w^T Delta f_x("craze") >= 2 * gamma
  - Delta t_x("zzzzz") = 5, so require w^T Delta f_x("zzzzz") >= 5 * gamma

M3Ns: maximum margin Markov networks, Taskar et
al. '03
- Data: D = {(x_1, t(x_1)), ..., (x_m, t(x_m))}
- Estimation, then classification: argmax_y w^T f(x, y)

Propositional models and generalization
- Suppose you learn a model of social networks for CMU from Facebook data, to predict movie preferences
- How would you apply it when new people ...
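The BK assumed-density-filtering loop from the slides above can be sketched on a toy model. This is a minimal illustration under invented assumptions, not code from the lecture: the two-variable binary 2-TBN, its transition probabilities, and the function names are all hypothetical. With a fully factorized assumed density, the KL projection at each step reduces to computing the single-variable marginals, which is exactly what `bk_step` does; the projection error is the discarded correlation between the time t+1 variables.

```python
import itertools

# Hypothetical 2-TBN with two binary state variables X1, X2.
# Transition factors P(X1'=1 | X1, X2) and P(X2'=1 | X1, X2),
# stored as dicts mapping parent assignment (x1, x2) -> probability.
p_x1_next = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}
p_x2_next = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.7, (1, 1): 0.8}

def bk_step(b1, b2):
    """One BK / assumed-density-filtering step with a fully factorized
    assumed density: propagate the factored belief (b1, b2 are the
    marginals P(X1=1), P(X2=1)) through the 2-TBN, then project back
    onto independent marginals. The KL projection onto a product
    distribution is just the set of marginals, so the per-variable
    marginals are exact; what is lost is the X1'-X2' correlation."""
    m1 = 0.0  # accumulates P(X1' = 1)
    m2 = 0.0  # accumulates P(X2' = 1)
    for x1, x2 in itertools.product([0, 1], repeat=2):
        # Weight of this parent assignment under the factored belief.
        w = (b1 if x1 else 1 - b1) * (b2 if x2 else 1 - b2)
        m1 += w * p_x1_next[(x1, x2)]
        m2 += w * p_x2_next[(x1, x2)]
    return m1, m2

b1, b2 = 0.5, 0.5           # uniform prior marginals at time 0
for _ in range(3):          # filter forward three time steps
    b1, b2 = bk_step(b1, b2)
print(round(b1, 4), round(b2, 4))
```

Conditioning on observations would add a reweighting step before the projection; it is omitted here to keep the projection idea isolated.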
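The margin-scaling idea from the max margin estimation slides (require w^T Delta f_x(y) >= gamma * Delta t_x(y)) can also be made concrete. This is a hypothetical sketch: `num_mistakes` plays the role of Delta t_x(y), and the toy `score` function merely stands in for w^T f(x, y); none of the names or numbers come from the lecture.

```python
# Margin scaled by the number of mistakes, as in max-margin estimation:
# the required margin for a wrong labeling y grows with Delta_t(y).
def num_mistakes(y, t):
    """Delta_t(y): Hamming distance between labeling y and truth t."""
    return sum(a != b for a, b in zip(y, t))

def margin_violations(score, truth, candidates, gamma=1.0):
    """Return the candidates y that violate the scaled-margin constraint
    score(truth) >= score(y) + gamma * Delta_t(y).
    The truth itself always satisfies it (Delta_t(truth) = 0)."""
    s_true = score(truth)
    return [y for y in candidates
            if s_true < score(y) + gamma * num_mistakes(y, truth)]

# Toy scoring function standing in for w^T f(x, y): 2 points per
# character that matches the true labeling "brace".
truth = "brace"
score = lambda y: 2.0 * sum(a == b for a, b in zip(y, truth))

print(num_mistakes("craze", truth))  # "craze" differs from "brace" in 2 positions
print(margin_violations(score, truth, ["craze", "zzzzz", "brace"]))
```

With this toy score, every candidate clears its scaled margin, so no violations are reported; shrinking the per-character reward or raising gamma would start producing violations, which is what the M3N objective penalizes.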