School of Computer Science
Probabilistic Graphical Models (10-708)
Lecture 15, Nov 5, 2007
Eric Xing

Towards Complex Graphical Models and Approximate Inference

[Title figure: a signaling-pathway Bayes net over X1-X8 (Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H).]

Reading: posted papers

The need for complex dynamic models
- Complex dynamic systems exhibit:
  - non-linearity
  - non-Gaussianity
  - multi-modality
  - ...
- Limitation of the LDS: it defines only linearly evolving, unimodal, Gaussian belief states.
- A Kalman filter will predict the location of the bird using a single Gaussian centered on the obstacle. A more realistic model allows for the bird's evasive action, predicting that it will fly to one side or the other.
- Kalman filter measurement update:
    x_{t+1|t+1} = x_{t+1|t} + K_{t+1} (y_{t+1} - C x_{t+1|t})
    P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}

Representing complex dynamic processes
- The problem with HMMs:
  - Suppose we want to track the state (e.g., the position) of D objects in an image sequence.
  - Let each object be in K possible states.
  - Then X_t = (X_t(1), ..., X_t(D)) can take K^D possible values.
  ⇒ Inference takes time and space exponential in D.
  ⇒ P(X_t | X_{t-1}) needs O(K^{2D}) parameters to specify.

Dynamic Bayesian Networks
- A DBN represents the state of the world at time t using a set of random variables X_t(1), ..., X_t(D) (a factored / distributed representation).
- A DBN represents P(X_t | X_{t-1}) in a compact way, using a parameterized graph.
  ⇒ A DBN may have exponentially fewer parameters than its corresponding HMM.
  ⇒ Inference in a DBN may be exponentially faster than in the corresponding HMM.

DBNs are a kind of graphical model
- In a graphical model, nodes represent random variables, and (the lack of) arcs represents conditional independencies.
- DBNs are Bayes nets for dynamic processes.
- Informally, an arc from X_t^i to X_{t+1}^j means X^i "causes" X^j.
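The Kalman filter mentioned here propagates a single Gaussian belief state. As a minimal sketch (not the slides' code), one predict-plus-correct step in NumPy looks like this; the variable names and the test values below are illustrative:

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One predict + correct cycle of the Kalman filter.

    x, P : posterior mean and covariance at time t (x_{t|t}, P_{t|t})
    y    : observation at time t+1
    A, Q : state transition matrix and process-noise covariance
    C, R : observation matrix and observation-noise covariance
    Returns (x_{t+1|t+1}, P_{t+1|t+1}).
    """
    # Predict: x_{t+1|t} = A x_{t|t},  P_{t+1|t} = A P A^T + Q
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Kalman gain: K = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    # Correct: x_{t+1|t+1} = x_{t+1|t} + K (y_{t+1} - C x_{t+1|t})
    x_new = x_pred + K @ (y - C @ x_pred)
    # P_{t+1|t+1} = P_{t+1|t} - K C P_{t+1|t}
    P_new = P_pred - K @ C @ P_pred
    return x_new, P_new
```

Because the update is a fixed linear-Gaussian recursion, the posterior stays a single Gaussian, which is exactly the unimodality limitation the slide is pointing at.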
- Unrolling the model over time can "resolve" cycles that would appear in a "static" BN.

A road map to complex dynamic models
- Discrete hidden state, discrete observation: mixture model (e.g., a mixture of multinomials); for sequential data, the HMM (e.g., for text).
- Discrete hidden state, continuous observation: mixture model (e.g., a mixture of Gaussians); for sequential data, the HMM (e.g., for speech signals).
- Continuous hidden state, continuous observation: factor analysis; for sequential data, the state space model.
- Structured sequential models: the factorial HMM and the switching SSM.

HMM variants represented as DBNs
- The same code (standard forward-backward, Viterbi, and Baum-Welch) can do inference and learning in all of these models.

Factorial HMM
- The belief state at each time is X_t = {Q_t^(1), ..., Q_t^(k)}, and in the most general case it has a state space of size O(d^k) for k d-ary chains.
- The common observed child Y_t couples all the parents (explaining away).
- But the parameterization cost of the fHMM is O(k d^2) for the k chain-specific transition models p(Q_t^(i) | Q_{t-1}^(i)), rather than O(d^{2k}) for the joint transition p(X_t | X_{t-1}).

Factorial HMMs vs. HMMs
- Compare a factorial HMM with D chains, each with K values, to its equivalent flat HMM.
- Number of parameters needed to specify P(X_t | X_{t-1}):
  - HMM: O(K^{2D})
  - fHMM: O(D K^2)
- Computational complexity of exact inference:
  - HMM: O(T K^{2D})
  - fHMM: O(T D K^{D+1})

Triangulating the fHMM
- Is the obvious triangulation of the unrolled fHMM correct? A valid triangulation creates cliques of size k+1, and there are O(kT) of them, so the junction tree algorithm is not efficient for factorial HMMs. [Triangulation figures omitted.]
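The fHMM-vs-HMM parameter comparison is easy to check numerically. The sketch below counts transition parameters both ways, and also shows that for independent chains the implicitly represented joint transition is the Kronecker product of the per-chain matrices, which is generically dense (the 2-chain binary matrices are made-up illustrative numbers):

```python
import numpy as np

def hmm_transition_params(D, K):
    """Flat HMM over the joint state of D chains with K values each:
    the transition matrix is K^D x K^D, i.e., O(K^{2D}) parameters."""
    return (K ** D) ** 2

def fhmm_transition_params(D, K):
    """Factorial HMM: D chain-specific K x K transition matrices,
    i.e., O(D K^2) parameters."""
    return D * K * K

# The fHMM still represents the full joint transition implicitly: for
# independent chains it is the Kronecker product of the per-chain
# matrices, and every entry is a product of positive probabilities.
T1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
T2 = np.array([[0.7, 0.3],
               [0.4, 0.6]])
T_joint = np.kron(T1, T2)  # 4 x 4 joint transition, fully dense
```

For D = 10 binary chains the flat HMM needs about a million transition parameters while the fHMM needs forty, which is the exponential gap the slide is summarizing.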
Special case: switching HMM
- A switching HMM couples chains with different state spaces and different semantics (in the example figure: one chain per person, Person 1 ... Person k, plus "the clipper" and the scene).
- The exact calculation is intractable, and we must use approximate inference methods.

Hidden Markov decision trees (Jordan, Ghahramani, & Saul, 1997)
- A combination of decision trees with factorial HMMs.
- This gives a "command structure" to the factorial representation.
- Appropriate for multi-resolution time series.
- Again, the exact calculation is intractable and we must use approximate inference methods.

Recall state space models (SSMs)
- Also known as the linear dynamical system, dynamic linear model, Kalman filter model, etc.
- The Kalman filter can compute P(X_t | y_{1:t}) in O(min(M^3, D^2)) operations per time step.

Factored linear-Gaussian models produce sparse matrices
- There is a directed arc from X_{t-1}^i to X_t^j iff A(i,j) ≠ 0, and an undirected arc between X_t^i and X_t^j iff Σ^{-1}(i,j) ≠ 0.
- e.g., consider a 2-chain factorial SSM with
    P(X_t^i | X_{t-1}^i) = N(X_t^i; A_i X_{t-1}^i, Q_i)
  so that P(X_t^1, X_t^2 | X_{t-1}^1, X_{t-1}^2) factorizes and the stacked transition matrix A is block-diagonal (sparse).

Discrete-state models
- Factored discrete-state models do NOT produce sparse transition matrices.
- e.g., for a 2-chain factorial HMM,
    P(X_t^1, X_t^2 | X_{t-1}^1, X_{t-1}^2) = P(X_t^1 | X_{t-1}^1) P(X_t^2 | X_{t-1}^2),
  yet the resulting joint transition matrix is generically dense.

Problems with SSMs
- linearity
- Gaussianity
- uni-modality

Switching SSM
- Possible world: multiple motion states.
- Task: trajectory prediction.
- Model: a combination of HMM and SSM:
    P(X_t = x_t | X_{t-1} = x_{t-1}, S_t = i) = N(x_t; A_i x_{t-1}, Q_i)
    P(Y_t = y_t | X_t = x_t) = N(y_t; C x_t, R)
    P(S_t = j | S_{t-1} = i) = M(i, j)
- The belief state has O(k^t) Gaussian modes.

Data association (the correspondence problem)
- The optimal belief state has O(k^t) modes.
- It is common to use a nearest-neighbor approximation.
- For each time slice, we can enforce that at most one source causes each observation.
- The correspondence problem also arises in shape matching and stereo vision.

Kinds of inference for DBNs
[Figure omitted.]

Complexity of inference in DBNs
- Even with local connectivity, everything becomes correlated due to shared common influences in the past.
- E.g., the coupled HMM (cHMM): even though cHMMs are sparse, all nodes eventually become correlated, so the belief state P(X_t | y_{1:t}) has size O(2^N) for N binary chains.

Junction tree for coupled HMMs
- Cliques form a frontier that snakes from X_{t-1} to X_t.
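To make the O(2^N) belief-state point concrete, here is a minimal sketch of one exact forward-filtering step on the flattened joint state of a coupled HMM. The transition and observation numbers are made up for illustration; a real cHMM would assemble T_joint from its sparse per-chain conditionals, but the belief vector is still exponentially large:

```python
import numpy as np

def chmm_filter_step(belief, T_joint, obs_lik):
    """One exact forward-filtering step on the flattened joint state:
    predict with the joint transition, weight by the observation
    likelihood, and renormalize. For N binary chains the belief vector
    has length 2^N and T_joint is 2^N x 2^N, so each step costs
    O(2^{2N}) even when the underlying cHMM graph is sparse."""
    b = belief @ T_joint   # predict: sum over previous joint states
    b = b * obs_lik        # condition on the current evidence
    return b / b.sum()     # renormalize to a distribution

# Illustrative setup for N = 2 binary chains (4 joint states).
N = 2
rng = np.random.default_rng(0)
T_joint = rng.random((2 ** N, 2 ** N))
T_joint /= T_joint.sum(axis=1, keepdims=True)  # make rows stochastic
belief0 = np.full(2 ** N, 1.0 / 2 ** N)        # uniform prior
obs_lik = np.array([0.9, 0.1, 0.2, 0.8])       # p(y_t | joint state), made up
belief1 = chmm_filter_step(belief0, T_joint, obs_lik)
```

The point of the sketch is only the scaling: the belief state itself is exponential in the number of chains, which is why the lecture turns to approximate inference methods.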