New version page

# UD CIS 841 - Markov Chains and Mixture Models

Unformatted text preview:

Also appears in the Online Symposium for Electronics Engineer 2000 http://www.techonline.com/osee/ Hidden Markov Models: Fundamentals and Applications Part 1: Markov Chains and Mixture Models Valery A. Petrushin [email protected] Center for Strategic Technology Research Accenture 3773 Willow Rd. Northbrook, Illinois 60062, USA. Abstract The objective of this tutorial is to introduce basic concepts of a Hidden Markov Model (HMM) as a fusion of more simple models such as a Markov chain and a Gaussian mixture model. The tutorial is intended for the practicing engineer, biologist, linguist or programmer who would like to learn more about the above mentioned fascinating mathematical models and include them into one’s repertoire. This lecture presents Markov Chains and Gaussian mixture models, which constitute the preliminary knowledge for understanding Hidden Markov Models. Introduction Why it is so important to learn about these models? First, the models have proved to be indispensable for a wide range of applications in such areas as signal processing, bioinformatics, image processing, linguistics, and others, which deal with sequences or mixtures of components. Second, the key algorithm used for estimating the models – the so-called Expectation Maximization Algorithm -- has much broader application potential and deserves to be known to every practicing engineer or scientist. And last, but not least, the beauty of the models and algorithms makes it worthwhile to devote some time and efforts to learning and enjoying them. 1 Introduction into Markov chains 1.1 Paving the path to the shrine problem Welcome to the planet Median! Median is the capital planet of the Stochastic Empire, which conquered all worlds of the observable part of the Universe and spread its influence to the hidden parts. Ruled by the Glorious and Fair ruler, Emperor Probabil the Great, the Empire reached the most flourishing state of commerce, science and industry ever. Welcome to Bayes-City, the capital of the Empire! Today, the Emperor’s heralds had the honor of announcing a new giant project. The four major Provinces of the Empire have to pave the paths from their residence cities on Median to the St. Likely Hood Shrine, located on the Mean Square in Bayes-City. Each path is 1,000 miles long and has to be paved using large tiles of the following precious stones: ruby (red), emerald (green) and topaz (blue) in accordance to its Figure 1.1. Paving problemProvince’s random process which states that the color of next tile depends only on the color of the current tile. The square around the St. Likely Hood Shrine will be secretly divided into sectors and paved by the teams from all four Provinces over one night (Figure 1.1). A pilgrim will be allowed to enter the Shrine to enjoy the view of the Bayesian Belief Network, the log that served as a chair to Likely Hood, and the other relics, and also to talk to the high priests of science during the high confidence time intervals if and only if he or she can determine for three sectors of pavement around the Shrine which sector has been paved by the team of which Province. Imagine you are a pilgrim who journeyed more than 1,000 miles on foot to see the Stochastic Empire’s relics. How can you solve the problem? 1.2 Markov chains We deal with a stochastic or random process which is characterized by the rule that only the current state of the process can influence the choice of the next state. It means the process has no memory of its previous states. Such a process is called a Markov process after the name of a prominent Russian mathematician Andrey Markov (1856-1922). If we assume that the process has only a finite or countable set of states, then it is called a Markov chain. Markov chains can be considered both in discrete and continuous time, but we shall limit our tutorial to the discrete time finite Markov chains. Such chains can be described by diagrams (Figure 1.2). The nodes of the diagram represent the states (in our case, a state corresponds to a choice of a tile of a particular color) and the edges represent transitions between the states. A transition probability is assigned to each edge. The probabilities of all edges outgoing from a node must sum to one. Beside that, there is an initial state probability distribution to define the first state of the chain. Thus, a Markov chain is uniquely defined by a pair (1.1), where π is the vector of initial probability distribution and A is a stochastic transition matrix. The Markov process characteristic property is represented by (1.2). Figure 1.2 presents Markov chain models for a biased coin and tile generation. 1.3 Training algorithm We assume that each Province uses its own Markov chain to generate sequences of tiles. We also assume that all tiles have their ordinal numbers carved on them and we have no problem determining the order of tiles in the sequence (in spite of the fact that the tiles cover the two-dimensional surface of the path). The resulting paths could be visually different for pavements of different Provinces. For example, North Province, which is famous for its deposits of emerald, might have larger probabilities for transitions to emerald (green) state and generate greenish paths. But the differences among models could be rather small and the paths produced by the models could look very similar, especially for the paths of short length. That is why we have to find a formal approach for estimating the model's parameters based on =7.03.05.05.0A()5.05.0=π=2.05.03.03.04.03.025.05.025.0A()2.06.02.0=π(a) (b) Figure 1.2. Markov chain models for a biased coin (a), and the paving model MN AM ,π=()NjiijNaA,1,21}{,...,,===ππππ()()kkkkqqPqqqP |,...,|111 ++=(1.2) (1.1)data. Thanks to the Emperor, we have enough data to train our models -- 1,000 miles of data per model! If a tile has the diameter of one yard and the tiles are lined up in rows of 20 tiles per row then we have 35,200 tiles per mile. To estimate parameters of a model with 3 states, we need to calculate 12 values: a 3-element vector of initial state probabilities and a 3 by 3 state transition probability matrix. Let LqqqQ ...21=be a training sequence of states. As estimations for the initial state probabilities, we use frequencies of tiles ci in the training sequence (1.3). To estimate transition probabilities, we have to count the number of pairs of tiles for each combination

View Full Document Unlocking...