Berkeley STATISTICS 246 - HMM in crosses and small pedigrees - D1822673

Home> Schools> University of California, Berkeley> (STATISTICS) > STATISTICS 246> HMM in crosses and small pedigrees

DOC PREVIEW

Berkeley STATISTICS 246 - HMM in crosses and small pedigrees

School name University of California, Berkeley

Course Statistics 246- Statistical Genetics

Pages 12

This preview shows page 1-2-3-4 out of 12 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

HMM in crosses and small pedigrees Lecture 8 Statistics 246 February 17 2004 1 Discrete time Markov chains Consider a sequence of random variables X1 X2 X3 with common finite state space S This sequence forms a Markov chain if for all t X1 Xt 1 and Xt 1 Xt 2 are conditionally independent given Xt equivalently pr Xt Xt 1 Xt 2 pr Xt Xt 1 The matrix p i j t pr Xt j Xt 1 i is the transition matrix at step t When p i j t p i j for all i and j independent of t we say the Markov chain is time homogeneous or has stationary transition probabilities Many of the chains we ll be meeting will be inhomogeneous and t will be in space not time There are plenty of good books on elementary Markov chain theory Feller vol 1 being my favourite but they mostly concentrate on asymptotic behaviour in the homogeneous case For the time being we don t need this or much else from the general theory apart from the fact that multi step transition matrices are products of 1 step transition matrices Exercise 2 Hidden Markov Models HMM If Xt is a Markov chain and f is an arbitrary function on the state space then f Xt will not in general be a Markov chain Exercise Construct an example to demonstrate the last assertion It is sometimes the case that associated with a Markov chain Xt is another process Yt say whose terms are conditionally independent given the chain Xt This happens with so called semi Markov chains Both functions of Markov chains and this last situation are covered by the following useful definition based on the work of L E Baum and colleagues around 1970 A bivariate Markov chain Xt Yt is called a Hidden Markov Model if a Xt is a Markov chain and b the distribution of Yt given Xt Xt 1 Xt 2 depends only on Xt and Xt 1 In many examples this dependence is only on Xt but in some it can extend beyond Xt 1 and or includeYt Once you see how the defining property is used in the calculations you will get an idea of the possible extensions Exercise Explain how functions of Markov chains are always HMM 3 HMM cont There are many suitable references on HMM but two good ones for our purposes are the books by Timo Koski HMM for bioinformatics 2001 and Durbin et al Biological sequence analysis 1998 The simplest specification of an HMM is via the transition probabilities p i j t for the underlying Markov chain Xt and the emission probabilities for the observations Yt where these are given by q i j k t pr Yt k Xt 1 i Xt j We also need an initial distribution for the chain i pr X0 i In general we are not going to observe Xt which accounts for the word hidden in the name but if we did the probability of observing the state sequence x0 x1 x2 xn and associated observations y1 y2 yn is x0 p x0 x1 1 q x0 x1 y1 1 p xn 1 xn n q xn 1 xn yn n 4 HMM in experimental crosses We are going to consider the chromosomes of offspring from crosses of inbred strains A and B of mice Suppose that we have n markers along a chromosome 1 say in their correct order 1 2 n say with rt being the recombination fraction between markers t and t 1 Consider the genotypes at these markers along chromosome 1 of an A H backcross mouse Each genotype will be either A aa or H ab and you can do the Exercise Under the assumption of no interference the sequence of genotypes at markers 1 2 n is a Markov chain with state space A H initial distribution 1 2 1 2 and 2 2 transition probability matrix R rt having diagonal entries1 rt for no change A A H H and off diagonal entries rt for change A H H A This Markov chain represents the crossover process along the chromosome passed by one parent say the mother In an F2 intercross H H there is a crossover process in both F1 parents If we consider the offspring s possible ordered genotypes that is genotypes with known parental origin also called known phase then the sequence of ordered genotypes at markers 1 2 n is also a Markov chain with state space a b a b aa ab ba bb initial distribution 1 2 1 2 1 2 1 2 1 4 1 4 1 4 1 4 and transition probability matrixP t R rt R rt 5 HMM in experimental crosses cont between marker t and t 1 Here you need to know the notion of tensor product of matrices also known as direct product or Kronecker product This is defined in most books on matrices but my favourite is Bellman s book Exercise My notation suggests the idea of the product of two independent Markov chains Define this notion carefully and show we get a Markov chain Now unlike in the backcross the observed F2 genotypes do not always tell us which parental strand had a recombination across an interval and which didn t so we cannot always reconstruct the ordered genotypes With this 4 state Markov chain we have just 3 possible observed states and we include some ambiguity states giving us an observation space A H B C D where C H B not A D A H not B and missing The resulting joint chain observation process is an HMM with emission probabilities which in a more general form can be written as the array on the next page There is the error rate which may in fact be marker specific 6 though for simplicity that is not indicated by the notation F2 HMM emission probabilities Xt aa ab ba bb A H Yt B 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 C D 1 1 2 1 1 1 1 2 2 1 1 1 1 2 2 1 1 2 1 1 1 1 Note that the row entries in this array do not simply sum to 1 but the entries for mutually exclusive and exhaustive cases should A H and B or A and C 7 etc Calculations with our HMM For our F2 intercross there are certain calculations we would like to do which the HMM formalism makes straightforward In fact they are all instances of calculations generally of interest with HMM and if the number of steps is not too large and the state spaces not too big there are neat algorithms for carrying them out which are worth knowing Major problems Given the combined observations O of the genotypes Y Y1 Y2 Yn from many different F2 offspring transition matrices P rt and emission probabilities q t t 1 2 n a calculate the log likelihood l r O log pr O r for different parameter values …

View Full Document