Berkeley STATISTICS 246 - HMM in crosses and small pedigrees

Lecture 8, February 17, 2004

Contents:
- Discrete-time Markov chains
- Hidden Markov Models (HMM)
- HMM, cont.
- HMM in experimental crosses
- HMM in experimental crosses, cont.
- F2 HMM emission probabilities
- Calculations with our HMM
- Another problem: reconstructing haplotypes
- Some basic notions from pedigree analysis
- Inheritance vectors
- Exercises and references

Discrete-time Markov chains

Consider a sequence of random variables X_1, X_2, X_3, ... with common finite state space S. This sequence forms a Markov chain if for all t, {X_1, ..., X_{t-1}} and {X_{t+1}, X_{t+2}, ...} are conditionally independent given X_t; equivalently,

  pr(X_t | X_{t-1}, X_{t-2}, ...) = pr(X_t | X_{t-1}).

The matrix p(i,j;t) = pr(X_t = j | X_{t-1} = i) is the transition matrix at step t. When p(i,j;t) = p(i,j) for all i and j, independent of t, we say the Markov chain is time-homogeneous, or has stationary transition probabilities. Many of the chains we'll be meeting will be inhomogeneous, and t will index position in space, not time. There are plenty of good books on elementary Markov chain theory, Feller vol. 1 being my favourite, but they mostly concentrate on asymptotic behaviour in the homogeneous case. For the time being we don't need this, or much else from the general theory, apart from the fact that multi-step transition matrices are products of 1-step transition matrices (Exercise).

Hidden Markov Models (HMM)

If (X_t) is a Markov chain, and f is an arbitrary function on the state space, then (f(X_t)) will not in general be a Markov chain. Exercise: construct an example to demonstrate this assertion. It is sometimes the case that associated with a Markov chain (X_t) there is another process, (Y_t) say, whose terms are conditionally independent given the chain (X_t); this happens with so-called semi-Markov chains. Both functions of Markov chains and this last situation are covered by the following useful definition, based on the work of L. E. Baum and colleagues around 1970. A bivariate Markov chain (X_t, Y_t) is called a hidden Markov model if (a) (X_t) is a Markov chain, and (b) the distribution of Y_t given X_t, X_{t-1}, X_{t-2}, ... depends only on X_t and X_{t-1}. In many examples this dependence is only on X_t, but in some it can extend beyond X_{t-1}, and/or include Y_{t-1}. Once you see how the defining property is used in the calculations, you will get an idea of the possible extensions. Exercise: explain how functions of Markov chains are always HMM.

HMM, cont.

There are many suitable references on HMM, but two good ones for our purposes are the books by Timo Koski (Hidden Markov Models for Bioinformatics, 2001) and Durbin et al. (Biological Sequence Analysis, 1998). The simplest specification of an HMM is via the transition probabilities p(i,j;t) for the underlying Markov chain (X_t) and the emission probabilities for the observations (Y_t), where these are given by

  q(i,j,k;t) = pr(Y_t = k | X_{t-1} = i, X_t = j).

We also need an initial distribution π for the chain: π(i) = pr(X_0 = i). In general we are not going to observe (X_t), which accounts for the word "hidden" in the name, but if we did, the probability of observing the state sequence x_0, x_1, x_2, ..., x_n and the associated observations y_1, y_2, ..., y_n would be

  π(x_0) p(x_0,x_1;1) q(x_0,x_1,y_1;1) ... p(x_{n-1},x_n;n) q(x_{n-1},x_n,y_n;n).
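To see the formula in action, here is a minimal Python sketch (not part of the original notes) that evaluates this fully observed joint probability for a tiny made-up HMM; all the numbers, and the uniform emission array, are hypothetical placeholders.

```python
import numpy as np

def joint_prob(pi, p, q, x, y):
    """Probability of a fully observed path:
    pi(x_0) * prod_t p(x_{t-1}, x_t; t) * q(x_{t-1}, x_t, y_t; t).

    pi : initial distribution over states, shape (S,)
    p  : p[t-1][i, j]    = pr(X_t = j | X_{t-1} = i), one matrix per step t = 1..n
    q  : q[t-1][i, j, k] = pr(Y_t = k | X_{t-1} = i, X_t = j), one array per step
    x  : hidden states x_0, ..., x_n
    y  : observations  y_1, ..., y_n
    """
    prob = pi[x[0]]
    for t in range(1, len(x)):
        prob *= p[t - 1][x[t - 1], x[t]] * q[t - 1][x[t - 1], x[t], y[t - 1]]
    return prob

# A tiny hypothetical 2-state, 2-symbol example with n = 3 steps.
pi = np.array([0.5, 0.5])
p_step = np.array([[0.9, 0.1],
                   [0.1, 0.9]])     # one-step transition matrix
q_step = np.ones((2, 2, 2)) * 0.5   # emissions; uniform for simplicity
p = [p_step] * 3                    # an inhomogeneous chain would vary these
q = [q_step] * 3

print(joint_prob(pi, p, q, x=[0, 0, 1, 1], y=[0, 1, 1]))

# The 2-step transition matrix is the product of 1-step matrices
# (cf. the Exercise in the Markov chain section):
p_two_step = p_step @ p_step
```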
HMM in experimental crosses

We are going to consider the chromosomes of offspring from crosses of inbred strains A and B of mice. Suppose that we have n markers along a chromosome, #1 say, in their correct order 1, 2, ..., n, with r_t being the recombination fraction between markers t and t+1. Consider the genotypes at these markers along chromosome 1 of an A × H backcross mouse. Each genotype will be either A = aa or H = ab. Under the assumption of no interference, the sequence of genotypes at markers 1, 2, ..., n is a Markov chain (Exercise) with state space {A, H}, initial distribution (1/2, 1/2), and 2×2 transition probability matrix R(r_t) having diagonal entries 1−r_t for no change (A→A, H→H) and off-diagonal entries r_t for change (A→H, H→A). This Markov chain represents the crossover process along the chromosome passed by one parent, say the mother.

In an F2 intercross, H × H, there is a crossover process in both F1 parents. If we consider the offspring's possible ordered genotypes, that is, genotypes with known parental origin (also called known phase), then the sequence of ordered genotypes at markers 1, 2, ..., n is also a Markov chain, with state space {a,b} × {a,b} = {aa, ab, ba, bb}, initial distribution (1/2, 1/2) ⊗ (1/2, 1/2) = (1/4, 1/4, 1/4, 1/4), and transition probability matrix P(t) = R(r_t) ⊗ R(r_t) between markers t and t+1.

HMM in experimental crosses, cont.

Here you need to know the notion of the tensor product of matrices, also known as the direct product or Kronecker product. This is defined in most books on matrices, but my favourite is Bellman's. Exercise: my notation suggests the idea of the product of two independent Markov chains. Define this notion carefully, and show that we get a Markov chain.

Now, unlike in the backcross, the observed F2 genotypes do not always tell us which parental strand had a recombination across an interval and which didn't, so we cannot always reconstruct the ordered genotypes. With this 4-state Markov chain we have just 3 possible observed genotypes, and we include some ambiguity states, giving us an observation space {A, H, B, C, D, −}, where C = {H, B} = not A, D = {A, H} = not B, and − = missing. The resulting joint chain–observation process is an HMM, with emission probabilities which, in a more general form, can be written as the array below. Here ε is the error rate, which may in fact be marker specific, though for simplicity that is not indicated by the notation.

F2 HMM emission probabilities

  pr(Y_t | X_t):

  X_t \ Y_t |   A       H       B       C       D      −
  aa        |  1−ε     ε/2     ε/2      ε     1−ε/2    1
  ab        |  ε/2     1−ε     ε/2    1−ε/2   1−ε/2    1
  ba        |  ε/2     1−ε     ε/2    1−ε/2   1−ε/2    1
  bb        |  ε/2     ε/2     1−ε    1−ε/2     ε      1

Note that the row entries in this array do not simply sum to 1; rather, the entries for mutually exclusive and exhaustive cases should: A, H and B, or A and C, etc.

Calculations with our HMM

For our F2 intercross there are certain calculations we would like to do which the HMM formalism makes straightforward. In fact, they are all instances of calculations generally of interest …
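To make the F2 construction concrete, here is a small Python sketch (not from the original notes) that builds the backcross transition matrix R(r), the F2 transition matrix R(r_t) ⊗ R(r_t) via the Kronecker product, and the emission array from the table above; the recombination fraction and error rate are made-up illustrative values.

```python
import numpy as np

def R(r):
    """Backcross transition matrix between adjacent markers:
    no recombinant with probability 1-r, recombinant with probability r."""
    return np.array([[1 - r, r],
                     [r, 1 - r]])

def f2_transition(r):
    """F2 ordered-genotype transition matrix P(t) = R(r_t) (x) R(r_t):
    one independent crossover process per F1 parent.
    State order: aa, ab, ba, bb."""
    return np.kron(R(r), R(r))

def f2_emission(eps):
    """Emission array pr(Y_t | X_t) from the table above;
    columns A, H, B, C (= not A), D (= not B), - (missing)."""
    e = eps
    return np.array([
        #  A         H        B         C           D        -
        [1 - e,    e / 2,   e / 2,    e,         1 - e / 2, 1.0],  # aa
        [e / 2,    1 - e,   e / 2,    1 - e / 2, 1 - e / 2, 1.0],  # ab
        [e / 2,    1 - e,   e / 2,    1 - e / 2, 1 - e / 2, 1.0],  # ba
        [e / 2,    e / 2,   1 - e,    1 - e / 2, e,         1.0],  # bb
    ])

pi = np.kron([0.5, 0.5], [0.5, 0.5])  # (1/4, 1/4, 1/4, 1/4)
P = f2_transition(0.10)               # r_t = 0.10, an illustrative value
Q = f2_emission(0.01)                 # eps = 0.01, an illustrative value

assert np.allclose(P.sum(axis=1), 1.0)         # each row of P is a distribution
assert np.allclose(Q[:, :3].sum(axis=1), 1.0)  # A, H, B: exclusive and exhaustive
```

The two asserts check exactly the point made in the note under the table: the full rows of the emission array do not sum to 1, but the mutually exclusive and exhaustive columns A, H, B do.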
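One standard calculation of this kind is the likelihood of an observed genotype sequence, summing over all hidden ordered-genotype paths. A minimal forward-recursion sketch, reusing the hypothetical pi, Q, and f2_transition from the previous block (the genotype data and per-interval recombination fractions are made up):

```python
def forward_likelihood(pi, P_list, Q, obs):
    """Likelihood pr(Y_1, ..., Y_n) of the observed genotype codes,
    summing over all hidden ordered-genotype paths by the forward
    recursion: alpha_t = (alpha_{t-1} @ P_t) * Q[:, y_t]."""
    alpha = pi * Q[:, obs[0]]            # marker 1: initial distribution x emission
    for P_t, y in zip(P_list, obs[1:]):  # one transition matrix per marker interval
        alpha = (alpha @ P_t) * Q[:, y]
    return alpha.sum()

# Observation codes matching the table's columns: A=0, H=1, B=2, C=3, D=4, -=5.
obs = [0, 1, 5, 1]                                       # made-up data at 4 markers
P_list = [f2_transition(r) for r in (0.10, 0.05, 0.20)]  # made-up r_t per interval
print(forward_likelihood(pi, P_list, Q, obs))
```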

