Berkeley STATISTICS 246 - HMM in crosses and small pedigrees

HMM in crosses and small pedigrees, cont.
Another problem: reconstructing haplotypes
The Lander-Green HMM
Transition probabilities
Alternative representation
Observations (phenotypes)
Emission probabilities
Penetrances
Pedigree calculations
The forward equations
Evaluating the sum (a)
Evaluating the sum (b)
The backward equations
Reconstructing the haplotypes
PowerPoint Presentation
Slide 16
Slide 17

1. HMM in crosses and small pedigrees, cont.
Lecture 9, Statistics 246, February 19, 2004

2. Another problem: reconstructing haplotypes
The problem here is to reconstruct the children's haplotypes, as in the figure, from marker data on both the children and the parents.

3. The Lander-Green HMM
Recap. The states of the Markov chain are the inheritance vectors. At any locus on a chromosome, each non-founder contributes two entries to the inheritance vector, one for each parental meiosis; an entry is 0 if the parental variant passed on at that locus was grandmaternal, and 1 otherwise. Consider a two-parent, two-child nuclear family, and suppose that at a marker the mother and father are 1/2 and 3/4, respectively, while the first child (a girl) is 1/3 and the second (a boy) is 2/4. Then the inheritance vector has length 4, $v = (v_{gm}, v_{gp}, v_{bm}, v_{bp})$, where gm denotes the girl's maternal meiosis, gp her paternal meiosis, and so on. What are the assignments for v at the marker? We don't know which of the mother's alleles 1 and 2 came from her mother and which came from her father, but we can arbitrarily declare that the one she passed on to her daughter (allele 1) came from her mother, and similarly that the father's allele passed on to the daughter (allele 3) came from his mother. With this assignment we find that v = (0, 0, 1, 1), because from each parent the boy received the allele his sister did not.
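As a check on the nuclear-family example, here is a minimal Python sketch that reads off the inheritance vector at a single fully informative marker. It assumes the four parental alleles are distinct, so each child's maternal and paternal alleles can be traced unambiguously, and it encodes the arbitrary phase declaration by listing each parent's alleles as an ordered pair (grandmaternal, grandpaternal). The function name `inheritance_vector` is hypothetical, and the second call additionally assumes that allele i' sits on the same parental chromosome as allele i.

```python
def inheritance_vector(mother, father, children):
    """Inheritance vector at one fully informative marker in a nuclear family.

    mother, father: ordered pairs (grandmaternal allele, grandpaternal allele);
        the ordering is the arbitrary phase declaration made in the text.
    children: unordered genotypes, one pair per child, in a fixed order.
    Returns (v_1m, v_1p, v_2m, v_2p, ...) with 0 = grandmaternal, 1 = grandpaternal.
    Assumes the four parental alleles are distinct, so every child allele
    can be traced to exactly one parent.
    """
    v = []
    for child in children:
        from_mother = next(a for a in child if a in mother)
        from_father = next(a for a in child if a in father)
        v += [mother.index(from_mother), father.index(from_father)]
    return tuple(v)


# First marker: the alleles passed to the girl (1 and 3) are declared grandmaternal.
print(inheritance_vector((1, 2), (3, 4), [(1, 3), (2, 4)]))  # (0, 0, 1, 1)

# Nearby marker, assuming i' lies on the same parental chromosome as i.
print(inheritance_vector(("1'", "2'"), ("3'", "4'"), [("1'", "3'"), ("2'", "3'")]))  # (0, 0, 1, 0)
```

Reordering a parent's pair flips that parent's entries for every child, which is exactly the arbitrariness in founder phase noted above.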
In fact, the specification of the paternal and maternal chromosomes of a founder is completely arbitrary, and we'll mention later how this can be turned into a symmetry that speeds up the calculations.

4. Transition probabilities
Now suppose that the same family has genotypes 1'/2', 3'/4', 1'/3' and 2'/4' at a locus near the first one. If the recombination fraction between the two loci is r, and r is small, then we might expect the inheritance vector v' at the second locus to coincide with v = (0, 0, 1, 1). But what if the genotypes were 1'/2', 3'/4', 1'/3' and 2'/3', respectively? This suggests that v' = (0, 0, 1, 0), with the change from 1 to 0 in the entry for the boy's paternally inherited chromosome denoting a recombination. How do we weigh up these competing possibilities? As with the mouse chromosomes, we need a transition matrix P(r) connecting inheritance vectors at adjacent loci. The form of P is as in the mouse case, namely a tensor power of the 2×2 matrix having 1-r on the diagonal and r off the diagonal,
$$R(r) = \begin{pmatrix} 1-r & r \\ r & 1-r \end{pmatrix},$$
so that here $P = R(r)^{\otimes 4}$. In general it is the (2n)th tensor power, $P = R(r)^{\otimes 2n}$, where n is the number of non-founders. Thus we have our states and our transition probability matrix, and hence our (product) Markov chain. To complete the specification of our HMM, we need observations, the associated emission probabilities, and an initial distribution.

5. Alternative representation
The purpose of inheritance vectors is to describe the possible patterns of gene flow through a pedigree. Once ordered pairs of alleles are assigned to founders, the 0s and 1s in the inheritance vectors specify the alleles that are passed from parent to offspring down the pedigree. As mentioned in the last lecture, an alternative representation of this gene flow is via what is known as a descent graph; see Lange's book, ch. 9. A recent paper gave an even more economical representation of what is needed, via what the authors called a sparse gene flow tree.
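To make the tensor-power form of P(r) concrete, here is a minimal NumPy sketch for the two-child family. The names `R`, `transition_matrix`, `index` and `transition_prob` are hypothetical, and indexing inheritance vectors as binary numbers with the first entry most significant is our own convention (it matches the ordering produced by `np.kron`). Because each factor acts on one meiosis independently, every entry of P reduces to $r^d (1-r)^{2n-d}$, where d is the number of entries in which the two vectors differ; `transition_prob` states that closed form.

```python
from functools import reduce

import numpy as np


def R(r):
    """2x2 matrix for one meiosis: keep the same grandparental origin with
    probability 1 - r, switch (a recombination) with probability r."""
    return np.array([[1.0 - r, r], [r, 1.0 - r]])


def transition_matrix(r, n_nonfounders):
    """P(r) = R(r) tensor-powered 2n times, one factor per meiosis.

    Rows and columns are indexed by inheritance vectors read as binary
    numbers, first entry most significant (np.kron's ordering)."""
    return reduce(np.kron, [R(r)] * (2 * n_nonfounders))


def index(v):
    """Row/column index of inheritance vector v, e.g. (0, 0, 1, 1) -> 3."""
    return int("".join(str(bit) for bit in v), 2)


def transition_prob(v, w, r):
    """Closed form of P[v, w]: r**d * (1 - r)**(len(v) - d), where d counts
    the meioses whose entries differ between v and w."""
    d = sum(a != b for a, b in zip(v, w))
    return r**d * (1 - r) ** (len(v) - d)


P = transition_matrix(0.1, 2)  # 16 x 16 for two non-founders
v, v_prime = (0, 0, 1, 1), (0, 0, 1, 0)
print(P[index(v), index(v_prime)])  # equals (1 - r)**3 * r, about 0.0729
```

Building the full $2^{2n} \times 2^{2n}$ matrix is only sensible for tiny pedigrees; the sketch is meant to show the structure, not to be an efficient way of multiplying by P.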
I leave those interested to consult Abecasis et al., Nature Genetics 30 (2002): 97-101. There (in the program Merlin) the efficient matrix multiplication that we will describe shortly is replaced by a sparse matrix-vector multiplication algorithm more general than the one we will give.

6. Observations (phenotypes)
We now turn to our observations. The data we have on our (small) pedigree will generally consist of genotypes at many marker loci, and perhaps additional disease or other phenotype data; see the pedigree on p. 2, where a black-filled square or circle indicates that the person is affected by some specified disease. In the case of interest to us here, reconstructing haplotypes, we'll assume that there are only marker data. Suppose the unordered pairs of alleles (i.e. genotypes) at marker locus t come from a set $\mathcal{G}(t)$. Then for f founders and n non-founders, our observations come from $\mathcal{G}(t)^f \times \mathcal{G}(t)^n$. Here I could simply have written the (f+n)th power, but it is convenient to keep founders and non-founders notationally distinct. As with the mouse crosses, we could add in ambiguity and missing-data "observations", but for simplicity we won't do so here. Since we are mainly interested in marker data, let's denote a typical (vector) observation at locus t by $m_t$.

7. Emission probabilities
Referring to our general discussion of HMMs in the previous lecture, we now need to specify the equivalent of the emission probabilities q(i, j, k; t) at each locus t. These probabilities are in general functions of the current and previous states, but here they depend only on the current state $v_t$, and take the form $q(v_t, m_t; t)$. We want this to be
$$q(v_t, m_t; t) = \mathrm{pr}(\text{observations at } t = m_t \mid \text{inheritance vector at } t = v_t),$$
but right now $v_t$ just describes gene flow: we haven't got started. To complete our description, let us write $a_t = (a_{t,1}, a_{t,2}, \ldots, a_{t,2f-1}, a_{t,2f})$ for the assignment of an ordered pair of alleles to each founder. Suppose that the frequency of allele $a_{t,h}$ is $p_{t,h}$.
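The ingredients being assembled for $q(v_t, m_t; t)$ (founder allele assignments $a_t$ weighted by the product of their frequencies, gene flow determined by $v_t$, and penetrances, with a sum over all $a_t$) can be combined in a brute-force Python sketch for the two-child family of slide 3. The function name `emission_prob` is hypothetical, and two simplifications are assumptions of the sketch: error-free typing, so the penetrance is 1 if the genotype implied by $(a_t, v_t)$ matches the observation and 0 otherwise, and enumeration of every founder assignment rather than only the compatible ones.

```python
from itertools import product


def emission_prob(v, obs, freqs):
    """Brute-force sketch of q(v_t, m_t; t) = pr(m_t | v_t) for a nuclear family.

    v:     inheritance vector (v_1m, v_1p, v_2m, v_2p, ...), one pair per child
    obs:   observed unordered genotypes [mother, father, child_1, child_2, ...]
    freqs: dict mapping each allele to its population frequency
    Assumes equilibrium, so pr(a_t) is the product of allele frequencies, and
    error-free typing, so the penetrance is 1 if the implied genotype matches
    the observation and 0 otherwise.
    """
    total = 0.0
    # a = (mother's grandmaternal, mother's grandpaternal,
    #      father's grandmaternal, father's grandpaternal) allele
    for a in product(freqs, repeat=4):
        mother, father = a[:2], a[2:]
        # follow the gene flow: child i receives mother[v_im] and father[v_ip]
        implied = [mother, father] + [
            (mother[v[2 * i]], father[v[2 * i + 1]]) for i in range(len(v) // 2)
        ]
        # penetrance: compare implied and observed genotypes as unordered pairs
        if all(sorted(g) == sorted(o) for g, o in zip(implied, obs)):
            prob = 1.0
            for allele in a:
                prob *= freqs[allele]  # pr(a_t) = product of the p_{t,h}
            total += prob
    return total


freqs = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
obs = [(1, 2), (3, 4), (1, 3), (2, 4)]
print(emission_prob((0, 0, 1, 1), obs, freqs))  # 0.00390625, i.e. 0.25**4
print(emission_prob((1, 1, 0, 0), obs, freqs))  # same value: both founders' phases relabelled
print(emission_prob((0, 0, 0, 0), obs, freqs))  # 0.0: the children would be identical
```

The second call illustrates the symmetry mentioned on slide 3: relabelling a founder's grandmaternal and grandpaternal alleles flips that founder's entries in v without changing the probability of the data.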
Under the population equilibrium assumptions we have previously mentioned, $\mathrm{pr}(a_t) = \prod_{h=1}^{2f} p_{t,h}$. Finally, we sum over all $a_t$ (in practice, over all $a_t$ compatible with the observed phenotypes). This gets us started towards defining $q(v_t, m_t; t)$.

8. Penetrances
Having got started, note that the founder alleles $a_t$ and an inheritance vector $v_t$ give us a set of ordered genotypes $g_t = (a_t, v_t)$ at locus t, by following the flow. We are almost there. The observations on each individual in the pedigree can now be assigned probabilities given their (ordered) genotypes. This last step involves the terms we have previously called penetrances (probabilities of observed phenotypes given genotypes) and a

