DOC PREVIEW
Berkeley STATISTICS 246 - Three examples of the EM algorithm

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Three examples of the EM algorithmThe estimation of linkage from the offspring of selfed heterozygotes R A Fisher and Bhai Balmukand, Journal of Genetics 20 79-92 (1928) See also: Collected Papers of R A Fisher, Volume II, Paper 71, pp 285-298.The problemOffspring genotypic and phenotypic frequenciesOffspring genotypic and phenotypic probabilities, cont.Estimation of The incomplete data formulationThe EM algorithm for this problemComments on this exampleFitting a mixture model by EM to discover motifs in biopolymers T L Bailey and C Elkan, UCSD Technical Report CS94-351; ISMB94.Complete data log likelihoodMixture models: the E-stepMixture models: the M-stepMixture models: the M-step, cont.Mixture models: the M-step completed.Comment on the EM algorithmAn Expectation Maximization (EM) Algorithm for the Identification and Characterization of Common Sites in Unaligned Biopolymer Sequences C E Lawrence and A A Reilly PROTEINS: Structure, Function and Genetics 7:41-51 (1990)The E-step in this case.The M-step in this case.1Three examples of the EM algorithmWeek 12, Lecture 1Statistics 246, Spring 20022The estimation of linkage from the offspring of selfed heterozygotesR A Fisher and Bhai Balmukand, Journal of Genetics 20 79-92 (1928)See also: Collected Papers of R A Fisher, Volume II, Paper 71, pp 285-298.3The problemIn modern terminology, we have two linked bi-allelic loci, A and B say, with alleles A and a, and B and b, respectively, where A is dominant over a and B is dominant over b.A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB and ab. Since the loci are linked, the types AB and ab will appear with a frequency different from that of Ab and aB, say 1-r and r, respectively, in males, and 1-r’ and r’ respectively in females.Here we suppose that the parental origin of these heterozygotes is from the mating AABB  aabb, so that r and r’ are the male and female recombination rates between the two loci.The problem is to estimate r and r’, if possible, from the offspring of selfed double heterozygotes.4Offspring genotypic and phenotypic frequenciesSince gametes AB, Ab, aB and ab are produced in proportions (1-r)/2, r/2, r/2 and (1-r)/2 respectively by the male parent, and (1-r’)/2, r’/2, r’/2 and (1-r’)/2 respectively by the female parent, zygotes with genotypes AABB, AaBB, …etc, are produced with frequencies (1-r)(1-r’)/4, (1-r)r’/4, etc.Exercise: Complete the Punnett square of offspring genotypes and their associated frequencies. The problem here is this: although there are 16 distinct offspring genotypes, taking parental origin into account, the dominance relations imply that we only observe 4 distinct phenotypes, which we denote by A*B*, A*b*, a*B* and a*b*. Here A* (res. B*) denotes the dominant, while a* (resp. b*) denotes the recessive phenotype determined by the alleles at A (resp. B).5Offspring genotypic and phenotypic probabilities, cont.Thus individuals with genotypes AABB, AaBB, AABb or AaBb, which account for 9/16 gametic combinations (check!), all exhibit the phenotype A*B*, i.e. the dominant alternative in both characters, while those with genotypes AAbb or Aabb (3/16) exhibit the phenotype A*b*, those with genotypes aaBB and aaBb (3/16) exhibit the phenotype a*B*, and finally the double recessives aabb (1/16) exhibit the phenotype a*b*.It is a slightly surprising fact that the probabilities of the four phenotypic classes are definable in terms of the parameter = (1-r)(1-r’), as follows: a*b* has probability /4 (easy to see), a*B* and A*b* both have probabilities (1-)/4, while A*B* has probability 1 minus the sum of the preceding, which is (2+)/4.Exercise: Calculate these phenotypic probabilities.6Estimation of Now suppose we have a random sample of n offspring from the selfing of our double heterozygote. Thus the 4 phenotypic classes will be represented roughly in proportion to their theoretical probabilities, their joint distribution being multinomial Mult [n; (2+ )/4, (1-)/4, (1-)/4, /4].Note that here neither r nor r’ will be separately estimable from these data, but only the product (1-r)(1-r’). Note that since we know that r≤1/2 and r’≤1/2, it follows that  ≥1/4. How do we estimate ? Fisher and Balmukand discuss a variety of methods that were in the literature at the time they wrote, and compare them with maximum likelihood, which is the method of choice in problems like this. We describe a variant on their approach to illustrate the EM algorithm.7The incomplete data formulation Let us denote (cf p. 26 of Week 11b) the counts of the 4 phenotypic classes by y1, y2, y3 and y4, these having probabilities (2+ )/4, (1- )/4, (1-)/4 and /4, respectively. Now the probability of the observing genotype AABB is /4, just as it is with aabb, and although this genotype is phenotypically indistinguishable from the 8 others with phenotype A*B*, it is convenient to imagine that we can distinguish them.So let us denote their count by x1, and let x2 denote count of the remainder of that class, so that x1+x2 = y1. Note that x2 has marginal probability 1/2. In the jargon of the EM algorithm, x1 and x2 are missing data, as we only observe their sum y1. Next, as in p.26 of Week 11b, we let y2=x3, y3=x4 and y4=x5. We now illustrate the approach of the EM algorithm, referring to material in Week 9b and Week 11b for generalities.8The EM algorithm for this problem The complete data log likelihood at  is (Ex: Check): (x2+x5)log + (x3+x4)log(1- ).The expected value of the complete data log likelihood given observed data taken at ’ (E-step: think of ’ as -initial) is: (E’(x2|y1)+y4)log + (y2+y3)log(1- ).Now E’(x2|y1) is just ky1, where k=  ’/(2+’). (Ex: Check.)The maximum over  of the expected value of the complete data log likelihood given observed data taken at ’ (M-step) occurs at ’’ = (ky1+y4)/(ky1+y2+y3+y4). (Ex: Check)Here we think of ’’ as  -next.It should now be clear how the E-step (calculation of k) and the M-step (calculation of ’’) can be iterated.9Comments on this example This completes our discussion of this example. It appeared in the famous EM paper (Dempster, Laird and Rubin, JRSSB 1977) without any explanation of its genetic origins. Of course it is an illustration of the EM only, for the actual likelihood equation generated by the observed data


View Full Document

Berkeley STATISTICS 246 - Three examples of the EM algorithm

Documents in this Course
Meiosis

Meiosis

46 pages

Meiosis

Meiosis

47 pages

Load more
Download Three examples of the EM algorithm
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Three examples of the EM algorithm and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Three examples of the EM algorithm 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?