How many genes?
Mapping mouse traits, cont.
Lecture 2B, Statistics 246, January 22, 2004

Let's estimate the recombination fraction r between D12Mit51 and D12Mit132.

2-locus genotypes at D12Mit51 and D12Mit132: 129 offspring from an H x H cross, where A = NOD/NOD, B = B6/B6 and H = NOD/B6 (heterozygous).

                  D12Mit51
D12Mit132      A     H     B   Total
A             26    10     0      36
H             10    46     5      61
B              0     9    23      32
Total         36    65    28     129

Estimation of r
First note that we can't simply count recombinants. Why? Because recombination can occur in the paternal meiosis, the maternal meiosis, or both, and all we see are the genotypes of the offspring. In most cases the parental origin of a recombination can be inferred, but not in every case.
Denote the two markers by 1 and 2, the NOD alleles by a and the B6 alleles by b. The parental haplotypes are then a1a2 on one chromosome and b1b2 on the other. Each parent passes on a1a2 with probability $(1-r)/2$, and likewise b1b2, while it passes on each of the recombinant haplotypes a1b2 and b1a2 with probability $r/2$. In practice recombination frequencies differ slightly between male and female meioses, but we ignore this refinement.

Probabilities of parentally transmitted haplotype combinations
Haplotype combinations resulting from crossing doubly heterozygous parents, each a1/b1 at locus 1 and a2/b2 at locus 2. This table is for coupling: the parental haplotypes are a1a2 and b1b2, i.e. the mother and father are both a1a2/b1b2. P and M denote the paternally and maternally transmitted haplotypes, respectively, and every entry is divided by 4.

P \ M       a1a2       a1b2       b1a2       b1b2
a1a2     (1-r)^2     r(1-r)     r(1-r)    (1-r)^2
a1b2      r(1-r)       r^2        r^2      r(1-r)
b1a2      r(1-r)       r^2        r^2      r(1-r)
b1b2     (1-r)^2     r(1-r)     r(1-r)    (1-r)^2

From the Punnett square to the table of 2-locus genotype probabilities
The terms in the Punnett square can be summed to build up a table of probabilities for the 9 different 2-locus genotypes. For example, we observe A (a1a1) at locus 1 and H (a2b2) at locus 2 if and only if the transmitted (paternal, maternal) haplotypes are the pair (a1a2, a1b2) or (a1b2, a1a2), which occurs with combined probability $2r(1-r)/4$. The other
terms are built up similarly, the most complex case being the 2-locus genotype HH, where 4 different terms must be considered: a double heterozygote can result from 4 different combinations of parental or recombinant haplotypes.

Probabilities of 2-locus genotypes
(rows: locus L1; columns: locus L2; every entry is divided by 4)

L1 \ L2        A                 H                  B
A           (1-r)^2          2r(1-r)              r^2
H           2r(1-r)      2[r^2 + (1-r)^2]       2r(1-r)
B             r^2            2r(1-r)            (1-r)^2

Looking at this table, we see that the number of recombinant haplotypes, though not the parent they came from, can be inferred for every genotype except HH. We can almost count recombinants.

Estimation of r, cont.
Using the table of probabilities, we can write down a log-likelihood function for any set of 2-locus genotype frequencies. Label the cells of the table 1, ..., 9 row by row, and denote the corresponding probabilities by $p_1(r), \ldots, p_9(r)$ and the frequencies by $n_1, \ldots, n_9$. Then the log-likelihood for the resulting multinomial model is

$\log L(r) = \sum_{i=1}^{9} n_i \log p_i(r)$

The parameter r is then estimated by maximizing this function, and an approximate standard error or confidence interval is obtained from the Fisher information or the asymptotic chi-square approximation.

A frill: the M-step of an EM algorithm
The function $\log L(r)$ can be maximized in a number of ways, but in general there is no closed-form expression for the maximum likelihood estimate $\hat r$. If we were able to decompose the count $n_5$ of HHs into the $n_{5P}$ that are pairs of parental haplotypes and the $n_{5R}$ that are pairs of recombinant haplotypes (occurring with probabilities proportional to $(1-r)^2$ and $r^2$, respectively), the recombinant haplotypes could be counted directly, and the MLE would be

$\hat r = \frac{2(n_3 + n_7 + n_{5R}) + (n_2 + n_4 + n_6 + n_8)}{2n}$

where $n = n_1 + \cdots + n_9$. Each offspring carries two transmitted haplotypes, giving 2n in all; cells 3 and 7 and the $n_{5R}$ recombinant HHs contribute two recombinant haplotypes each, and cells 2, 4, 6 and 8 contribute one each.

The E-step
In general we don't know $n_{5R}$, but we can estimate it by its conditional expectation given $n_5$:

$\hat n_{5R} = E(n_{5R} \mid n_5) = n_5 \frac{r^2}{r^2 + (1-r)^2}$

In practice we need a value of r to begin with. We plug it into the above estimate, use the M-step formula to get the next r, and then iterate.
Exercise: Prove the above formula, and show that the iteration is an instance of the EM algorithm.

2-locus genotype frequencies for D12Mit132 and D13Mit6

                  D13Mit6
D12Mit132      A     H     B   Total
A             10    21     7      38
H             15    29    17      61
B              5    21     6      32
Total         30    71    30     131
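The Punnett-square summation and the E- and M-steps described above can be sketched in Python as a minimal illustration, assuming the lecture's model (coupling phase, the same r in male and female meioses). `genotype_probs` enumerates the 16 (paternal, maternal) haplotype pairs and sums their probabilities into the 3 x 3 genotype table; `em_estimate` alternates the E-step for $n_{5R}$ with the M-step formula for $\hat r$, applied here to the D12Mit51/D12Mit132 counts. The function names, the starting value r = 0.25 and the stopping tolerance are hypothetical choices made for this sketch.

```python
from itertools import product

# Haplotypes from a coupling-phase double heterozygote a1a2/b1b2,
# written as (allele at locus 1, allele at locus 2).
PARENTAL = [("a", "a"), ("b", "b")]
RECOMBINANT = [("a", "b"), ("b", "a")]


def haplotype_probs(r):
    """Probability that one parent transmits each haplotype."""
    probs = {h: (1 - r) / 2 for h in PARENTAL}
    probs.update({h: r / 2 for h in RECOMBINANT})
    return probs


def genotype_code(x, y):
    """Single-locus genotype from two transmitted alleles: A = aa, B = bb, H = ab."""
    if x == y:
        return "A" if x == "a" else "B"
    return "H"


def genotype_probs(r):
    """Sum the 16 Punnett-square terms into the table of 2-locus genotype
    probabilities, keyed by (genotype at locus 1, genotype at locus 2)."""
    hp = haplotype_probs(r)
    table = {}
    for pat, mat in product(hp, repeat=2):
        key = (genotype_code(pat[0], mat[0]), genotype_code(pat[1], mat[1]))
        table[key] = table.get(key, 0.0) + hp[pat] * hp[mat]
    return table


def em_estimate(counts, r=0.25, tol=1e-10, max_iter=10_000):
    """EM for r; counts is a 3 x 3 table with rows and columns ordered A, H, B."""
    n = [c for row in counts for c in row]  # n[0] = n_1, ..., n[8] = n_9
    total = sum(n)
    for _ in range(max_iter):
        # E-step: expected number of HH offspring carrying two recombinant haplotypes.
        n5r = n[4] * r**2 / (r**2 + (1 - r) ** 2)
        # M-step: recombinant haplotypes counted among all 2n transmitted haplotypes.
        r_new = (2 * (n[2] + n[6] + n5r) + n[1] + n[3] + n[5] + n[7]) / (2 * total)
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r


# D12Mit132 (rows) by D12Mit51 (columns), both ordered A, H, B.
counts = [[26, 10, 0],
          [10, 46, 5],
          [0, 9, 23]]

r_hat = em_estimate(counts)
print(f"EM estimate of r between D12Mit51 and D12Mit132: {r_hat:.4f}")
```

Each M-step is a closed-form count, so an iteration costs almost nothing; like any EM algorithm, the iteration never decreases $\log L(r)$ from one step to the next.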
Exercise: Estimate r for D12Mit132 and D13Mit6. Is it different from 1/2?
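One possible approach to this exercise for D12Mit132 and D13Mit6 is sketched below in Python, again assuming the coupling-phase model with a common r. Instead of EM, it maximizes $\log L(r)$ directly by a grid search over $(0, 1/2]$, uses the expected Fisher information $n \sum_i p_i'(r)^2 / p_i(r)$ for an approximate standard error, and forms the likelihood-ratio statistic $2[\log L(\hat r) - \log L(1/2)]$ for comparison with a chi-square distribution on 1 degree of freedom. The grid resolution and the function names are hypothetical, and because r = 1/2 lies on the boundary of the parameter range, the chi-square reference is only a rough guide.

```python
import math


def cell_probs(r):
    """p_1(r), ..., p_9(r): 2-locus genotype probabilities, cells read row by row
    (locus 1 genotype A, H, B by locus 2 genotype A, H, B)."""
    par = (1 - r) ** 2 / 4           # AA and BB
    rec = r**2 / 4                   # AB and BA
    mix = r * (1 - r) / 2            # one locus H, the other homozygous
    hh = (r**2 + (1 - r) ** 2) / 2   # HH
    return [par, mix, rec,
            mix, hh, mix,
            rec, mix, par]


def cell_derivs(r):
    """dp_i/dr for the same nine cells."""
    dpar, drec, dmix, dhh = -(1 - r) / 2, r / 2, (1 - 2 * r) / 2, 2 * r - 1
    return [dpar, dmix, drec,
            dmix, dhh, dmix,
            drec, dmix, dpar]


def log_lik(r, n):
    """Multinomial log-likelihood sum_i n_i log p_i(r), dropping the constant."""
    return sum(ni * math.log(pi) for ni, pi in zip(n, cell_probs(r)) if ni > 0)


def mle_grid(n, points=50_000):
    """Grid-search maximizer of log L(r) over r = k / (2 * points), k = 1..points."""
    grid = (k / (2 * points) for k in range(1, points + 1))
    return max(grid, key=lambda r: log_lik(r, n))


def fisher_info(r, n_total):
    """Expected Fisher information n * sum_i p_i'(r)^2 / p_i(r)."""
    return n_total * sum(d * d / p for p, d in zip(cell_probs(r), cell_derivs(r)))


# D12Mit132 (rows) by D13Mit6 (columns), both ordered A, H, B, read row by row.
n = [10, 21, 7,
     15, 29, 17,
     5, 21, 6]

r_hat = mle_grid(n)
se = 1 / math.sqrt(fisher_info(r_hat, sum(n)))
lrt = 2 * (log_lik(r_hat, n) - log_lik(0.5, n))  # refer to chi-square, 1 df
print(f"r_hat = {r_hat:.4f}, approx. SE = {se:.4f}, LR statistic vs r = 1/2: {lrt:.3f}")
```

A grid search is used rather than a derivative-based method because the HH term $\log[r^2 + (1-r)^2]$ is convex in r, so $\log L(r)$ need not be concave; a fine grid avoids relying on concavity.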