How many genes? Mapping mouse traits, cont.
Lecture 3, Statistics 246, January 27, 2004

Inferring linkage and mapping markers

We now turn to deciding when two marker loci are linked and, if so, estimating the map distance between them. Then we go on to create a full marker map of each chromosome, relative to which we can map trait genes. With these preliminaries completed, we can map trait loci.

The LOD score

Suppose that we have two marker loci and we don't know whether or not they are linked. A natural way to address this question is to carry out a formal test of the null hypothesis $H: r = 1/2$ against the alternative $K: r < 1/2$, using the marker data from our cross. The test statistic almost always used in this context is $\log_{10}$ of the ratio of the likelihood at the maximum likelihood estimate $\hat{r}$ to that at the null value $r = 1/2$, i.e.

$$\mathrm{LOD} = \log_{10} \frac{L(\hat{r})}{L(1/2)}.$$

Calculating the LOD score

Recall that the log likelihood here is based on the multinomial distribution for the allocation of $n = 132$ intercross mice into their nine 2-locus genotypic categories. As we saw earlier, it can be written

$$\log_{10} L(r) = \sum_i n_i \log_{10} p_i(r),$$

and so we take the difference between this function evaluated at $\hat{r}$ and at $r = 1/2$, which is

$$\mathrm{LOD} = \sum_i n_i \log_{10} \frac{p_i(\hat{r})}{q_i},$$

where $q_i$ is 1/16, 1/8 or 1/4, depending on $i$.

Null probabilities of 2-locus genotypes

L1 \ L2    A       H       B
A          1/16    1/8     1/16
H          1/8     1/4     1/8
B          1/16    1/8     1/16

This is just putting $r = 1/2$ in an earlier table.

Exercise: Suggest some different test statistics to discriminate between the null $H$ and the alternative $K$. How do they perform in comparison to the LOD?

Using the LOD score

Normal statistical practice would have us set a type 1 error in a given context (cross, sample size) and determine the cut-off for the LOD which would achieve approximately the desired error under the null hypothesis. This approach is rarely adopted in genetics, where tradition dictates the use of more stringent thresholds, which take into account the multiple testing common in linkage mapping. It was originally motivated
by a Bayesian argument, and in fact Bayesian approaches to linkage analysis are increasingly popular. Let us make use of Bayes' formula in the form

$$\log_{10}(\text{posterior odds}) = \log_{10}(\text{prior odds}) + \mathrm{LOD},$$

where the odds are for linkage. With 20 chromosomes, which we might assume to be approximately the same size and not too long, the prior probability of two random loci being on the same chromosome, and hence linked, is about 1/20. In order to overcome these prior odds against linkage and achieve reasonable posterior odds, say 100:1, we would want a LOD of at least 3. (A prior probability of 1/20 gives $\log_{10}(\text{prior odds}) = \log_{10}(1/19) \approx -1.3$, so posterior odds of 100:1 need a LOD of about 3.3.)

Linkage groups

And so it has come to pass that a LOD must be at least 3 to get people's attention. We'll be a little more precise later.

The next step is to define what are called linkage groups. These partition the markers into classes, every pair of markers in a class being either closely linked, i.e. $r \approx 0$, or connected by a chain of markers, each consecutive pair of which is closely linked. In practice we might define closely linked to be something like
(a) $\hat{r} < c_1$, and
(b) $\mathrm{LOD}(\hat{r}) > c_2$,
where e.g. $c_1 = 0.2$, $c_2 = 3$.

Forming linkage groups, cont.

When one tries to form linkage groups, it is not unusual to have to vary $c_1$ and $c_2$ a little until all markers fall into a group of more than just one marker. When this is done, it is hoped that the linkage groups correspond to chromosomes. If the chromosome number of the species is known, and it coincides with the number of linkage groups, this is a reasonable presumption. But much can happen to dash this hope: one may have two linkage groups corresponding to different arms of the same chromosome, and not know that; one can have a marker at the end of one chromosome linked to a marker at the end of another chromosome, though this should be rare if there is plenty of data; and so on.

Ordering linkage groups

Next we want to order the markers in a linkage group (ideally, on a chromosome). How do we do that? An initial ordering can be obtained by starting with one of the markers, $M_1$ say, of the most distant pair (distance here being recombination fraction or map distance).
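
A minimal sketch of a greedy initial ordering of this kind, in Python. It starts from one marker of the most distant pair and then repeatedly appends the unplaced marker closest to the one placed last. That reading of "continue in this way" is an assumption. The function name `initial_order` and the distance matrix `example` are hypothetical, made up for illustration.

```python
from itertools import combinations


def initial_order(dist):
    """Greedy initial ordering from a symmetric matrix of pairwise distances.

    dist[i][j] is an estimated recombination fraction (or map distance)
    between markers i and j. Returns a list of marker indices.
    """
    n = len(dist)
    if n < 2:
        return list(range(n))
    # Start from one marker of the most distant pair.
    start, _ = max(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Append the unplaced marker closest to the one placed last
        # (ties broken by marker index).
        nxt = min(remaining, key=lambda j: (dist[last][j], j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Made-up recombination fractions for four markers lying in the order 2, 0, 3, 1.
example = [
    [0.00, 0.25, 0.10, 0.12],
    [0.25, 0.00, 0.33, 0.15],
    [0.10, 0.33, 0.00, 0.20],
    [0.12, 0.15, 0.20, 0.00],
]
print(initial_order(example))  # prints [1, 3, 0, 2], the same order read from the other end
```

An order and its reverse describe the same map, so either end of the most distant pair is an acceptable starting point.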
Call $M_2$ the closest marker to $M_1$, and continue in this way. Now we want to confirm our ordering. One way is to calculate a maximized log likelihood for every ordering and select the one with the largest log likelihood. But if we have, say, 11 markers on a chromosome, this is $11! \approx 4 \times 10^7$ orders. What people often do is take moving $k$-tuples of markers and optimize the order of each, e.g. with $k = 3$ or 4. Whichever strategy one adopts, multi- (i.e. $> 2$) locus methods are needed.

Likelihoods for 3-locus data

Suppose that we have 3 markers $M_1$, $M_2$ and $M_3$, in that order. How do we calculate the log likelihood of the associated 3-locus marker data from our intercross? Recalling the discussion preceding the Punnett square of the last lecture, the parental haplotypes here are $a_1a_2a_3$ and $b_1b_2b_3$, while there would be no fewer than 6 forms of recombinant haplotypes: the four single recombinants $a_1a_2b_3$, $a_1b_2b_3$, $b_1b_2a_3$ and $b_1a_2a_3$, and the two double recombinants $a_1b_2a_3$ and $b_1a_2b_3$.

Proceeding as before, we calculate the probability of each of these in terms of the recombination fractions $r_1$ and $r_2$ across the intervals $M_1$-$M_2$ and $M_2$-$M_3$, respectively. For simplicity, we assume the Poisson model, with independence of recombination across disjoint intervals. For example, $a_1a_2a_3$ would have probability $(1-r_1)(1-r_2)/2$, $a_1a_2b_3$ would have probability $(1-r_1)r_2/2$, while $a_1b_2a_3$ would have probability $r_1r_2/2$; the factor 1/2 reflects the equal chance that a gamete carries $a_1$ or $b_1$ at the first locus. We would do this for every one of the 8 paternal and 8 maternal haplotypes, then collect them up to assign probabilities to each of the $3^3 = 27$ 3-locus genotypes AAA, AAH, ..., BBB, and maximize the multinomial likelihood in the parameters $r_1$ and $r_2$. This is just as in the 2-locus case.

Multilocus linkage (loci > 3)

It should have become clear by now that the strategy just outlined is not going to work too easily when there are, say, 11 loci in a linkage group. In that case haplotypes are strings of the form $a_1a_2b_3 \ldots a_{10}b_{11}$, where there are just 2 parental and $2^{11} - 2$ distinct recombinant haplotypes. The number of parental haplotype combinations is the square of
the total number of haplotypes, and they must be mapped into the $3^{11}$ 11-locus genotypes, with a multinomial MLE carried out to estimate 10 recombination fractions. What can be done? In 1987 the first large-scale human genetic map was published, and at the same time a new algorithm was announced …
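
The brute-force calculation described for 3 loci can be sketched in Python as follows. The sketch assumes independent recombination across intervals, so that each gamete haplotype has probability 1/2 times a factor $r$ or $1-r$ per interval. The function names `haplotype_probs` and `genotype_probs` and the values of `r1` and `r2` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def haplotype_probs(rs):
    """Probability of each gamete haplotype from an a1...an / b1...bn parent.

    rs holds the recombination fractions of the n-1 adjacent intervals.
    Recombination is assumed independent across intervals.
    Haplotypes are written as strings such as 'aab'.
    """
    n = len(rs) + 1
    probs = {}
    for hap in product("ab", repeat=n):
        p = 0.5  # the first locus carries 'a' or 'b' with equal probability
        for k, r in enumerate(rs):
            p *= r if hap[k] != hap[k + 1] else 1.0 - r
        probs["".join(hap)] = p
    return probs


def genotype_probs(rs):
    """Collect (paternal, maternal) haplotype pairs into unphased genotypes.

    At each locus a/a is coded 'A', b/b is coded 'B', and a/b or b/a is 'H'.
    """
    haps = haplotype_probs(rs)
    code = {("a", "a"): "A", ("b", "b"): "B", ("a", "b"): "H", ("b", "a"): "H"}
    geno = defaultdict(float)
    for (h1, p1), (h2, p2) in product(haps.items(), repeat=2):
        g = "".join(code[x, y] for x, y in zip(h1, h2))
        geno[g] += p1 * p2
    return dict(geno)


r1, r2 = 0.1, 0.2  # hypothetical recombination fractions
haps = haplotype_probs([r1, r2])
geno = genotype_probs([r1, r2])
print(len(haps), len(geno))      # 8 haplotypes and 27 genotypes
print(haps["aaa"], haps["aba"])  # (1-r1)(1-r2)/2 and r1*r2/2
print(sum(geno.values()))        # genotype probabilities sum to 1 (up to rounding)
```

The same functions accept a list of 10 recombination fractions for 11 loci, but the loop then runs over $2^{11} \times 2^{11} = 2^{22}$ haplotype pairs for every evaluation of the likelihood, which is the blow-up described above.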