DOC PREVIEW
UMD CMSC 838T - An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

lafrnal of Molecular Evolut~xl An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences Jeffrey L. Thorne, Hirohisa Kishino,* and Joseph Felsenstein Depanrnen: of Genetics SK-50. University of Washingron. Seattle. WA 98195. USA Summary. Most algorithms for the alignmen1 of biological sequences are not .derived from an eGo- Iutionary model. Consequently, these alignment al- gorithms lack a strong statistical basis. A maximum likelihood merhod for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition proba- bilities. The evolutionary model can also be used as the basis of procedures that,estimate the evolution- ary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimatioa approach which takes into account all possible alignments between two sequences is introduced, the danger of esti- mating eVolutionary parameters from a single align- ment is discussed. Key words: DNA sequence alignment - Maxi- mum likelihood procedure - Dynamic program- ming - Evolutionary model - Insertion-deletion model Introduction With the advent of m*em molecular biology, the ability to collect biological sequence &ta has out- paced the ability EO adequately analyze this data. One tool for reducing this surfeit of inadequately treated data is sequence alignment. A sequence alignment is designed to exhibit the evolutionary Offprinr reqrcests 10: J.L. Thorne Pfesema&ress:OccanRcscarrh Institute, UnivzrsityofTokyo. 1-15-1. Mioami4ai. NPkaao-lru. Tokyo 164. Japan correspondence between different sequences. It is possible and, among some researchers, popular IO align sequences by eyeball. The eyeball technique is time-consuming: tedious, and irreproducible. In 1970, Needleman and Wunsch presented a dynamic programming algorithm for the alignment of fuo biological sequences by computer. Computer-aided sequence alignment does not possess these disad- vantages of the eyeball technique. The basic dynam- ic programming algorithm chooses the best align- ment by finding the alignment with the minimum associated weight. This is assumed to be the best of all alignments between the two sequences in ques- tion. The evolutionary weight associated yith an alignment is simply the sum of the weights of the evolutionary events implied by the alignment. In the case of an alignment between TWO sequences, insertions cannot be distinguished from deletions. Therefore, the term indelis used to describe an evo- lutionary event that may be either an insertion or a deletion. Because a single-base indel leads to a sin- gle-base gap in the alignment and because a nucle- otide mismatch in the alignment is caused by one or more nucleotide substitutions, the following alignment implies that at least three substitutions and two single-base indels took place: ATAGAG-TTTGTACG -TAGCGGTTCGTTCG The dynamic programming algorithm has sub- sequently been improved (e.g.. Gotoh 1982) but. in its most basic form, there is a weight for each sin@ gap and a weight for each mismatch. If the weight of a mismatch is I and the weight of a single-bag gap is 5, then the weight associated with the above alignment is 13 (= I + I + 1 + 5 -t 5). &complete115 estimating evolutionary parameters. This procedure can adjust the evolutionary weights to the sequences to be aligned. We also examine the bias that is gen- erated when only a single alignment is used for the estimation of evolutionary parameters. Our method for estimating evolutionary parameters is accurate and avoids this bias because it maximizes the like- lihood oftwo sequences. In otherwords, our method maximizes the sum-taken over all possible align- ments between two sequences-of the likelihood of individual alignments. ,%planation of the dynamic programming algorithm can be found in Sankoff and Kruskal(1983). The weakness of the basic dynamic programming method and its subsequent modifications is the lack of an objective procedure to choose the relative weights of gaps and mismatches. The result of this weakness is that researchers are forced to use either of two flawed approaches to obtain an alignment between two sequences. One approach is to arbi- trarily choose these weights and then obtain an alignment. If this alignment is aesthetically pleasing the researcher, the process stops. Otherwise, the researcher continues to adjust the weigh until an aesthetically pleasing alignment is obtained. Obvi- ously, the subjective nature of this approach is not ideal. Another approach is to use the same set of weights for every pairwise alignment. This approach is less subjective than the former approach-only the initid choice of weights is subjective. A few objective aligament techniques have been proposed (e.&, Reichert et al. 1973; Fltch and Smith 1983; Mson and Yee 1990) but only Bishop and nornpson (1986) have described an objective tech- nique that is based upon an evolutionary model. cause evolution is the force that promotes diver- gence between biological quences, it is desirable to view bioIogid sequence alignment algorithms in 1he context of evolution. The weights of evolution- ary events should be a function of evolutionary rates and divergence times. Under this interpretation, the basic dynamic programming procedure assumes that the types of evolutionary events that can change a biological sequence fail into three categories. For a DNA sequence, these three possible types of events are insertion of exactly one base, deletion of exactly one base, and substitution of one base for another. The basic dynamic programming procedure assigns an evolutionary weight to each type of evolutionary event. The evoIutionary weight should be propor- tional to the negative logarithm of the probability of the evolutionary event {Felsenstein 198 la). Thus, the most basic alignment algorithm requires one evolutionary weight for a substitution and another evolutionary weight for a single-base indel. It is in- correct to use the Same set of weights for every pair- wise alignment because the probabilities of evolu- tionary events depend on the particular pair of sequences to be aligned. In this paper, we present a maximum likelihood approach to the alignment of a pair of DNA se- . quences. This maximum likelihood approach is an extension


View Full Document

UMD CMSC 838T - An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences

Documents in this Course
Load more
Download An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?