Berkeley INTEGBI 200A - Distance Methods

School name University of California, Berkeley

Course Integbi 200a- Principles of Phylogenetics: Systematics

Integrative Biology 200A
University of California, Berkeley
"PRINCIPLES OF PHYLOGENETICS"
Spring 2006

Distance Methods

Due at the end of class:
- Distance matrices and trees for two different distance measures
- Tree from one extra distance-based tree building method
- Parsimony reference tree

Today we’re going to use PAUP* to generate trees using distance methods. We’ve discussed distance methods in class, and you have learned that they are not the most theoretically justified methods for inferring phylogenies, although clustering methods do have some uses in other areas of statistics. For several reasons, it is still important that you learn how to use them. First, you should use them as one of the analyses in your final project, as a comparison to other methods. They will almost always give a different tree than the other optimality criteria, since they group by total similarity and do not distinguish between synapomorphy, symplesiomorphy, and homoplasy. Second, some workers do feel that they are a good way to infer phylogenies. Finally, they are by far the fastest way to find a tree. Whereas parsimony and likelihood methods have to search through tree space and compare the optimization of the character matrix on many trees, most distance methods use an algorithm to generate a tree directly from the distance matrix. This speed makes them very useful for genomics, where it is often necessary to generate tens of thousands of trees, and getting the exact tree each time is not as important as getting the right tree the vast majority of the time.

Distance analyses have two main components: the formulas used to calculate the distances (a.k.a. distance measures) and the algorithms used to compute a tree from the distances.

Get Sequences, Generate a Reference Tree

First, download the cephalopod nexus file from the syllabus page on the 200A website.
You may have to right-click on the link and choose "Save as" to download it rather than open it in your browser. Save it to the desktop, or to a folder you create for yourself on the desktop, and open it in PAUP*. Generate a parsimony tree for comparison to the other trees you will make during this lab:

    set criterion = parsimony;
    hs;

The search will retain only one tree. Look at the tree, then save it:

    showtree;
    savetree file=ceph_COI_parsimony.tre;

Print out this tree and save it to turn in.

Distance Measures: ways to measure how different things are

Phenetic methods start by measuring how “different” taxa are from one another. They look at the total similarity across many different characters, and analyze these in a statistical framework to come up with a measure of the “distance” between each pair of taxa. This works a lot like those timetables that tell you the distance between two cities, like the example below. First, notice that the distance chart can only compare two cities at a time; the same is true for distance methods in phylogenetics. Second, consider that you could measure the “distance” between these cities in several different ways. This chart measures it in miles, but it could also have used kilometers, driving time, or the number of In-N-Out Burgers between each city. In phylogenetics, there are also several different ways of calculating how “far apart” taxa are. Some of them are detailed below.

Distance in mi.   Hearst Castle   Los Angeles   Monterey
Hearst Castle     -               234           94
Los Angeles       234             -             327
Monterey          94              327           -

Uncorrected (P)

P-distance is the uncorrected proportion of sites that differ between two sequences. It is called “uncorrected” because it does not take account of the base pairs where multiple changes have occurred. It counts each base pair that differs as 1 change and each identical base pair as 0, although either one of these situations can mask many other changes. Thus the p-distance between any two sequences approaches ¾ as the true number of changes between them gets larger.
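The uncorrected measure described above can be sketched in a few lines of Python. This is only an illustration, not how PAUP* computes it: the function name `p_distance` and the toy sequences are assumptions, and the sketch compares characters literally, so gaps and ambiguity codes would simply count as mismatches.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Each differing site counts as 1, each identical site as 0,
    # no matter how many changes actually happened at that site.
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical toy alignment (not the cephalopod data):
# 10 sites, differing at site 5 (A vs T) and site 10 (C vs A).
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"
toy_p = p_distance(seq_1, seq_2)
print(toy_p)  # 2 differences out of 10 sites
```

Because a site that has changed several times still counts only once, the value returned here underestimates the true number of changes for divergent sequences.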
For uncorrected distances, adding together the distances of two segments of a path gives a distance larger than that of the entire path. This violates a fundamental assumption of most distance-based algorithms, namely that these values are equal (additivity).

Jukes-Cantor (JC)

Jukes-Cantor distances are the simplest way to solve this problem. They assume that the chance of any nucleotide changing into any other, anywhere in the sequence, is equal. Thus, if P(t) is the chance that a nucleotide is a different nucleotide at time t, and u is the instantaneous stochastic rate at which a nucleotide changes into any other nucleotide:

    P’(t) = u * (1 - P(t)) - (1/3) u * P(t)

The distance D will now be the average number of changes per base pair, or ut:

    D = ut = -(3/4) ln(1 - (4/3) P(t))

where P(t) is the fraction of changed nucleotides.

Kimura 2-parameter (K2P)

The Kimura two-parameter model essentially solves the problem in the same way as Jukes-Cantor, except that there are different rates for transitions and transversions. Thus, if P(t) and Q(t) are the chances that a base pair has visibly undergone a transition and a transversion, respectively, and α and β are the instantaneous stochastic transition and transversion rates:

    Q’(t) = 2β * (1 - Q(t)) - 2β * Q(t)
    P’(t) = α * (1 - P(t) - Q(t)) + β * Q(t) - (α + 2β) * P(t)

The distance can now be calculated as

    D = (α + 2β) * t = -(1/4) ln[ (1 - 2Q(t)) (1 - 2P(t) - Q(t))^2 ]

where P(t) and Q(t) are the fractions of sites observed to differ by a transition and by a transversion, respectively.

General Time-Reversible (GTR)

The general time-reversible model relaxes the assumptions about the correlation among rates of change from one nucleotide to another as much as possible while still having the same probability of change in both directions. Thus there are six different rates that have to be set (A to G / G to A is one rate, and A to C / C to A another) and an equilibrium frequency for each nucleotide.
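One common way to write such a model down, offered here only as a sketch, is as a rate matrix in which the rate from base i to base j is a symmetric exchangeability for the pair multiplied by the equilibrium frequency of j. The function name `gtr_rate_matrix`, the dictionary layout, and the numerical values are illustrative assumptions, not part of the lab or of PAUP*.

```python
BASES = "ACGT"


def gtr_rate_matrix(rates, freqs):
    """Build a GTR-style rate matrix, rescaled to a mean rate of one.

    rates: dict mapping the six unordered pairs ("AC", "AG", "AT",
           "CG", "CT", "GT") to a symmetric exchangeability.
    freqs: dict mapping each base to its equilibrium frequency.
    """
    if abs(sum(freqs[b] for b in BASES) - 1.0) > 1e-9:
        raise ValueError("equilibrium frequencies must add up to one")
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                pair = "".join(sorted(i + j))  # "GA" and "AG" share one rate
                q[i][j] = rates[pair] * freqs[j]
        # Diagonal entries make each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    # Rescale so the expected rate of change at equilibrium is one.
    mean_rate = -sum(freqs[i] * q[i][i] for i in BASES)
    for i in BASES:
        for j in BASES:
            q[i][j] /= mean_rate
    return q


# Hypothetical values: transitions (AG, CT) four times faster than transversions.
example_rates = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
example_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
q = gtr_rate_matrix(example_rates, example_freqs)

# Time reversibility: the flow from i to j equals the flow from j to i.
for i in BASES:
    for j in BASES:
        assert abs(example_freqs[i] * q[i][j] - example_freqs[j] * q[j][i]) < 1e-12
```

The closing loop checks the "same in both directions" property: at equilibrium, the flow from A to G matches the flow from G to A, and likewise for every other pair.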
The total rate has to average out to one and all the equilibrium frequencies have to add up to one, so there are eight free parameters in this model (five rates and three frequencies), in addition to all the pairwise distances. The distances cannot be calculated directly; instead, all the parameters have to be estimated using Maximum Likelihood. Because there are so many parameters, you have to have a lot of data to make a good ML estimate. We’ll talk a lot more about these types of models when we do Maximum Likelihood.

Using Different Distance Measures

Each of these distance measures is implemented in PAUP*. Your mission is now to try out each one and see
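Outside PAUP*, the closed-form Jukes-Cantor and Kimura 2-parameter corrections given earlier can be compared with the uncorrected proportion on a single pair of sequences. The sketch below is an illustration under stated assumptions: the function names and the toy sequences are hypothetical, only the unambiguous bases A, C, G, T are handled, and the formulas are the ones written out in the Jukes-Cantor and Kimura sections.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_fractions(seq_a, seq_b):
    """Return (P, Q): fractions of sites differing by a transition / transversion."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned")
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    n = len(seq_a)
    return transitions / n, transversions / n


def jc_distance(p):
    """Jukes-Cantor: D = -(3/4) ln(1 - (4/3) p), p = fraction of changed sites."""
    if p >= 0.75:
        raise ValueError("sequences are saturated; JC distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P, Q):
    """Kimura 2-parameter: D = -(1/4) ln[(1 - 2Q)(1 - 2P - Q)^2]."""
    if 1.0 - 2.0 * Q <= 0.0 or 1.0 - 2.0 * P - Q <= 0.0:
        raise ValueError("sequences are saturated; K2P distance is undefined")
    return -0.25 * math.log((1.0 - 2.0 * Q) * (1.0 - 2.0 * P - Q) ** 2)


# Hypothetical 20-site pair: 3 transitions (sites 1-3) and 1 transversion (site 4).
seq_1 = "ACGTACGTACGTACGTACGT"
seq_2 = "GTAAACGTACGTACGTACGT"
P, Q = transition_transversion_fractions(seq_1, seq_2)
p_uncorrected = P + Q
print(p_uncorrected, jc_distance(p_uncorrected), k2p_distance(P, Q))
```

For this toy pair, both corrected distances come out larger than the uncorrected proportion, which is the behavior the corrections are meant to produce.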
