
CMSC 838T – Lecture 5

- Phylogenetics
  - Study of evolutionary relationships (sequences / species)
  - Infer evolutionary relationship from shared features
  - May improve multiple sequence alignment (MSA)

Phylogenetics
- Phylogeny
  - Relationship between organisms with a common ancestor
- Phylogenetic tree
  - Graph representing evolutionary history of sequence / species
- Premise
  - Members sharing common evolutionary history (i.e., common ancestor) are more related to each other
  - Can infer evolutionary relationship from shared features
- Long history of phylogenetics (from field of genetics)
  - Historically → based on analysis of observable features (e.g., morphology, behavior, geographical distribution)
  - Now → mostly analysis of DNA / RNA / amino acid sequences

Phylogenetics – Motivation & Alignment
- Goal of phylogenetics
  - Understand relationship of sequence to similar sequences
  - Construct phylogenetic tree representing evolutionary history
- Motivation / application
  - Identify closely related families
    - Use phylogenetic relationships to predict gene function
  - Follow changes in rapidly evolving species (e.g., viruses)
    - Analysis can reveal which genes are under selection
    - Provide epidemiology for tracking infections & vectors
  - Few direct applications
- Relationship to multiple sequence alignment (MSA)
  - Alignment of sequences should take evolution into account
  - More precise phylogenetic relationships ↔ improved MSA

Phylogenetics Overview
- Phylogenetic trees
- Tree construction algorithms
  - Distance methods
    - UPGMA
    - Neighbor-joining
  - Maximum parsimony
  - Maximum likelihood
- Assessing phylogenetic trees

Phylogenetic Trees
[Figure: an unrooted tree (dendrogram) and rooted trees, labeled with leaves, joins / nodes, branches, root, and cluster / clade]

Phylogenetic Trees
- Leaves / taxa
  - Original sequences
- Branches
  - Represent change
  - Length represents evolutionary distance
- Cluster / clade
  - All sequences in subtree with common ancestor (treated as single node)
- Join / node
  - Point of joining two leaves / clusters
[Figure: tree with branch length marked as distance]

Phylogenetic Trees
- Use binary trees (evolution is a bifurcating process)
  - Can approximate all tree shapes (w/ arbitrarily short edges)
  - Simplifies tree generation & analysis
- Trees can be represented in rectangular form
  - Alternative form of representation
  - Distance determined only by "height" of branch
[Figure: the same tree on leaves D, B, C, A drawn in normal form and in rectangular form]

Phylogenetic Trees
- Can label branches of tree with change to sequence
[Figure: tree with branches labeled by sequence changes (N, Y, L, S)]

Phylogenetic Trees – Distance
- (Evolutionary) distance
  - Many possible measures
    - Fraction of sites that differ between two sequences
    - # of changes needed to convert one sequence to another
    - Pairwise alignment scores, normalized by average score for random alignment [Feng & Doolittle 1996]
        Score = (S_actual - S_random) / (S_identical - S_random)
        where S_identical = score for aligning identical sequences
- Distance matrix
  - Matrix of pairwise distances between all sequences
  - Used to generate tree
- Tree shape
  - Varies with construction method, distance metric

  Seq.   A     B     C     D
  A      -     8     7     12
  B            -     9     14
  C                  -     11
  D                        -

Phylogenetic Trees – Distance
- Distances are ultrametric if
  - Same rate of change on all branches in tree (rare in practice)
    - All leaves equidistant from root
    - Also known as a "molecular clock"
  - Distance matrix must satisfy the following 3-point condition
  - For any three leaves i, j, k, with distances d_ij, d_ik, d_jk
    - Two of the three distances are equal and ≥ the third
        d_ij = d_ik ≥ d_jk
[Figure: example trees on leaves i, j, k, m]

Phylogenetic Trees – Distance
- Distances are additive if
  - Distance between any two leaves i & j on tree = sum of lengths of edges connecting i & j
  - Distance matrix must satisfy the following 4-point condition
  - For any four leaves i, j, k, m, two of the distances d_ij + d_km, d_ik + d_jm, d_im + d_jk are equal and greater than the third
        d_ij + d_km < d_im + d_jk = d_ik + d_jm
  - In fact, the difference is 2 × the length of the "bridge" edge(s)
[Figure: four-leaf trees on i, j, k, m with the "bridge" edge marked]

Tree Construction – UPGMA
- UPGMA (Unweighted Pair Group Method using Arithmetic Averages) [Sokal & Michener 1958]
- Algorithm
  1. Find pair of sequences A, B with smallest distance D_AB
  2. Insert join for A, B at tree height = ½ D_AB
  3. Update distance to new cluster as the average distance between pairs of sequences in each cluster
  4. Repeat until all sequences / clusters joined
  5. Produces rooted tree
- Assumptions
  - Distances for tree are ultrametric
    - Branch lengths for 2 leaves same after join
  - Distances for tree are additive
[Figure: A and B joined at height ½ D_AB, then C joined with cluster AB at height ½ D_C(AB)]

Tree Construction Example
Distance matrix for the sequences:

         A     B     C     D
  A      -     8     7     12
  B            -     9     14
  C                  -     11
  D                        -

[Figure: original tree]
Note that tree distances are additive (i.e., distance between X, Y = sum of lengths of edges connecting X, Y)

Tree Construction Example – UPGMA
Distance matrices:

  Step 1 (A, C closest):
         A     B     C     D
  A      -     8     7     12
  B            -     9     14
  C                  -     11
  D                        -

  Step 2 (A-C, B closest):
         A-C   B     D
  A-C    -     8.5   11.5
  B            -     14
  D                  -

  Step 3:
           A-C-B   D
  A-C-B    -       12.333
  D                -

UPGMA tree (leaves D, B, C, A), with height = ½ distance at each join:
  - A-C joined at ½ × 7
  - A-C-B joined at ½ × 8.5
  - A-C-B and D joined at ½ × 12.333
UPGMA keeps all leaves in clusters and uses them in calculations

Tree Construction – Neighbor-Joining
- Goal
  - Join closest neighbors (nodes w/ same parent) in tree
  - Avoids problem with UPGMA when rates of change differ
- Example
  - Closest leaves are not neighbors, but are joined first by UPGMA → incorrect tree
- Assumptions
  - Rate of change can differ
    - Branch lengths may differ after join
  - Branch lengths for tree are additive
[Figure: two trees on leaves A, B, C, D]

Neighbor-Joining – Basic Principle
- Calculating branch lengths after join (additive tree)
- Simple algebra shows
  - Given
    - d_A,B = a + b
    - d_A,C = a + c
    - d_B,C = b + c

         A     B       C
  A      -     d_A,B   d_A,C
  B            -       d_B,C
  C                    -

  - We can calculate
    - a = ½ (d_A,B + d_A,C - d_B,C)
    - b = ½ (d_A,B + d_B,C - d_A,C)
    - c = ½ (d_B,C + d_A,C - d_A,B)
[Figure: three-leaf tree on A, B, C with branch lengths a, b, c]

Neighbor-Joining – Basic Principle
- Example (additive tree, not ultrametric)
  - Given distance matrix, calculate branch lengths

         A     B     C
  A      -     8     13.5
  B            -     15.5
  C                  -

Calculation results
  a = ½ (d_A,B + d_A,C - d_B,C) = ½ (8 + 13.5 - 15.5) = 3
  b = ½ (d_A,B + d_B,C - d_A,C) = ½ (8 + 15.5 - 13.5) = 5
  c = ½ (d_B,C + d_A,C …
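
The three-leaf branch-length formulas from the Neighbor-Joining "Basic Principle" slides can be checked numerically. The following is a minimal sketch in Python; the function name three_leaf_branch_lengths is an illustrative choice, not something defined in the lecture.

```python
def three_leaf_branch_lengths(d_ab, d_ac, d_bc):
    """Solve d_ab = a + b, d_ac = a + c, d_bc = b + c for (a, b, c)."""
    a = 0.5 * (d_ab + d_ac - d_bc)
    b = 0.5 * (d_ab + d_bc - d_ac)
    c = 0.5 * (d_bc + d_ac - d_ab)
    return a, b, c


# Distances from the slide's example matrix: d_AB = 8, d_AC = 13.5, d_BC = 15.5.
a, b, c = three_leaf_branch_lengths(8, 13.5, 15.5)
print(a, b, c)  # 3.0 5.0 10.5

# The branch lengths reproduce the pairwise distances (additive tree).
print(a + b, a + c, b + c)  # 8.0 13.5 15.5
```

Because the three branch lengths differ, the leaves are not equidistant from the central join, which is why this example is additive but not ultrametric.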
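
The UPGMA steps listed on the "Tree Construction – UPGMA" slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name upgma, the tuple-based cluster representation, and the frozenset-keyed distance dictionary are all choices made here for brevity. Ties are broken by whichever pair comes first.

```python
from itertools import combinations


def upgma(leaves, dist):
    """Return the joins as (cluster_1, cluster_2, join_height), in join order.

    dist maps frozenset({x, y}) to the distance between leaves x and y.
    Clusters are tuples of leaf names.
    """
    clusters = [(leaf,) for leaf in leaves]

    def avg(c1, c2):
        # Average over all leaf pairs, taking one leaf from each cluster.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Step 1: find the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        # Step 2: the join sits at half the distance between them.
        merges.append((c1, c2, avg(c1, c2) / 2))
        # Step 3: replace the pair by the merged cluster (all leaves are kept).
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Distance matrix from the tree construction example.
d = {
    frozenset("AB"): 8, frozenset("AC"): 7, frozenset("AD"): 12,
    frozenset("BC"): 9, frozenset("BD"): 14, frozenset("CD"): 11,
}
for c1, c2, height in upgma("ABCD", d):
    print("-".join(c1), "+", "-".join(c2), "at height", round(height, 3))
```

On the example matrix the joins occur at heights 3.5, 4.25, and about 6.167, i.e. ½ × 7, ½ × 8.5, and ½ × 12.333 as in the slide's UPGMA tree.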
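
The 3-point (ultrametric) and 4-point (additive) conditions from the distance slides can be tested directly on a distance matrix. The sketch below is one possible way to do it; the helper names is_ultrametric, is_additive, and the lookup function d are hypothetical, and the tolerance tol is an assumption to absorb floating-point error. Both checks use the fact that "two values are equal and at least as large as the third" is the same as "the two largest values are equal".

```python
from itertools import combinations


def is_ultrametric(leaves, d, tol=1e-9):
    """3-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(leaves, 3):
        _, mid, big = sorted([d(i, j), d(i, k), d(j, k)])
        if abs(big - mid) > tol:
            return False
    return True


def is_additive(leaves, d, tol=1e-9):
    """4-point condition: in every quadruple, the two largest pair sums are equal."""
    for i, j, k, m in combinations(leaves, 4):
        sums = sorted([d(i, j) + d(k, m), d(i, k) + d(j, m), d(i, m) + d(j, k)])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distance matrix from the tree construction example.
table = {("A", "B"): 8, ("A", "C"): 7, ("A", "D"): 12,
         ("B", "C"): 9, ("B", "D"): 14, ("C", "D"): 11}


def d(x, y):
    # Symmetric lookup into the upper-triangular table.
    return table[(x, y)] if (x, y) in table else table[(y, x)]


print(is_additive("ABCD", d))     # True
print(is_ultrametric("ABCD", d))  # False
```

For this matrix the three pair sums are 19, 21, and 21, so the bridge edge has length (21 - 19) / 2 = 1. The matrix fails the 3-point test because the distances among A, B, C are 7, 8, and 9.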
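
Two of the distance measures named on the "Phylogenetic Trees – Distance" slide are simple enough to write out. The sketch below assumes the two sequences are already aligned to equal length and contain no gaps; the function names and the sample inputs are hypothetical and chosen only for illustration.

```python
def fraction_differing(seq1, seq2):
    """Fraction of sites that differ between two equal-length aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    mismatches = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return mismatches / len(seq1)


def normalized_score(s_actual, s_random, s_identical):
    """Score = (S_actual - S_random) / (S_identical - S_random)."""
    return (s_actual - s_random) / (s_identical - s_random)


# Hypothetical sequences: they differ at 2 of 7 sites.
print(fraction_differing("GATTACA", "GACTATA"))

# Hypothetical alignment scores, not taken from the lecture.
print(normalized_score(s_actual=60, s_random=20, s_identical=100))  # 0.5
```

The normalized score is 0 when the actual alignment scores no better than a random one and 1 when the sequences are identical.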



UMD CMSC 838T - Lecture 5 Phylogenetics
