UMD CMSC 423 - Lecture 14

CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 14: Phylogenetic trees
CMSC423 Fall 2008

Phylogeny questions
• Given several organisms and a set of features (usually sequence, but also morphological: wing shape, color, ...)
• A. Given a phylogenetic tree, figure out what the ancestors looked like (what are the features of the internal nodes)
• B. Find the phylogenetic tree that best describes the common evolutionary heritage of the organisms
[Figure: example tree; recoverable labels: "wings, feathers, teeth", "claws, no wings, fur", "?", and leaf labels A, B, C]

Phylogeny questions
• A. Easy-ish: can be done with dynamic programming
• B. Hard: many possible trees. There are
\frac{(2n-3)!}{2^{n-2}\,(n-2)!}
rooted trees with n leaves

Scoring a tree – Sankoff's algorithm
• Assumption: we try to minimize the number of state changes from root to leaves (the parsimony approach)
• Small parsimony
– given a tree whose leaves are labeled with m-character strings
– find labels at the internal nodes such that the number of state transitions is minimized
• Weighted small parsimony
– same as small parsimony, except that state transitions are assigned weights
– minimize the overall weight of the tree

Example
[Figure: example tree with binary (0/1) character labels]

Sankoff's algorithm
• At each node v in the tree store s(v,t): the best parsimony score for the subtree rooted at v if the character stored at v is t
• Traverse the tree in post-order and update s(v,t) as follows
– at a leaf, s(v,t) = 0 if t is the leaf's character and infinity otherwise
– assume internal node v has children u and w
– s(v,t) = \min_i \{ s(u,i) + score(i,t) \} + \min_j \{ s(w,j) + score(j,t) \}
• The character at the root will be the one that minimizes s(root,t)
• Note: this solves the weighted version. For the unweighted version set score(i,i) = 0 and score(i,j) = 1 for any i ≠ j

Trees as clustering
• Start with a distance matrix: the distance (e.g. alignment distance) between any two sequences (leaves)
• Intuitively, we want to cluster together the most similar sequences
• UPGMA: Unweighted Pair Group Method using Arithmetic averages
– Build the pairwise distance matrix (e.g. from a multiple alignment)
– Pick the pair of sequences that are closest to each other and cluster them: create an internal node that has the two sequences as children
– Repeat, including newly created internal nodes in the distance matrix
– Key element: must be able to quickly compute the distance between clusters (internal nodes), defined as the average distance over all cross-cluster pairs
D(cl_1, cl_2) = \frac{1}{|cl_1|\,|cl_2|} \sum_{p \in cl_1,\ q \in cl_2} D(p,q)

Trees as clustering
• Note that UPGMA does not estimate branch lengths independently for each lineage: it assumes a constant rate of change, so all leaves end up at the same distance from the root
• Neighbor-joining
– the distance between two sequences is not sufficient: we must also know how each sequence compares to every other sequence
– NJdist(i,j) = D(i,j) - (r_i + r_j), where r_i and r_j are correction factors
r_i = \frac{1}{m-2} \sum_k D(i,k)
(in this formula m is the number of nodes currently in the distance matrix)

Neighbor joining
• Pick the two nodes i, j with NJdist(i,j) minimal
– Create a parent k such that
– D(k,m) = 0.5 (D(i,m) + D(j,m) - D(i,j)) for every other node m (here m names a node, not the node count)
– D(i,k) = 0.5 (D(i,j) + r_i - r_j): length of the branch between i and k
– D(j,k) = 0.5 (D(i,j) + r_j - r_i): length of the branch between j and k

Trees as clustering
• Note that both UPGMA and NJ assume the distance matrix is additive: the distance between two nodes equals the sum of the branch lengths on the path between them, e.g. D(i,j) + D(j,k) = D(i,k) when j lies on the path from i to k. This is usually not exactly true, but close (UPGMA additionally needs all leaves to be equidistant from the root)
• Also, when the distance matrix is additive, NJ can be proven to build the correct tree
• But simple alignment distance is not a good metric

Maximum likelihood
• For every branch S->T of length t, compute P(T|S,t): the likelihood that sequence S could have evolved in time t into sequence T
• Find the tree that maximizes the likelihood
• Note that the likelihood of a tree can be computed with an algorithm similar to Sankoff's
• However, there is no simple way to find a tree given the sequences: most approaches use heuristic search techniques
• Often, start with the NJ tree, then "tweak" it to improve the likelihood

Tree analysis & display

Three types of trees
[Figure: the same four taxa (Taxon A, B, C, D) drawn three ways. Cladogram: branch lengths have no meaning. Phylogram: branch lengths show genetic change. Ultrametric tree: branch lengths show time.]
All show the same evolutionary relationships, or branching orders, between the taxa.
from www.albany.edu/faculty/cs812/StewartTalk2.ppt

Different tree views
http://www-ab.informatik.uni-tuebingen.de/software/dendroscope/welcome.html

Drawing trees
• Trees are easy to draw: we just need to figure out how much space the leaves will take
• Step 1: calculate how much space each node will take (how many leaves lie below the current node)
• Step 2: spread out the nodes according to their number of leaves
• Many ways of optimizing: e.g. width, area
• For large trees
– 3D displays (there's more room in 3D)
– interactive displays (expand/contract nodes as needed)

Analysis example
• Build a multiple alignment (e.g. Muscle, ClustalW)
• Clean up the alignment
– manual editing
– filters (pre-defined structure information)
• Build the tree
– PAUP: parsimony & others
– Phylip: maximum likelihood
– Tree-Puzzle: maximum likelihood
– etc. (many packages)
• Integrated system: ARB
– www.arb-home.de

Antibiotic resistance in Staphylococcus aureus
[Figure legend]
Green boxes: individual strains in a phylogenetic tree
Red diamonds, yellow triangle: acquisition of resistance
Hexagon: loss of resistance

Questions
• Why do you need a multiple alignment for phylogeny?
• What is the running time of the neighbor-joining algorithm, given k sequences of length L?
• What is the parsimony score of the following tree, and what are the labels at internal
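
The s(v,t) recurrence from the Sankoff's algorithm slide can be sketched for a single character as follows. This is a minimal illustration, not the course's implementation: the function name `sankoff`, the dict-based tree encoding, and the leaf base case (0 for the observed character, infinity otherwise) are assumptions made for the example.

```python
import math

def sankoff(tree, leaf_char, root, alphabet, score):
    """Weighted small parsimony for one character (illustrative sketch).

    tree:      dict mapping each internal node to its two children (u, w)
    leaf_char: dict mapping each leaf to its observed character
    score:     function score(i, t) giving the cost of a transition i -> t
    Returns (best score, character chosen at the root, table s).
    """
    s = {}  # s[v][t] plays the role of s(v,t) on the slide

    def visit(v):
        # Post-order: fill in the children before the parent.
        if v in leaf_char:
            # Assumed base case: a leaf can only hold its observed character.
            s[v] = {t: (0 if t == leaf_char[v] else math.inf) for t in alphabet}
            return
        u, w = tree[v]
        visit(u)
        visit(w)
        s[v] = {
            t: min(s[u][i] + score(i, t) for i in alphabet)
               + min(s[w][j] + score(j, t) for j in alphabet)
            for t in alphabet
        }

    visit(root)
    # The root character is the one with the smallest total score.
    best_t = min(alphabet, key=lambda t: s[root][t])
    return s[root][best_t], best_t, s

def unit_cost(i, t):
    """Unweighted parsimony: 0 for no change, 1 for any change."""
    return 0 if i == t else 1

# Hypothetical four-leaf tree ((a,b),(c,d)) with characters 0,1,1,1.
example_tree = {"r": ("x", "y"), "x": ("a", "b"), "y": ("c", "d")}
example_leaves = {"a": "0", "b": "1", "c": "1", "d": "1"}
best, root_char, _ = sankoff(example_tree, example_leaves, "r", "01", unit_cost)
```

For m-character strings the same routine would be run once per position and the scores summed, since each character is scored independently under parsimony.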

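
The UPGMA steps listed under "Trees as clustering" can be sketched as below. This is a deliberately naive version under stated assumptions: the names `cluster_distance` and `upgma` and the dict-of-dicts distance matrix are hypothetical, and the cluster distance is recomputed from the leaf distances on every pass instead of being updated quickly as the slide recommends.

```python
from itertools import combinations

def cluster_distance(cl1, cl2, D):
    """Average leaf-to-leaf distance between two clusters, i.e. D(cl1, cl2)."""
    total = sum(D[p][q] for p in cl1 for q in cl2)
    return total / (len(cl1) * len(cl2))

def upgma(names, D):
    """Return the UPGMA topology as nested tuples (illustrative sketch).

    names: list of leaf labels
    D:     symmetric dict-of-dicts with D[p][q] for every pair of leaves
    """
    # Each cluster is (tuple of member leaves, subtree built so far).
    clusters = [((n,), n) for n in names]
    while len(clusters) > 1:
        # Pick the closest pair of clusters (ties go to the first pair found).
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0][0], pair[1][0], D),
        )
        clusters.remove(a)
        clusters.remove(b)
        # The new internal node has the two clusters as children.
        clusters.append((a[0] + b[0], (a[1], b[1])))
    return clusters[0][1]

def symmetric(pairs):
    """Build a symmetric dict-of-dicts from {(p, q): distance}."""
    D = {}
    for (p, q), d in pairs.items():
        D.setdefault(p, {})[q] = d
        D.setdefault(q, {})[p] = d
    return D

# Hypothetical distances: A and B are closest, then C, then D.
example_D = symmetric({
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
})
example_tree = upgma(["A", "B", "C", "D"], example_D)
```

A practical implementation would keep a cluster-level distance matrix and update each merged row as a size-weighted average of the two old rows, which gives the same values without revisiting every leaf pair.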

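
The neighbor-joining formulas (NJdist, the correction factors r_i, the update D(k,m), and the two branch lengths) can be put together as in the following sketch. The function name `neighbor_joining`, the generated parent labels `n0`, `n1`, ..., and the edge-list output are assumptions for illustration; the final step, which joins the last two remaining nodes with their current distance, is a common convention that the slides do not spell out.

```python
def neighbor_joining(D):
    """Neighbor joining on a symmetric dict-of-dicts D (illustrative sketch).

    Returns a list of edges (node, neighbor, branch_length).
    """
    D = {i: dict(row) for i, row in D.items()}  # work on a copy
    edges = []
    next_id = 0
    while len(D) > 2:
        m = len(D)  # number of nodes currently in the matrix
        # Correction factors r_i = (1 / (m - 2)) * sum_k D(i, k)
        r = {i: sum(D[i][k] for k in D if k != i) / (m - 2) for i in D}
        nodes = list(D)
        # Pick the pair minimizing NJdist(i, j) = D(i, j) - (r_i + r_j).
        i, j = min(
            ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
            key=lambda p: D[p[0]][p[1]] - (r[p[0]] + r[p[1]]),
        )
        k = f"n{next_id}"  # hypothetical label for the new parent
        next_id += 1
        # Branch lengths from i and j to the new parent k.
        edges.append((i, k, 0.5 * (D[i][j] + r[i] - r[j])))
        edges.append((j, k, 0.5 * (D[i][j] + r[j] - r[i])))
        # Distance from k to every other node x.
        new_row = {
            x: 0.5 * (D[i][x] + D[j][x] - D[i][j])
            for x in D if x not in (i, j)
        }
        # Replace i and j by k in the matrix.
        del D[i], D[j]
        for row in D.values():
            row.pop(i, None)
            row.pop(j, None)
        for x, d in new_row.items():
            D[x][k] = d
        D[k] = new_row
    a, b = D  # two nodes remain: connect them directly (assumed convention)
    edges.append((a, b, D[a][b]))
    return edges

def symmetric(pairs):
    """Build a symmetric dict-of-dicts from {(p, q): distance}."""
    D = {}
    for (p, q), d in pairs.items():
        D.setdefault(p, {})[q] = d
        D.setdefault(q, {})[p] = d
    return D

# Hypothetical additive distances from a tree with branch lengths
# A:2 and B:3 to one internal node, C:1 and D:5 to another, and 4 between them.
example_D = symmetric({
    ("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 11,
    ("B", "C"): 8, ("B", "D"): 12, ("C", "D"): 6,
})
example_edges = neighbor_joining(example_D)
```

Because the example matrix is additive, the recovered branch lengths match the tree used to generate it. Each round scans all pairs, so this straightforward version does cubic work in the number of sequences once the distance matrix is available.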