DOC PREVIEW
Stanford CS 262 - Phylogeny Tree Reconstruction

This preview shows page 1-2-3-27-28-29 out of 29 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Phylogeny Tree ReconstructionPhylogenetic TreesParsimony – direct method not using distancesExample: Parsimony cost of one columnParsimony ScoringExampleTraceback to find ancestral nucleotidesSlide 8Multiple Sequence AlignmentsDefinitionScoring Function: Sum Of PairsSum Of Pairs (cont’d)A Profile RepresentationSlide 14Multidimensional DPSlide 16Slide 17Slide 18Progressive AlignmentSlide 20Slide 21Heuristics to improve alignmentsIterative RefinementSlide 24Slide 25Slide 26ConsistencySlide 28Some ResourcesPhylogeny Tree Reconstruction1432514235CS262 Lecture 13, Win07, BatzoglouPhylogenetic Trees•Nodes: species•Edges: time of independent evolution•Edge length represents evolution timeAKA genetic distanceNot necessarily chronological timeCS262 Lecture 13, Win07, BatzoglouParsimony – direct method not using distances•One of the most popular methods:GIVEN multiple alignmentFIND tree & history of substitutions explaining alignmentIdea: Find the tree that explains the observed sequences with a minimal number of substitutionsTwo computational subproblems:1. Find the parsimony cost of a given tree (easy)2. Search through all tree topologies (hard)CS262 Lecture 13, Win07, BatzoglouExample: Parsimony cost of one columnABAA{A, B}CostC+=1{A}Final cost C = 1{A}{A}{B}{A}{A}ABAACS262 Lecture 13, Win07, BatzoglouParsimony Scoring Given a tree, and an alignment column uLabel internal nodes to minimize the number of required substitutionsInitialization:Set cost C = 0; node k = 2N – 1 (last leaf)Iteration:If k is a leaf, set Rk = { xk[u] } // Rk is simply the character of kth speciesIf k is not a leaf,Let i, j be the daughter nodes;Set Rk = Ri  Rj if intersection is nonemptySet Rk = Ri  Rj, and C += 1, if intersection is emptyTermination:Minimal cost of tree for column u, = CCS262 Lecture 13, Win07, BatzoglouExampleA A A B{A} {A} {A} {B}B A BA{A} {B} {A} {B}{A}{A}{A}{A,B}{A,B}{B}{B}CS262 Lecture 13, Win07, BatzoglouTraceback:1. Choose an arbitrary nucleotide from R2N – 1 for the root 2. Having chosen nucleotide r for parent k, If r  Ri choose r for daughter iElse, choose arbitrary nucleotide from RiEasy to see that this traceback produces some assignment of cost CTraceback to find ancestral nucleotidesCS262 Lecture 13, Win07, BatzoglouExampleABAB{A, B}{A, B}{A}{A}{B}{A}{B}ABABAA AxxABABAB AxxABABBB BxxAdmissible with TracebackStill optimal, but inadmissible with TracebackCS262 Lecture 13, Win07, BatzoglouMultiple Sequence Multiple Sequence AlignmentsAlignmentsCS262 Lecture 13, Win07, BatzoglouDefinition•Given N sequences x1, x2,…, xN:Insert gaps (-) in each sequence xi, such that•All sequences have the same length L•Score of the global map is maximum•A faint similarity between two sequences becomes significant if present in many•Multiple alignments reveal elements that are conserved among a class of organisms and therefore important in their common biology•The patterns of conservation can help us tell function of the elementCS262 Lecture 13, Win07, BatzoglouScoring Function: Sum Of Pairs Definition: Induced pairwise alignmentA pairwise alignment induced by the multiple alignmentExample: x: AC-GCGG-C y: AC-GC-GAG z: GCCGC-GAGInduces:x: ACGCGG-C; x: AC-GCGG-C; y: AC-GCGAGy: ACGC-GAC; z: GCCGC-GAG; z: GCCGCGAGCS262 Lecture 13, Win07, BatzoglouSum Of Pairs (cont’d)•Heuristic way to incorporate evolution tree:HumanMouseChicken•Weighted SOP:S(m) = k<l wkl s(mk, ml)DuckCS262 Lecture 13, Win07, BatzoglouA Profile Representation•Given a multiple alignment M = m1…mnReplace each column mi with profile entry pi•Frequency of each letter in •# gaps•Optional: # gap openings, extensions, closingsCan think of this as a “likelihood” of each letter in each position - A G G C T A T C A C C T G T A G – C T A C C A - - - G C A G – C T A C C A - - - G C A G – C T A T C A C – G G C A G – C T A T C G C – G G A 1 1 .8 C .6 1 .4 1 .6 .2G 1 .2 .2 .4 1T .2 1 .6 .2- .2 .8 .4 .8 .4CS262 Lecture 13, Win07, BatzoglouMultiple Sequence AlignmentsAlgorithmsCS262 Lecture 13, Win07, BatzoglouMultidimensional DPGeneralization of Needleman-Wunsh:S(m) = i S(mi)(sum of column scores)F(i1,i2,…,iN): Optimal alignment up to (i1, …, iN) F(i1,i2,…,iN) = max(all neighbors of cube)(F(nbr)+S(nbr))CS262 Lecture 13, Win07, Batzoglou•Example: in 3D (three sequences):•7 neighbors/cellF(i,j,k) = max{ F(i – 1, j – 1, k – 1) + S(xi, xj, xk),F(i – 1, j – 1, k ) + S(xi, xj, - ),F(i – 1, j , k – 1) + S(xi, -, xk),F(i – 1, j , k ) + S(xi, -, - ),F(i , j – 1, k – 1) + S( -, xj, xk),F(i , j – 1, k ) + S( -, xj, - ),F(i , j , k – 1) + S( -, -, xk) } Multidimensional DPCS262 Lecture 13, Win07, BatzoglouRunning Time:1. Size of matrix: LN;Where L = length of each sequence N = number of sequences2. Neighbors/cell: 2N – 1Therefore………………………… O(2N LN)Multidimensional DPCS262 Lecture 13, Win07, BatzoglouRunning Time:1. Size of matrix: LN;Where L = length of each sequence N = number of sequences2. Neighbors/cell: 2N – 1Therefore………………………… O(2N LN)Multidimensional DP•How do gap states generalize?•VERY badly!Require 2N – 1 states, one per combination of gapped/ungapped sequencesRunning time: O(2N  2N  LN) = O(4N LN)XY XYZ ZY YZX XZCS262 Lecture 13, Win07, BatzoglouProgressive Alignment•When evolutionary tree is known:Align closest first, in the order of the treeIn each step, align two sequences x, y, or profiles px, py, to generate a new alignment with associated profile presultWeighted version:Tree edges have weights, proportional to the divergence in that edgeNew profile is a weighted average of two old profilesxwyzpxypzwpxyzwCS262 Lecture 13, Win07, BatzoglouProgressive Alignment•When evolutionary tree is known:Align closest first, in the order of the treeIn each step, align two sequences x, y, or profiles px, py, to generate a new alignment with associated profile presultWeighted version:Tree edges have weights, proportional to the divergence in that edgeNew profile is a weighted average of two old profilesxwyzExampleProfile: (A, C, G, T, -)px = (0.8, 0.2, 0, 0, 0)py = (0.6, 0, 0, 0,


View Full Document

Stanford CS 262 - Phylogeny Tree Reconstruction

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Phylogeny Tree Reconstruction
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Phylogeny Tree Reconstruction and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Phylogeny Tree Reconstruction 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?