DOC PREVIEW
Stanford CS 262 - Phylogeny Tree Reconstruction

This preview shows page 1-2-17-18-19-36-37 out of 37 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Phylogeny Tree ReconstructionPhylogenetic TreesParsimony – direct method not using distancesExample: Parsimony cost of one columnParsimony ScoringExampleTraceback to find ancestral nucleotidesSlide 8Multiple Sequence AlignmentsEvolution at the DNA levelProtein PhylogeniesOrthology and ParalogyOrthology, Paralogy, Inparalogs, OutparalogsSlide 14DefinitionScoring Function: Sum Of PairsSum Of Pairs (cont’d)A Profile RepresentationSlide 19Multidimensional DPSlide 21Slide 22Slide 23Progressive AlignmentSlide 25Slide 26Heuristics to improve alignmentsIterative RefinementSlide 29Slide 30Slide 31ConsistencySlide 33Real-world protein alignersMUSCLE at a glancePROBCONS at a glanceSome ResourcesCS262 Lecture 9, Win07, BatzoglouPhylogeny Tree Reconstruction1432514235CS262 Lecture 9, Win07, BatzoglouPhylogenetic Trees•Nodes: species•Edges: time of independent evolution•Edge length represents evolution timeAKA genetic distanceNot necessarily chronological timeCS262 Lecture 9, Win07, BatzoglouParsimony – direct method not using distances•One of the most popular methods:GIVEN multiple alignmentFIND tree & history of substitutions explaining alignmentIdea: Find the tree that explains the observed sequences with a minimal number of substitutionsTwo computational subproblems:1. Find the parsimony cost of a given tree (easy)2. Search through all tree topologies (hard)CS262 Lecture 9, Win07, BatzoglouExample: Parsimony cost of one columnABAA{A, B}CostC+=1{A}Final cost C = 1{A}{A}{B}{A}{A}ABAACS262 Lecture 9, Win07, BatzoglouParsimony Scoring Given a tree, and an alignment column uLabel internal nodes to minimize the number of required substitutionsInitialization:Set cost C = 0; node k = 2N – 1 (last leaf)Iteration:If k is a leaf, set Rk = { xk[u] } // Rk is simply the character of kth speciesIf k is not a leaf,Let i, j be the daughter nodes;Set Rk = Ri  Rj if intersection is nonemptySet Rk = Ri  Rj, and C += 1, if intersection is emptyTermination:Minimal cost of tree for column u, = CCS262 Lecture 9, Win07, BatzoglouExampleA A A B{A} {A} {A} {B}B A BA{A} {B} {A} {B}{A}{A}{A}{A,B}{A,B}{B}{B}CS262 Lecture 9, Win07, BatzoglouTraceback:1. Choose an arbitrary nucleotide from R2N – 1 for the root 2. Having chosen nucleotide r for parent k, If r  Ri choose r for daughter iElse, choose arbitrary nucleotide from RiEasy to see that this traceback produces some assignment of cost CTraceback to find ancestral nucleotidesCS262 Lecture 9, Win07, BatzoglouExampleABAB{A, B}{A, B}{A}{A}{B}{A}{B}ABABAA AxxABABAB AxxABABBB BxxAdmissible with TracebackStill optimal, but inadmissible with TracebackCS262 Lecture 9, Win07, BatzoglouMultiple Sequence Multiple Sequence AlignmentsAlignmentsCS262 Lecture 9, Win07, BatzoglouEvolution at the DNA level…ACGGTGCAGTTACCA……AC----CAGTCCACCA…MutationSEQUENCE EDITSREARRANGEMENTSDeletionInversionTranslocationDuplicationCS262 Lecture 9, Win07, BatzoglouProtein Phylogenies•Proteins evolve by both duplication and species divergenceCS262 Lecture 9, Win07, BatzoglouOrthology and ParalogyHB HumanHB HumanWB WormWB WormHA1 HumanHA1 HumanHA2 HumanHA2 HumanYeastYeastWA WormWA WormOrthologs:Derived by speciationParalogs:Everything elseCS262 Lecture 9, Win07, BatzoglouOrthology, Paralogy, Inparalogs, OutparalogsCS262 Lecture 9, Win07, BatzoglouCS262 Lecture 9, Win07, BatzoglouDefinition•Given N sequences x1, x2,…, xN:Insert gaps (-) in each sequence xi, such that•All sequences have the same length L•Score of the global map is maximum•A faint similarity between two sequences becomes significant if present in many•Multiple alignments reveal elements that are conserved among a class of organisms and therefore important in their common biology•The patterns of conservation can help us tell function of the elementCS262 Lecture 9, Win07, BatzoglouScoring Function: Sum Of Pairs Definition: Induced pairwise alignmentA pairwise alignment induced by the multiple alignmentExample: x: AC-GCGG-C y: AC-GC-GAG z: GCCGC-GAGInduces:x: ACGCGG-C; x: AC-GCGG-C; y: AC-GCGAGy: ACGC-GAC; z: GCCGC-GAG; z: GCCGCGAGCS262 Lecture 9, Win07, BatzoglouSum Of Pairs (cont’d)•Heuristic way to incorporate evolution tree:HumanMouseChicken•Weighted SOP:S(m) = k<l wkl s(mk, ml)DuckCS262 Lecture 9, Win07, BatzoglouA Profile Representation•Given a multiple alignment M = m1…mnReplace each column mi with profile entry pi•Frequency of each letter in •# gaps•Optional: # gap openings, extensions, closingsCan think of this as a “likelihood” of each letter in each position - A G G C T A T C A C C T G T A G – C T A C C A - - - G C A G – C T A C C A - - - G C A G – C T A T C A C – G G C A G – C T A T C G C – G G A 1 1 .8 C .6 1 .4 1 .6 .2G 1 .2 .2 .4 1T .2 1 .6 .2- .2 .8 .4 .8 .4CS262 Lecture 9, Win07, BatzoglouMultiple Sequence AlignmentsAlgorithmsCS262 Lecture 9, Win07, BatzoglouMultidimensional DPGeneralization of Needleman-Wunsh:S(m) = i S(mi)(sum of column scores)F(i1,i2,…,iN): Optimal alignment up to (i1, …, iN) F(i1,i2,…,iN) = max(all neighbors of cube)(F(nbr)+S(nbr))CS262 Lecture 9, Win07, Batzoglou•Example: in 3D (three sequences):•7 neighbors/cellF(i,j,k) = max{ F(i – 1, j – 1, k – 1) + S(xi, xj, xk),F(i – 1, j – 1, k ) + S(xi, xj, - ),F(i – 1, j , k – 1) + S(xi, -, xk),F(i – 1, j , k ) + S(xi, -, - ),F(i , j – 1, k – 1) + S( -, xj, xk),F(i , j – 1, k ) + S( -, xj, - ),F(i , j , k – 1) + S( -, -, xk) } Multidimensional DPCS262 Lecture 9, Win07, BatzoglouRunning Time:1. Size of matrix: LN;Where L = length of each sequence N = number of sequences2. Neighbors/cell: 2N – 1Therefore………………………… O(2N LN)Multidimensional DPCS262 Lecture 9, Win07, BatzoglouRunning Time:1. Size of matrix: LN;Where L = length of each sequence N = number of sequences2. Neighbors/cell: 2N – 1Therefore………………………… O(2N LN)Multidimensional DP•How do gap states generalize?•VERY badly!Require 2N – 1 states, one per combination of gapped/ungapped sequencesRunning time: O(2N  2N  LN) = O(4N LN)XY XYZ ZY YZX XZCS262 Lecture 9, Win07, BatzoglouProgressive Alignment•When evolutionary tree is known:Align closest


View Full Document

Stanford CS 262 - Phylogeny Tree Reconstruction

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Phylogeny Tree Reconstruction
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Phylogeny Tree Reconstruction and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Phylogeny Tree Reconstruction 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?