Stanford CS 262 - Phylogeny Tree Reconstruction - D2000377

Home> Schools> Stanford University> Computer Science (CS) > CS 262> Phylogeny Tree Reconstruction

DOC PREVIEW

Stanford CS 262 - Phylogeny Tree Reconstruction

School name Stanford University

Course Cs 262- Computational Genomics

Pages 43

This preview shows page 1-2-3-20-21-22-41-42-43 out of 43 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 43 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Phylogeny Tree ReconstructionInferring PhylogeniesNeighbor-JoiningAlgorithm: Neighbor-joiningParsimony – What if we don’t have distancesExample: Parsimony cost of one columnParsimony ScoringExampleTraceback to find ancestral nucleotidesSlide 10Probabilistic MethodsSlide 12Felsenstein’s Likelihood AlgorithmSlide 14Current popular methodsMultiple Sequence AlignmentsProtein PhylogeniesProtein Phylogenies – ExampleSlide 19Slide 20DefinitionScoring Function: Sum Of PairsSum Of Pairs (cont’d)A Profile RepresentationSlide 25Multidimensional DPSlide 27Slide 28Slide 29Progressive AlignmentSlide 31Slide 32Heuristics to improve alignmentsIterative RefinementSlide 35Slide 36Slide 37A* for Multiple AlignmentsSlide 39Slide 40ConsistencySlide 42Some ResourcesPhylogeny Tree Reconstruction1432514235CS262 Lecture 13, Win06, BatzoglouInferring PhylogeniesTrees can be inferred by several criteria:Morphology of the organisms•Can lead to mistakes!Sequence comparisonExample:Orc: ACAGTGACGCCCCAAACGTElf: ACAGTGACGCTACAAACGTDwarf: CCTGTGACGTAACAAACGAHobbit: CCTGTGACGTAGCAAACGAHuman: CCTGTGACGTAGCAAACGACS262 Lecture 13, Win06, BatzoglouNeighbor-Joining•Guaranteed to produce the correct tree if distance is additive•May produce a good tree even when distance is not additiveStep 1: Finding neighboring leavesDefineDij = dij – (ri + rj)Where 1 ri = –––––k dik |L| - 2Claim: The above “magic trick” ensures that Dij is minimal iff i, j are neighborsProof: Very technical, please read Durbin et al.!Four-Point Condition: If and only if for every set of four leaves I, j, k, l, two of the distances dij + dkl, dik + djl, dil + djkare equal and larger than the third, the distances are additive and we can reconstruct the tree12430.10.1 0.10.40.4CS262 Lecture 13, Win06, BatzoglouAlgorithm: Neighbor-joiningInitialization:Define T to be the set of leaf nodes, one per sequenceLet L = TIteration:Pick i, j s.t. Dij is minimalDefine a new node k, and set dkm = ½ (dim + djm – dij) for all m  LAdd k to T, with edges of lengths dik = ½ (dij + ri – rj), djk = dij – dikRemove i, j from L; Add k to LTermination:When L consists of two nodes, i, j, and the edge between them of length dijCS262 Lecture 13, Win06, BatzoglouParsimony – What if we don’t have distances•One of the most popular methods:GIVEN multiple alignmentFIND tree & history of substitutions explaining alignmentIdea: Find the tree that explains the observed sequences with a minimal number of substitutionsTwo computational subproblems:1. Find the parsimony cost of a given tree (easy)2. Search through all tree topologies (hard)CS262 Lecture 13, Win06, BatzoglouExample: Parsimony cost of one columnABAA{A, B}CostC+=1{A}Final cost C = 1{A}{A}{B}{A}{A}ABAACS262 Lecture 13, Win06, BatzoglouParsimony Scoring Given a tree, and an alignment column uLabel internal nodes to minimize the number of required substitutionsInitialization:Set cost C = 0; node k = 2N – 1 (last leaf)Iteration:If k is a leaf, set Rk = { xk[u] } // Rk is simply the character of kth speciesIf k is not a leaf,Let i, j be the daughter nodes;Set Rk = Ri  Rj if intersection is nonemptySet Rk = Ri  Rj, and C += 1, if intersection is emptyTermination:Minimal cost of tree for column u, = CCS262 Lecture 13, Win06, BatzoglouExampleA A A B{A} {A} {A} {B}B A BA{A} {B} {A} {B}{A}{A}{A}{A,B}{A,B}{B}{B}CS262 Lecture 13, Win06, BatzoglouTraceback:1. Choose an arbitrary nucleotide from R2N – 1 for the root 2. Having chosen nucleotide r for parent k, If r  Ri choose r for daughter iElse, choose arbitrary nucleotide from RiEasy to see that this traceback produces some assignment of cost CTraceback to find ancestral nucleotidesCS262 Lecture 13, Win06, BatzoglouExampleABAB{A, B}{A, B}{A}{A}{B}{A}{B}ABABAA AxxABABAB AxxABABBB BxxAdmissible with TracebackStill optimal, but inadmissible with TracebackCS262 Lecture 13, Win06, BatzoglouProbabilistic MethodsA more refined measure of evolution along a tree than parsimonyP(x1, x2, xroot | t1, t2) = P(xroot) P(x1 | t1, xroot) P(x2 | t2, xroot)If we use Jukes-Cantor, for example, and x1 = xroot = A, x2 = C, t1 = t2 = 1,= pA¼(1 + 3e-4α) ¼(1 – e-4α) = (¼)3(1 + 3e-4α)(1 – e-4α) x1t2xroott1x2CS262 Lecture 13, Win06, BatzoglouProbabilistic Methods•If we know all internal labels xu, P(x1, x2, …, xN, xN+1, …, x2N-1 | T, t) = P(xroot)jrootP(xj | xparent(j), tj, parent(j))•Usually we don’t know the internal labels, thereforeP(x1, x2, …, xN | T, t) = xN+1 xN+2 …  x2N-1 P(x1, x2, …, x2N-1 | T, t) xrootx1x2xNxuCS262 Lecture 13, Win06, BatzoglouFelsenstein’s Likelihood AlgorithmTo calculate P(x1, x2, …, xN | T, t)Initialization:Set k = 2N – 1Iteration: Compute P(Lk | a) for all a  If k is a leaf node:Set P(Lk | a) = 1(a = xk)If k is not a leaf node:1. Compute P(Li | b), P(Lj | b) for all b, for daughter nodes i, j2. Set P(Lk | a) = b,c P(b | a, ti) P(Li | b) P(c | a, tj) P(Lj | c)Termination:Likelihood at this column = P(x1, x2, …, xN | T, t) = aP(L2N-1 | a)P(a)Let P(Lk | a) denote the prob. of all the leaves below node k, given that the residue at k is aCS262 Lecture 13, Win06, BatzoglouProbabilistic MethodsGiven M (ungapped) alignment columns of N sequences, •Define likelihood of a tree:L(T, t) = P(Data | T, t) = m=1…M P(x1m, …, xnm, T, t)Maximum Likelihood Reconstruction:•Given data X = (xij), find a topology T and length vector t that maximize likelihood L(T, t)CS262 Lecture 13, Win06, BatzoglouCurrent popular methodsHUNDREDS of programs available!http://evolution.genetics.washington.edu/phylip/software.html#methodsSome recommended programs:•Discrete—Parsimony-basedRec-1-DCM3 http://www.cs.utexas.edu/users/tandy/mp.htmlTandy Warnow and colleagues•ProbabilisticSEMPHYhttp://www.cs.huji.ac.il/labs/compbio/semphy/Nir Friedman and colleaguesCS262 Lecture 13, Win06, BatzoglouMultiple Sequence Multiple Sequence AlignmentsAlignmentsCS262 Lecture 13, Win06, BatzoglouProtein Phylogenies•Proteins evolve by both duplication and species divergenceCS262 Lecture 13, Win06, BatzoglouProtein Phylogenies – ExampleCS262 Lecture 13, Win06, BatzoglouCS262 Lecture 13, Win06, BatzoglouCS262 Lecture 13, Win06, BatzoglouDefinition•Given N sequences x1, x2,…, xN:Insert gaps (-) in each sequence xi, such that•All sequences have the same length L•Score of the global map is maximum•A faint similarity between two sequences becomes

View Full Document