DOC PREVIEW
CMU BSC 03711 - lecture

This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1• Today:– Introduction to projects– Phylogeny reconstruction: Maximum parsimony• Tuesday– PS2 is due– Phylogeny reconstruction:Distance-based methodsPairwise sequence alignment (global and local)Multiple sequence alignmentlocalglobalSubstitution matricesDatabase searchingBLASTEvolutionary tree reconstructionRNA structure predictionGene FindingProtein structure predictionSequence statisticsComputational genomics…Phylogeny reconstructionGiven – Multiple sequence alignment– Model of sequence evolutionfind the (binary) tree that is the best explains the data with respect to the model.…atgcaaggagtcgcagagc……atgcgaggtctcgtagtgt……atgggaggtctcccagtgt……atgcgacgtcacgtattgg……atgtgtggtctggcagtga……atgcgacctctcggagaat…ModelFinding the optimal treeGiven k taxa, – Consider all trees with k leaves– Score each tree with respect to chosen evolutionary model.– Select highest scoring tree(s)Phylogeny reconstruction is NP-complete:Except in special cases when the data obeys specific constraints, the only way to find the best tree is to consider all trees.2Finding the optimal treeGiven k taxa, ¾ Consider all trees with k leaves– Score each tree with respect to chosen evolutionary model.– Select highest scoring tree(s)How many trees are there?How many unrooted trees with k leaves?• Three taxa• Four Taxa•Five taxak E(k) T(k)3 3 14 5 35 7 15……Number of unrooted trees for ktaxa)!3(2)!52()()32()1()1()(322)1()(313−−=−=−−=−=+−=−−=∏kkkTikTkEkTkkEkEkkiNumber of unrooted binary trees13151052,027,0252.2 x 10202.8 x 10741 x 101074The number of trees gets big fastNumber of leaves34561020505003Number of unrooted binary trees...2.2 x 10202.8 x 10741 x 101074How big is that?Number of leaves...2050500Age of the universe (seconds): 4.42 x 1017Diameter of the universe: 2.70 x 1010Number of stars in the universe: 1022How do you find the optimal tree?1. Exhaustive search (<12 taxa)(Phylogeny reconstruction isNP-complete.)How do you find the optimal tree?Typical kTimeResultMethod12T(k)Optimal solutionExhaustive searchHow do you find the optimal tree?L = {T3}, C = infinityFor i = 3 to k {For each tree, t, in L{If Score(t) > C, prune searchIf Score(t) < C, C = Score(t).For every edge, e, in t {t’= t plus a new edge at eNewL = NewL U {t’}}}L = NewL}2. Branch-and-bound (<18 taxa)Score is non-decreasing as you add edges4How do you find the optimal tree?Typical kTimeResultMethod18≤T(k)Optimal solutionBranch and bound12T(k)Optimal solutionExhaustive search3. Heuristic searchSearch for optimal trees by finding good trees and then rearranging them in the hopes of finding an even better treeHow do you find a pretty good tree?Heuristic search“Treespace”Suboptimal island of treesGlobal optimumStarting treesBranch swappingNearest-neighbor interchange (NNI)5How do you find the optimal tree?You chooseYou chooseSuboptimal solutionHeuristic searchTypical kTimeResultMethod18≤T(k)Optimal solutionBranch and bound12T(k)Optimal solutionExhaustive searchFinding the optimal treeGiven k taxa, – Consider all trees with k leaves¾Score each tree with respect to chosen evolutionary model.– Select highest scoring tree(s)Criteria for evaluating which tree best fits the data:• Maximum parsimony (character data)• Minimum evolution (distance data)• Maximum Likelihood (character data)Multiple Sequence Alignmentas Character Data ~~~~ALTEKQEALLKQSWEVLKQNIPAHSRLFALIIEAA…~~~MALTEKQEALLKQSWEVLKQNIPAHSRLFALILEAA…~~~MALTERQEALLKQSWEVLKQNIPGHSRLFALIIEAA…~~~~~~~~~~EALLKQSWEVLKQNIPGHSCLFALIIEAA…Each column (or site) is one character.GGAAC1 C4C3C2CSHCentipedesRSHAntsRSHMothsRSHBeesMultiple Sequence Alignmentas Distance Data Human Rabbit Pig ChickenHuman 0 3 7 9 Rabbit 0 6 8 Pig 0 6 Chicken 0 ~~~~ALTEKQEALLKQSWEVLKQNIPAHSLRLFALIIEAA…~~~MALTEKQEALLKQSWEVLKQNIPAHSLRLFALILEAA…~~~MALTERQEALLKQSWEVLKQNIPGHSLRLFALIIEAA…~~~~~~~~~~EALLKQSWEVLKQNIPGHSLCLFALIIEAA…ChickenPigRabbitHuman6Finding the optimal treeGiven k taxa, – Consider all trees with k leaves¾Score each tree with respect to chosen evolutionary model.– Select highest scoring tree(s)Criteria for evaluating which tree best fits the data:¾Maximum parsimony (character data)• Minimum evolution (distance data)• Maximum Likelihood (character data)Maximum Parsimony• Parsimony score = the minimum number of changes (mutations) needed to explain data.• Assumptions– Purifying selection dominates– Changes are rare– No multiple substitutions– Sites are independentInferring ancestral sequences and computing the parsimony scoreGiven a tree topology– Associate characters with leaves of tree– Find the optimal labeling of internal nodes– Count mutationsFinding the most parsimonious treeGiven k taxa and n characters (e.g., columns in an MSA), For each topology, t, with k leavesscore(t) = 0For each character, c /* 1≤c≤n */Find the optimal labeling of internal nodesscore(t) = score(t) + count_mutations(c)Return the tree(s) with minimum score.7Determining the parsimony score of a given treeInput:– MSA: k taxa, n columns, aka characters or “sites”.– Tree: T.– An assignment of the sequences in the MSA to the leaves of T.Output:– Score: The minimum number of mutations, over all possible ancestral sequences, required to explain the data– The ancestral sequences that minimize the score (sometimes.)(1) ACC(2) ATC(3) CCC(4) CT_Inferring ancestral sequences and computing the parsimony scoreACCATCCCCCT_(1) ACC(2) ATC(3) CCC(4) CT_Inferring ancestral sequences and computing the parsimony scoreACCATCCCCCT_ATCCTC100010001010000Parsimony score: 4(1) ACC(2) ATC(3) CCC(4) CT_Note:there can be more than one most parsimonious tree ACCATCCCCCT_ATC CTC100010001010ATCCCCACCCT_ACC ATC0101011008Determining the parsimony score of a treeFitch’s algorithm– Input: tree, leaf labels– Output: minimum number of mutations required to explain leaf labels– Does not determine the ancestral sequences!– Durbin et al., p 175.Fitch’s algorithmRoot tree arbitrarily; Global C = 0.SCORE (i)– If i is a leaf, return {label(i)}–Else • R(l) = SCORE (left(i))• R(r) = SCORE (right(i))• If R(r) ∩ R(l) = Ø– R(i) = R(r) U R(l) // No label avoids mutation– C = C+1 // Pass all labels up tree•Else– R(i) = R(r) ∩ R(l) // Choose label that avoids // mutationFinal score = CSome problems with parsimony• Not all characters are


View Full Document

CMU BSC 03711 - lecture

Documents in this Course
lecture

lecture

8 pages

Lecture

Lecture

3 pages

Homework

Homework

10 pages

Lecture

Lecture

17 pages

Delsuc05

Delsuc05

15 pages

hmwk1

hmwk1

2 pages

lecture

lecture

6 pages

Lecture

Lecture

10 pages

barnacle4

barnacle4

15 pages

review

review

10 pages

Homework

Homework

10 pages

Midterm

Midterm

12 pages

lecture

lecture

11 pages

lecture

lecture

32 pages

Lecture

Lecture

7 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

Lecture

Lecture

21 pages

Lecture

Lecture

11 pages

Lecture

Lecture

28 pages

Homework

Homework

13 pages

Logistics

Logistics

11 pages

Lecture

Lecture

8 pages

Lecture

Lecture

9 pages

lecture

lecture

8 pages

Problem

Problem

6 pages

Homework

Homework

10 pages

Lecture

Lecture

9 pages

Problem

Problem

7 pages

hmwk4

hmwk4

7 pages

Problem

Problem

6 pages

lecture

lecture

16 pages

Problem

Problem

8 pages

Problem

Problem

6 pages

Problem

Problem

13 pages

lecture

lecture

9 pages

Problem

Problem

11 pages

Notes

Notes

7 pages

Lecture

Lecture

7 pages

Lecture

Lecture

10 pages

Lecture

Lecture

9 pages

Homework

Homework

15 pages

Lecture

Lecture

16 pages

Problem

Problem

15 pages

Load more
Download lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?