CMU BSC 03711 - Lecture

Course topics (context)
• Pairwise sequence alignment (global and local)
• Multiple sequence alignment
• Substitution matrices
• Database searching (BLAST)
• Evolutionary tree reconstruction
• RNA structure prediction
• Gene finding
• Protein structure prediction
• Sequence statistics
• Computational genomics
• …

Phylogeny reconstruction
Given
– a multiple sequence alignment
– a model of sequence evolution
find the (binary) tree that best explains the data with respect to the model.
Example input sequences:
…atgcaaggagtcgcagagc…
…atgcgaggtctcgtagtgt…
…atgggaggtctcccagtgt…
…atgcgacgtcacgtattgg…
…atgtgtggtctggcagtga…
…atgcgacctctcggagaat…
[Figure: the aligned sequences plus the evolutionary model feed into tree inference.]

Finding the optimal tree
Given k taxa,
– Consider all trees with k leaves.
– Score each tree with respect to the chosen optimization criterion.
– Select the optimal tree(s).
Tree reconstruction is NP-complete: except in special cases where the data obey specific constraints, the only way to find the best tree is to consider all trees.

Categories of tree reconstruction methods
                     Parsimony   Distance   Maximum likelihood estimation   Bayesian methods
Character data           x                                x                        x
Pairwise distances                   x

Character data
• A character is a well-defined feature that, in a taxonomic unit, can assume one out of two or more mutually exclusive character states.
• Character: a variable
  – e.g., height, weight, color
• State: a value the character can take on
  – e.g., 2.39 m, 14.7 kg, red, …

Character data: a morphological example
Primitive character state: wingless.
[Figure: a tree over Bees, Moths, Ants, and Centipedes for the character "wings"; Bees and Moths have wings, Ants and Centipedes do not.]

Multiple sequence alignment as character data
~~~~ALTEKQEALSWEVLKQNIPAHSRLFALIIEAA…
~~~MALTEKQEALSWEVLKQNIPAHSRLFALILEAA…
~~~MALTERQEALSWEVLKQNIPGHSRLFALIIEAA…
~~~~~~~~~~EALSWEVLKQNIPGHSCLFALIIEAA…
Each column (or site) is one character.
• DNA: 4 states
• Amino acids: 20 states
Other molecular features can also serve as characters, e.g., gain and loss of introns.

Character table for four columns of the alignment, one row per taxon (a short code sketch of this column-to-character view follows the Distance data slide below):
              C1   C2   C3   C4
Bees          A    H    S    R
Moths         A    H    S    R
Ants          G    H    S    R
Centipedes    G    H    S    C

Distance data
• Pairwise distances between taxa with the usual geometric properties.
• Most molecular data yield character states that are subsequently converted into distances.
• Some molecular data can only be expressed as distances.
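To make the column-as-character view concrete, here is a minimal Python sketch that turns aligned residues into a character-state table like the C1–C4 example above. It is an illustration only: the variable names are mine, not from the lecture, and the four relevant columns are assumed to have already been extracted from the alignment.

# Aligned states for the four columns in the table above, one string per taxon.
aligned = {
    "Bees":       "AHSR",
    "Moths":      "AHSR",
    "Ants":       "GHSR",
    "Centipedes": "GHSC",
}

taxa = list(aligned)
n_chars = len(next(iter(aligned.values())))

# Character c = column c of the alignment; its states are the residues
# observed in that column, one per taxon.
characters = [{t: aligned[t][c] for t in taxa} for c in range(n_chars)]

for c, states in enumerate(characters, start=1):
    print(f"C{c}: observed states {sorted(set(states.values()))} -> {states}")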
Calculating distances from MSAs
~~~~ALTEKQEALLKQSWEVLKQNIPAHSLRLFALIIEAA…
~~~MALTEKQEALLKQSWEVLKQNIPAHSLRLFALILEAA…
~~~MALTERQEALLKQSWEVLKQNIPGHSLRLFALIIEAA…
~~~~~~~~~~EALLKQSWEVLKQNIPGHSLCLFALIIEAA…
• Use the pairwise alignment between taxon i and taxon j induced by the MSA.
• Assess the amino acid changes.
• Correct for multiple substitutions.
Example distance matrix:
            Rabbit   Pig   Chicken
Human          4      5       8
Rabbit                5      11
Pig                           6

Finding the optimal tree
Given k taxa,
– Consider all trees with k leaves.
– Score each tree with respect to the chosen evolutionary model.
– Select the highest-scoring tree(s).
Criteria for evaluating which tree best fits the data:
• Maximum parsimony (character data)
• Minimum evolution (distance data)
• Maximum likelihood (character data)

Maximum parsimony: nature is thrifty
The best tree is the one that requires the fewest mutations.
e.g., jaws were only “invented” once.
[Figure: a vertebrate tree (lampreys, hagfish, sharks, fish, terrestrial animals) with character gains marked along its branches: vertebrae, skull, jaws, bony skeleton, tetrapody.]

Maximum parsimony
• Parsimony score = the minimum number of changes (mutations) needed to explain the data.
• Assumptions:
  – Purifying selection dominates.
  – Changes are rare.
  – No multiple substitutions.
  – Sites are independent.

Finding the most parsimonious tree
Given k taxa and n characters (e.g., columns in an MSA):
  For each topology t with k leaves
    score(t) = 0
    For each character c   /* 1 ≤ c ≤ n */
      Find the optimal labeling of the internal nodes
      score(t) = score(t) + count_mutations(c)
  Return the tree(s) with minimum score.

Trees with four leaves
[Figure: the three possible unrooted topologies on leaves W, X, Y, Z: ((W,X),(Y,Z)), ((W,Y),(X,Z)), and ((W,Z),(X,Y)).]

Determining the parsimony score of a given tree
Input:
– MSA: k taxa, n columns (aka characters or “sites”).
– Tree: T.
– An assignment of the sequences in the MSA to the leaves of T.
Output:
– Score: the minimum number of mutations, over all possible ancestral sequences, required to explain the data.
– The ancestral sequences that minimize the score (sometimes).

Inferring ancestral sequences and computing the parsimony score
Sequences: (1) ACC  (2) ATC  (3) CCC  (4) CT_
[Figure: the tree ((1,2),(3,4)) with inferred ancestral labels ATC and CTC; the per-site changes along the branches sum to 4.]
Parsimony score: 4
Note: there can be more than one most parsimonious tree.
[Figure: a different tree over the same four sequences, with ancestral labels ACC and ATC, that also requires only 4 changes.]

Determining the parsimony score of a tree: Fitch’s algorithm
– Input: tree, leaf labels.
– Output: minimum number of mutations required to explain the leaf labels.
– Does not determine the ancestral sequences!
– Durbin et al., p. 175.

Fitch’s algorithm
Root the tree arbitrarily; global C = 0.
SCORE(i):
– If i is a leaf, return {label(i)}.
– Else
  • R(l) = SCORE(left(i))
  • R(r) = SCORE(right(i))
  • If R(r) ∩ R(l) = Ø
    – R(i) = R(r) ∪ R(l)   // no single label avoids a mutation; pass all candidate labels up the tree
    – C = C + 1            // count one mutation
  • Else
    – R(i) = R(r) ∩ R(l)   // choose a label that avoids a mutation
Final score = C
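As a sketch of how Fitch’s procedure can be run over a whole alignment, here is a short Python version. It is not the lecture’s code: the tree encoding (a leaf is an index into the sequence list, an internal node is a pair) and the function names are my own, and the gap character in taxon 4 is treated as an ordinary extra state, which is what the worked example above appears to do. Like Fitch’s algorithm itself, it returns only the score, not the ancestral sequences.

def fitch_site(node, states):
    """Return (candidate label set, mutation count) for one column."""
    if not isinstance(node, tuple):                 # leaf: node is a taxon index
        return {states[node]}, 0
    left_set, left_cost = fitch_site(node[0], states)
    right_set, right_cost = fitch_site(node[1], states)
    if left_set & right_set:                        # agreement: no mutation forced here
        return left_set & right_set, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # forced mutation

def parsimony_score(tree, sequences):
    """Sum the per-column Fitch counts over all columns of the alignment."""
    n_sites = len(sequences[0])
    return sum(fitch_site(tree, [seq[i] for seq in sequences])[1]
               for i in range(n_sites))

# The four-taxon example from the slides: (1) ACC, (2) ATC, (3) CCC, (4) CT_,
# scored on the tree ((1,2),(3,4)).
seqs = ["ACC", "ATC", "CCC", "CT_"]
tree = ((0, 1), (2, 3))
print(parsimony_score(tree, seqs))                  # prints 4, matching the slide

Scoring the other two four-leaf topologies, ((0, 2), (1, 3)) and ((0, 3), (1, 2)), with the same function shows that one of them also reaches a score of 4, which is the point of the “more than one most parsimonious tree” remark above.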
Some problems with parsimony
• Not all characters are informative.
• The data may not be parsimonious.
• There may be more than one most parsimonious tree.

Informative sites
Informative sites are columns that distinguish between the alternate trees.
Example alignment (four taxa, three columns):
  X:  C A G
  Y:  T G G
  Z:  C C T
  W:  C A T
[Figure: the three unrooted four-leaf topologies on W, X, Y, Z, with total parsimony scores 4, 5, and 5; the score-4 tree groups X with Y. Columns 1 and 2 cost the same on every topology, so only column 3 (G, G, T, T) distinguishes the trees.]
When enumerating topologies and scoring each character (as in the “Finding the most parsimonious tree” procedure above), keep in mind that not all columns are informative!
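A quick way to check which columns matter is the standard parsimony-informativeness test: a column is informative only if at least two different states each appear in at least two taxa. Here is a minimal Python sketch (the helper name is mine) applied to the W/X/Y/Z example above.

from collections import Counter

def is_informative(column):
    """Parsimony-informative: at least two states each shared by at least two taxa."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2

# The four-taxon example above: X = CAG, Y = TGG, Z = CCT, W = CAT.
msa = {"X": "CAG", "Y": "TGG", "Z": "CCT", "W": "CAT"}
for i, column in enumerate(zip(*msa.values()), start=1):
    verdict = "informative" if is_informative(column) else "uninformative"
    print(f"column {i}: {''.join(column)} -> {verdict}")

Only column 3 (G, G, T, T) comes out informative, consistent with the tree scores above: columns 1 and 2 cost the same on every topology, so only column 3 separates the score-4 tree from the score-5 trees.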

