CMU BSC 03711 - Lecture

Calculating midterm scores

511/495
• Homework: average of the top two scores
• Total: (0.6*midterm + 0.4*homework)/100

711/856
• Homework: average of the top two scores
• Total: (0.6*midterm + 0.4*homework + 0.1*Lit)/110

[Figure: Midterm Grades]

[Figure: course topic map: pairwise sequence alignment (global and local); multiple sequence alignment (local, global); substitution matrices; database searching (BLAST); sequence statistics; evolutionary tree reconstruction; RNA structure prediction; gene finding; protein structure prediction; computational genomics; …]

Likelihood of MSA

[Figure: an MSA (...TCAGG..., ...TGTCG..., ...TGACG..., ...TCCGA...) beside a tree T with nodes x1 to x6, node indices k, i, j, and leaf characters C, A, A, T]

    P(\mathrm{MSA} \mid T, l, x) = \prod_{i} P(\mathrm{site}_i \mid T, l, x)

Assumptions:
• Sites are independent: score each site separately
• Lineages are independent (Markov property): compute each branch separately

Maximum Likelihood Estimation for Phylogeny Reconstruction

Note we need to consider
• All sites: O(n)
• All trees: O(Trooted(k))
• All combinations of internal labels: O(|Σ|^k)
• All branch lengths: O(k) branches

Branch lengths are estimated numerically.

Maximum Likelihood Estimation for Phylogeny Reconstruction

• Computationally intensive
• Consistent (more data, better estimation)
• If the evolutionary model is a reversible Markov chain (e.g., Jukes-Cantor, JC), then the MLE distance matrix converges to an additive matrix. Neighbor Joining is a consistent method (Farach and Kannan, 96).
• Note that parsimony is not consistent.

Selecting data for tree reconstruction

• For reconstructing recent events, use DNA sequences
• For reconstructing distant events, use amino acid sequences
• Select sequences that
  – Are present in all taxa
  – Contain a conserved region
  – Exhibit variation within that region
  – e.g., ribosomal (16S rRNA) genes were used to reconstruct the tree of life. These genes encode products used in all organisms from bacteria to mammals.
• Pitfalls: duplicated genes, horizontal gene transfer, mosaic genes.

Comparison of Phylogeny Reconstruction Methods

• Parsimony
  – Selection dominates, e.g., ribosomal genes
  – Exhaustive or heuristic search, branch and bound
• Distance
  – Neutral mutation dominates, e.g., immunoglobulin sequences
  – Exhaustive or heuristic search, greedy methods
  – Neighbor Joining finds the correct tree in quadratic time if the data is additive
  – UPGMA finds the correct tree in quadratic time if the data is ultrametric
• Maximum Likelihood
  – Neutral mutation dominates, e.g., immunoglobulin sequences
  – Exhaustive or heuristic search

                             Parsimony   Distance   Max Likelihood
Data                         Character   Distance   Character
NP-complete                  Yes         Yes        Yes
Topology                     Yes         Yes        Yes
Branch lengths               Yes         Yes        Prob
Ancestral states             Yes         No         Prob
DNA                          Yes         Yes        Yes
Amino acids                  Yes         Yes        Very slow
Consistent                   No          Yes        Yes
Model of mutational change   No          Yes        Yes

Bootstrapping, Branches and Partitions

• Every edge partitions a tree into two groups of taxa

[Figure: tree on the taxa HA1, HA2, CA1, CA2, MA1, FA, with one edge highlighted per slide]
Example partitions:
  (CA2 HA2) (CA1 MA1 HA1 FA)
  (MA1 HA1) (CA1 CA2 HA2 FA)

Bootstrapping, Branches and Partitions

• These two trees are different, but they share a partition

[Figure: two trees on the taxa HA1, HA2, CA1, CA2, MA1, FA]
Shared partition: (MA1 HA1 CA1 FA) (CA2 HA2)

Bootstrapping, Branches and Partitions

• Neither of these partitions exists in the other tree

[Figure: the same two trees on the taxa HA1, HA2, CA1, CA2, MA1, FA]
  Partition in one tree: (MA1 CA1) (CA2 HA2 FA HA1)
  Partition in the other tree: (MA1 HA1 CA1) (CA2 HA2 FA)

Bootstrapping a gene tree

• For i = 1 to N
  – Construct MSA' by sampling columns from the original MSA with replacement
  – Construct a new tree, t', from MSA'
  – Tabulate the partitions in t'
• For every partition, p, in the original tree, score(p) = (the number of observations of p)/N

[Figure: tree on HA2, CA1, HA1, MA1, CA2, FA with bootstrap values 100, 99, and 57 on its internal edges]
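
The bootstrap loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: trees are nested tuples of taxon names, a partition is an unordered pair of taxon sets, and the tree builder `upgma` is a toy stand-in (average-linkage clustering on Hamming distances) chosen only to keep the example self-contained. All function names here are hypothetical, and any tree construction method could be passed in instead.

```python
import random
from collections import Counter


def resample_columns(msa, rng):
    """Build MSA' by sampling alignment columns with replacement."""
    n_cols = len(next(iter(msa.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in msa.items()}


def partitions(tree, all_taxa):
    """Return the non-trivial bipartitions induced by the edges of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        other = all_taxa - below
        if len(below) > 1 and len(other) > 1:
            found.add(frozenset([below, other]))  # unordered pair of taxon sets
        return below

    leaves(tree)
    return found


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def upgma(msa):
    """Toy tree builder (an assumption): average-linkage clustering on Hamming distances."""
    clusters = [(name, [name]) for name in sorted(msa)]  # (subtree, member taxa)

    def dist(a, b):
        pairs = [(x, y) for x in a[1] for y in b[1]]
        return sum(hamming(msa[x], msa[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        index_pairs = ((i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters)))
        i, j = min(index_pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


def bootstrap_support(msa, build_tree, n_replicates, seed=0):
    """score(p) = (number of replicate trees containing p) / N, for each p in the original tree."""
    rng = random.Random(seed)
    taxa = frozenset(msa)
    original = partitions(build_tree(msa), taxa)
    counts = Counter()
    for _ in range(n_replicates):
        replicate_tree = build_tree(resample_columns(msa, rng))
        counts.update(partitions(replicate_tree, taxa) & original)
    return {p: counts[p] / n_replicates for p in original}


if __name__ == "__main__":
    toy_msa = {"A": "AAAAAA", "B": "AAAAAA", "C": "CCCCCC", "D": "CCCCCC"}
    for partition, support in bootstrap_support(toy_msa, upgma, n_replicates=20).items():
        print(sorted(sorted(side) for side in partition), support)
```

Representing each partition as an unordered pair of sets means that the two edges meeting at the root of a rooted tree count as a single partition, which matches the slide's view of an edge splitting the taxa into two groups.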

Applications of Local MSA

Conserved patterns in biological sequences
Example: Transcription factor binding sites

SP ...gcttt AATTTTCACTATATACTATAA cgatt...
ST ...cagat ATAAATGATATAGTGGTTATA gttaa...
ST ...atctt TTTTATTATTAAATCGTATTA gcagc...
EC ...aggct ATAAATGATATAGTGGTTATA gttag...
EC ...acctt TTTTATTATTAAATCGTATTA gtcac...
VC ...ttata ACTAATAATTATAAAATATGT gtgtc...
YP ...gctga TGAAATGATATAATCGTTATA taaga...

…agcgagcctgagcactcgaggcatctctgcacattcagcatgggatgggcctcctgtccctgtatgcgcctgatga…

[Figure: polymerase, tf, introns, promoter, transcription factor binding sites]

[Figure: Some known binding site motifs]

Applications of Local MSA

Conserved patterns in biological sequences
Example: Protein domains
• Fold independently
• Carry out specific functions
• Found in diverse contexts
• Conserved in evolution

[Figure: Insulin receptor, with domain labels FN3, RL, kinase, Furin like, RL; Protein Tyrosine Kinases. Adapted from Robinson et al., 2000]

Protein domain databases

Conserved Domain Database (CDD)
• Representation: Position specific scoring matrices (PSSMs)
• Structurally corrected local MSAs
• CDART: Conserved Domain Architecture Retrieval Tool

PFAM, SMART
• Representation: Hidden Markov Models (HMMs)
• Curated local MSAs

More: see Mount, Table 9.5

[Figure: Pax structure, showing the paired domain and the homeodomain. Source: http://www.gene-regulation.com/info/pax.html]

[Figure: Pax domain architecture]

• Discovery: identifying conserved patterns in multiple sequences
• Modeling: constructing probabilistic models of local MSAs
• Recognition: finding new instances of known patterns (using those models)

[Figure: three panels of sequences linked by the labels Discovery, Modeling, and Recognition]

First panel:
... RLSKIISMFQAHIRGYLIRKAYKRGYQARCLLK ...
... RNKHAIAVIWAFWLVQSSFRGYQAGSKARRELK ...
.. GWQKRVRGWIVIVRRNFKKKRNEKLSATAZZZZZYQ ...
... MKRSQVVKQEKAARKVQKFWRGHRVQHNQR ...
... QEEVSAIIIQRAYRRYLLKQKVKILRVQSS ...

Second panel:
... RLSKIISMIQAHIRGYLIRKAYKRGYQARCLLK ...
... RNKHAIAVIWAFWLVQSSFRGYQAGSKARRELK ...
.. GWIQKRVRGWIVIRRNFKKKRNEKLSATAZZZZZYQ ...
. MKRSQVVKQEKAARKIQKFWRGHRVQHNQR ...
... QEEVSAIIIQRAYRRYLLKQKVKILRVQSS ...

Third panel (Recognition):
.. GWQKRVRGWIVIVRRNQVNQAAVTIQRWYRCQVQRRRAGFKKKRNEKLSATAZZZZZ

Local Multiple Sequence Alignment
Probabilistic Framework

• Discovery
  – Given multiple sequences, often unaligned, find a conserved pattern (e.g., the Pax domain)
• Representation
  – Given a local MSA for the Pax domain, construct probabilistic model
• Recognition (using model)
  – Given a new
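
As a rough illustration of the Representation and Recognition steps, the sketch below builds a position specific scoring matrix from an ungapped local MSA of DNA sites and scans a new sequence for its best-scoring window. Several choices here are assumptions rather than anything stated in the slides: the uniform background frequency of 0.25, the pseudocount of 1, log-odds scores in bits, and the function names `build_pssm`, `score`, and `best_match`. It is not the procedure used by CDD or any other database.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Representation: one column of log-odds scores (in bits) per alignment position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length (ungapped local MSA)")
    pssm = []
    for col in range(length):
        # Pseudocounts keep unseen bases from getting a score of minus infinity.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2(counts[base] / total / background)
                     for base in ALPHABET})
    return pssm


def score(pssm, window):
    """Sum the per-position log-odds scores of a window as wide as the PSSM."""
    return sum(column[base] for column, base in zip(pssm, window))


def best_match(pssm, sequence):
    """Recognition: return (score, start) of the best-scoring window in a new sequence."""
    sequence = sequence.upper()
    width = len(pssm)
    starts = range(len(sequence) - width + 1)
    return max((score(pssm, sequence[s:s + width]), s) for s in starts)


if __name__ == "__main__":
    toy_sites = ["TATA", "TATA", "TACA"]  # toy data, not taken from the lecture
    toy_pssm = build_pssm(toy_sites)
    print(best_match(toy_pssm, "gggtataggg"))
```

The same functions could be applied to the 21-base binding sites listed earlier, since those are all the same length. Sequences containing characters outside A, C, G, T would need extra handling.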

