CMU BSC 03711 - Lecture


Pairwise sequence alignment (global and local)
Multiple sequence alignment
  – local
  – global
Substitution matrices
Database searching
BLAST
Evolutionary tree reconstruction
RNA structure prediction
Gene finding
Protein structure prediction
Sequence statistics
Computational genomics
…

Hypothesis testing using a (log) odds ratio
• Observation: data, D (6H, 2T)
• What process generated this data?
  – Alternative hypothesis: $H_a$ ($q \neq 0.5$)
  – Null hypothesis: $H_0$ ($q = 0.5$)
• $P(H_a \mid D)$: posterior probability
• $P(H_a)$: prior probability
• $P(D \mid H_a)$: likelihood of the data given the hypothesis

Hypothesis testing using a likelihood ratio
Likelihood ratio: how likely is the data under the alternative hypothesis compared with the likelihood under the null hypothesis?

$\frac{P(D \mid H_a)}{P(D \mid H_0)} = \frac{P(\text{6 heads in 8 tosses} \mid q)}{P(\text{6 heads in 8 tosses} \mid 0.5)}$

P(toss yields heads): $H_a$: $q \neq 0.5$; $H_0$: $0.5$
Note: There are other ways to test a hypothesis, e.g., a p-value.
Need to estimate q.

Maximum Likelihood Estimation
What process generated this data?
  – Model with parameters: e.g., binomial with parameter q

  $P(n, k, q) = \binom{n}{k} q^k (1-q)^{n-k}$

  – The best estimate of q is the value that maximizes the likelihood of the data. To obtain q, solve:

  $\frac{d\,P(D \mid H)}{dq} = 0$

  $\frac{d}{dq}\left(\binom{8}{6} q^6 (1-q)^2\right) = 0$

  $6q^5(1-q)^2 - 2q^6(1-q) = 0 \;\Rightarrow\; 6(1-q) = 2q \;\Rightarrow\; q = 0.75$

Hypothesis testing using a (log) odds ratio
Likelihood ratio, using the estimate q = 0.75:

$\frac{P(D \mid H_a)}{P(D \mid H_0)} = \frac{P(\text{6 heads in 8 tosses} \mid 0.75)}{P(\text{6 heads in 8 tosses} \mid 0.5)} = \frac{(0.75)^6 (0.25)^2}{(0.5)^6 (0.5)^2} = 2.85$

Observing 6 heads in 8 coin tosses is 2.85 times as likely if q = 0.75 as it is if the coin is fair.
Note: the sample size is very small!

Note:
• The estimate improves as the sample size increases. A method is consistent if the estimate converges to the true parameter value as the sample size n goes to infinity.
• For mathematical convenience we may use the logarithm of the likelihood ratio.
• In general, the probability distribution is unknown. Select a model and maximize the likelihood with respect to that model.
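The coin example above can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function names are illustrative, and the grid search is only a sanity check on the closed-form estimate k/n.

```python
from math import comb, log

def binomial_likelihood(k, n, q):
    """P(k heads in n tosses | probability of heads q)."""
    return comb(n, k) * q**k * (1 - q)**(n - k)

def mle_heads_probability(k, n):
    """Closed-form maximizer of the binomial likelihood: k / n."""
    return k / n

k, n = 6, 8  # the observation D: 6 heads, 2 tails

# Maximum likelihood estimate of q.
q_hat = mle_heads_probability(k, n)

# Sanity check: scan q = 0.01, 0.02, ..., 0.99 and keep the most likely value.
grid_best = max((i / 100 for i in range(1, 100)),
                key=lambda q: binomial_likelihood(k, n, q))

# Likelihood ratio of the alternative (q = q_hat) to the null (q = 0.5).
ratio = binomial_likelihood(k, n, q_hat) / binomial_likelihood(k, n, 0.5)
log_ratio = log(ratio)  # natural logarithm

print(f"q_hat = {q_hat}, grid maximum = {grid_best}")
print(f"likelihood ratio = {ratio:.2f}")  # 2.85
print(f"log likelihood ratio = {log_ratio:.4f}")
```

The binomial coefficient appears in both numerator and denominator, so it cancels in the ratio; it is kept here only so that `binomial_likelihood` returns a proper probability.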
• Results can vary with the choice of model.
• We estimated a parameter and determined the likelihood in a single, unified process.

Consistency: $\lim_{n \to \infty} \hat{q} = q$
Log likelihood ratio: $\log \frac{P(D \mid H_a)}{P(D \mid H_0)}$

Maximum Likelihood Estimation for Phylogeny Reconstruction
Data: multiple sequence alignment (MSA), n sites, k taxa
Model: sequence evolution, e.g., Jukes-Cantor
Parameters:
  Internal labels, $l = (l_1, l_2, \ldots, l_j)$
  Branch lengths, $x = (x_1, x_2, \ldots, x_j)$
Given a topology T, select l, x such that $P(\mathrm{MSA} \mid T, l, x)$ is maximum.

Assumptions:
  Sites are independent: score each site separately.

  $P(\mathrm{MSA} \mid T) = \prod_i P(\mathrm{site}_i \mid T)$

  Lineages are independent (Markov property): compute each branch separately.

  $P(\mathrm{site}_i \mid T) = \prod_j P(\mathrm{site}_i \mid x_j)$

Given a topology T, select l, x such that

  $P(\mathrm{MSA} \mid H) = \prod_i \prod_j \prod_h P(\mathrm{site}_i \mid T, l_h, x_j)$

is maximum.

Probabilities are given by, e.g., the Jukes-Cantor model.
Example: a root r ∈ {C, G, A, T} with two leaves, A on a branch of length $x_1$ and T on a branch of length $x_2$:

  $P(\mathrm{site} \mid T) = P(r{=}A)\,P_{AA}(x_1)\,P_{AT}(x_2) + P(r{=}T)\,P_{TA}(x_1)\,P_{TT}(x_2) + P(r{=}C)\,P_{CA}(x_1)\,P_{CT}(x_2) + P(r{=}G)\,P_{GA}(x_1)\,P_{GT}(x_2)$

  $P_{CC}(x_i) = \tfrac{1}{4} + \tfrac{3}{4} e^{-4x_i}$, $P_{CG}(x_i) = \tfrac{1}{4} - \tfrac{1}{4} e^{-4x_i}$, etc.

Note that this is a sum over the possible root states, not a product.

Maximum Likelihood Estimation for Phylogeny Reconstruction
• Consistent (more data, better estimation)
• Computationally intensive
  – Consider T(k) trees
  – For each internal node, $|\Sigma|^k$ labels. MLE is used more often for DNA than for protein sequences.
  – Branch lengths are typically determined numerically.
• If the evolutionary model is a reversible Markov chain, then the MLE distance matrix converges to additive.
  – ⇒ Neighbor Joining is a consistent method
• Note that parsimony is not consistent.

Selecting data for tree reconstruction
• For reconstructing recent events, use DNA sequences
• For reconstructing distant events, use amino acid sequences
• Select sequences that
  – Are present in all taxa
  – Contain a conserved region
  – Exhibit variation within that region
  – e.g., ribosomal (16S rRNA) genes were used to reconstruct the tree of life.
    These genes encode products used in all organisms, from bacteria to mammals.
• Pitfalls: duplicated genes, horizontal gene transfer, mosaic genes.

Comparison of Phylogeny Reconstruction Methods
• Parsimony
  – Selection dominates, e.g., ribosomal genes
  – Exhaustive or heuristic search, branch and bound
• Distance
  – Neutral mutation dominates, e.g., immunoglobulin sequences
  – Exhaustive or heuristic search, greedy methods
  – Neighbor Joining finds the correct tree in quadratic time if the data are additive.
  – UPGMA finds the correct tree in quadratic time if the data are ultrametric.
• Maximum Likelihood
  – Neutral mutation dominates, e.g., immunoglobulin sequences
  – Exhaustive or heuristic search

                              Parsimony   Distance   Max Likelihood
  Data                        Character   Distance   Character
  Selective pressure          Yes         No         No
  Consistent                  No          Yes        Yes
  DNA                         Yes         Yes        Yes
  Amino acids                 Yes         Yes        Very slow
  Topology                    Yes         Yes        Yes
  Branch lengths              Yes         Yes        Prob
  Ancestral states            Yes         No         Prob
  Model of mutational change  No          Yes        Yes
  NP-complete                 Yes         Yes        Yes

Applications of Local MSA
Conserved patterns in biological sequences
Example: Transcription factor binding sites

  SP ...gcttt AATTTTCACTATATACTATAA cgatt...
  ST ...cagat ATAAATGATATAGTGGTTATA gttaa...
  ST ...atctt TTTTATTATTAAATCGTATTA gcagc...
  EC ...aggct ATAAATGATATAGTGGTTATA gttag...
  EC ...acctt TTTTATTATTAAATCGTATTA gtcac...
  VC ...ttata ACTAATAATTATAAAATATGT gtgtc...
  YP ...gctga TGAAATGATATAATCGTTATA taaga...
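One simple way to make "conserved pattern" concrete for the seven aligned sites above is to count the bases in each alignment column. The sketch below is an illustration only, not a method taken from the lecture: the function names are hypothetical, and it assumes the uppercase blocks are already aligned column by column with no gaps.

```python
from collections import Counter

# The seven uppercase site sequences from the alignment above (21 columns each).
sites = [
    "AATTTTCACTATATACTATAA",  # SP
    "ATAAATGATATAGTGGTTATA",  # ST
    "TTTTATTATTAAATCGTATTA",  # ST
    "ATAAATGATATAGTGGTTATA",  # EC
    "TTTTATTATTAAATCGTATTA",  # EC
    "ACTAATAATTATAAAATATGT",  # VC
    "TGAAATGATATAATCGTTATA",  # YP
]

def position_counts(aligned):
    """Return one Counter of base counts per alignment column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(column) for column in zip(*aligned)]

def fully_conserved_columns(aligned):
    """Return 0-based indices of columns where every sequence has the same base."""
    return [i for i, c in enumerate(position_counts(aligned)) if len(c) == 1]

counts = position_counts(sites)
conserved = fully_conserved_columns(sites)

# Print a small count table: one row per column, one field per base.
for i, c in enumerate(counts):
    print(i, {base: c.get(base, 0) for base in "ACGT"})
print("fully conserved columns:", conserved)
```

Dividing each count by the number of sequences gives per-column base frequencies, which is the usual starting point for scoring how well a new sequence matches the pattern.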
…agcgagcctgagcactcgaggcatctctgcacattcagcatgggatgggcctcctgtccctgtatgcgcctgatga…

[Figure labels: introns, promoter, transcription factor binding sites, polymerase, tf]

[Figure: Some known binding site motifs]

Applications of Local MSA
Conserved patterns in biological sequences
Example: Protein domains
  – Fold independently
  – Carry out specific functions
  – Found in diverse contexts
  – Conserved in evolution

[Figure: Protein Tyrosine Kinases; labels: Insulin receptor, FN3, RL, kinase, Furin like, RL. Adapted from Robinson et al., 2000]

Protein domain databases
Conserved Domain Database (CDD)
  – Representation: position-specific scoring matrices (PSSMs)
  – Structurally corrected local MSAs
  – CDART: Conserved Domain Architecture Retrieval Tool
PFAM, SMART
  – Representation: hidden Markov models (HMMs)
  – Curated local MSAs
More: see Mount, Table 9.5

Multi-domain protein example: PAX gene family
• Developmental regulatory genes that encode transcription factors
• Contain a DNA binding domain
• Early expressed during

