Course topics:
• Pairwise sequence alignment (global and local)
• Multiple sequence alignment (local, global)
• Substitution matrices
• Database searching (BLAST)
• Sequence statistics
• Evolutionary tree reconstruction
• RNA structure prediction
• Gene finding
• Protein structure prediction
• Computational genomics
…

[Figure: example protein sequences (e.g., …RLSKIISMFQAHIRGYLIRKAYKRGYQARCLLK…, …RNKHAIAVIWAFWLVQSSFRGYQAGSKARRELK…) sharing a common ungapped motif, illustrating the three tasks: Discovery, Modeling, Recognition.]

Local Multiple Alignment
• Position Specific Scoring Matrices (PSSMs) – Modeling, Recognition
• Gibbs sampler – Discovery
• Hidden Markov Models (HMMs) – Discovery, Modeling, Recognition
  – Can represent gaps, positional dependencies

Position Specific Scoring Matrices
Given an alignment A[l,k] (k sequences, l positions):
– n_ij: frequency of aa i at position j
– F[i,j] = n_ij / k
– P[i,j] = F[i,j] / b_i: propensity of aa i at position j (b_i is the background frequency of aa i)
– S[i,j] = log2 P[i,j]: log-odds scoring matrix
To scan a sequence t of length n, score the width-w window starting at position i (0 ≤ i ≤ n − w) as Σ_{l=1}^{w} S[t[i+l], l].

Discovery
• Input: k sequences containing a common ungapped pattern (e.g., a transcription factor binding site, a domain, …)
• Output: a set of k subsequences that are "most similar" to each other.
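The PSSM recipe above (counts → frequencies → background-normalized propensities → log2 log-odds, then window scoring) can be sketched in Python. The alignment of WEIRD/WEIQH variants echoes the example used later in these notes; the uniform background, the pseudocount value, and the scanned sequence are illustrative assumptions, not part of the original.

```python
import math
from collections import Counter

def build_pssm(seqs, background, pseudocount=1.0):
    """Build a log2 log-odds PSSM from k aligned, equal-length sequences.

    F[i,j] = n_ij / k       (frequency of character i at position j)
    P[i,j] = F[i,j] / b_i   (propensity relative to background b_i)
    S[i,j] = log2 P[i,j]    (log-odds score)
    Pseudocounts keep unseen characters from scoring -infinity.
    """
    k, w = len(seqs), len(seqs[0])
    alphabet = sorted(background)
    pssm = {}
    for j in range(w):
        counts = Counter(s[j] for s in seqs)
        for i in alphabet:
            F = (counts[i] + pseudocount) / (k + pseudocount * len(alphabet))
            pssm[i, j] = -math.inf if F == 0 else math.log2(F / background[i])
    return pssm, w

def window_score(pssm, w, t, i):
    """Score of the width-w window of sequence t starting at offset i
    (0-indexed version of sum over l of S[t[i+l], l])."""
    return sum(pssm[t[i + l], l] for l in range(w))

# Toy alignment: 3 copies of WEIRD, 2 of WEIQH (hypothetical data).
seqs = ["WEIRD", "WEIRD", "WEIRD", "WEIQH", "WEIQH"]
alphabet = sorted(set("".join(seqs)))
bg = {c: 1.0 / len(alphabet) for c in alphabet}  # assumed uniform background
pssm, w = build_pssm(seqs, bg)

# Slide every window over a new sequence and keep the best-scoring one.
t = "QHWEIRDRQH"
best = max(range(len(t) - w + 1), key=lambda i: window_score(pssm, w, t, i))
print(best, t[best:best + w])
```

As expected, the highest-scoring window is the embedded motif instance, since matching characters get positive log-odds scores and mismatches get negative ones.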
• Approaches:
– Exhaustive enumeration
– Gibbs sampler
– Expectation maximization using HMMs

Gibbs sampler summary
Convergence (see optional reading for details):
– Model the sampling process as a Markov chain
– Each state is a set of k subsequences
– Show that
  • the Markov chain has a stationary distribution, and
  • the state corresponding to the most likely pattern has high probability in that distribution
In practice, the sampler can get stuck in local optima:
• Randomness helps.
• Run the procedure several times with different starting configurations.

Gibbs sampler summary
Other considerations:
– Problems can arise if a sequence has no copy of the pattern, or has more than one copy.
– You could find a biologically meaningful pattern that is not the pattern you are looking for.
– Use pseudocounts when building the PSSM to ensure all characters are represented.
Black magic (see Lawrence et al., optional reading):
– Pseudocounts
– Selecting the window size, w
– Selecting the starting configuration
– Termination condition

Problems with PSSMs
PSSMs do not capture positional dependencies. Consider an alignment of the variants WEIRD and WEIQH with per-position frequencies:
W 1.00 | E 1.00 | I 1.00 | R 0.60, Q 0.40 | D 0.60, H 0.40
Note: we never see QD or RH, only RD and QH. But P(RH) = P(QD) = 0.24, while P(QH) = 0.16.

Problems with PSSMs
It is hard to recognize pattern instances that contain indels. The sequence WETIRD is the motif WEIRD with an inserted T.
[Figure: PSSM score matrix for WEIRD. Contiguous 5-residue windows of WETIRD score about 11 and 12.5, while the gapped alignment W E _ I R D, which skips the inserted T, scores 18.4.]

Problems with PSSMs
Variable-length motifs, e.g. WETIRD.
Gaps can be represented by expanding Σ,
but what size window should be used to score new instances of the motif?
WE_IRD
WETIQH
x x x W E T I R D x x x
x x x W E I Q H x x x

Problems with PSSMs
PSSMs do not handle boundary detection problems well.
Goal: label every element in the sequence with a zero (not in pattern) or a one (in pattern):
x x x x x x x x x x x y y y y y y y x x x x x x x x
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0

Examples of boundary detection problems
• Recognition of regulatory motifs
• Recognition of protein domains
• Intron/exon boundaries
• Gene boundaries
• Transmembrane regions
• Secondary structures (α helices, β sheets)

Plan
• Review Markov chains
• Extend to Hidden Markov Models
– Boundary detection
– Scoring sequences
• HMM construction
• Biological applications: revisit gaps and dependencies

Markov chains
• States: S1, S2, ..., SN
• States visited: q0, q1, ..., qt, qt+1, ...
• Initial distribution of states: π(i) = P(q0 = Si)
• Transition probabilities: a_ij = P(qt = Sj | qt−1 = Si)
Questions we can ask:
– What is the probability of being in a particular state at a particular time?
– What is the probability of seeing a particular sequence of states?

An example: transmembrane regions
[Figure: a protein crossing the membrane between the extracellular space and the cytosol; transmembrane regions tend to be hydrophobic.]
Model each amino acid as hydrophobic (H) or hydrophilic (L) → a peptide sequence can be represented as a sequence of H's and L's:
MLVKRFWKCE….
HHHLLHLHHLHL...

Questions to ask: which subsequences correspond to transmembrane regions?
A simpler question: is a given sequence a transmembrane sequence?

A Markov chain for recognizing transmembrane sequences
• States: SH, SL
• Σ = {H, L}
• π(H) = 0.7, π(L) = 0.3
• Transition probabilities: a_HH = 0.7, a_HL = 0.3, a_LH = 0.7, a_LL = 0.3
Is a given sequence, say HHLHH, a transmembrane sequence?

Under the transmembrane model:
P(HHLHH) = 0.7 × 0.7 × 0.3 × 0.7 × 0.7 = 0.072
Is it a transmembrane protein?
Problem: we need a threshold, and the threshold must be length dependent.

Compare against a null model:
– Transmembrane model: π(H) = 0.7, π(L) = 0.3; a_HH = a_LH = 0.7, a_HL = a_LL = 0.3
– Null model: π(H) = π(L) = 0.5; all transition probabilities 0.5
Score the sequence by the likelihood ratio P(HHLHH | TM) / P(HHLHH | Null).
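The two-state chain and the model-versus-null comparison can be sketched in Python using the parameters given above; the dictionary layout and function name are implementation choices, not from the original.

```python
import math

# Transmembrane (TM) model parameters from the slides: hydrophobic (H)
# residues are favored, both initially and after every residue.
TM = {"pi": {"H": 0.7, "L": 0.3},
      "a": {("H", "H"): 0.7, ("H", "L"): 0.3,
            ("L", "H"): 0.7, ("L", "L"): 0.3}}

# Null model: H and L equally likely everywhere.
NULL = {"pi": {"H": 0.5, "L": 0.5},
        "a": {(x, y): 0.5 for x in "HL" for y in "HL"}}

def chain_prob(model, seq):
    """P(seq) under a first-order Markov chain:
    pi(q0) * product over t of a(q_{t-1}, q_t)."""
    p = model["pi"][seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= model["a"][prev, cur]
    return p

seq = "HHLHH"
p_tm = chain_prob(TM, seq)      # 0.7 * 0.7 * 0.3 * 0.7 * 0.7
p_null = chain_prob(NULL, seq)  # 0.5 ** 5
log_odds = math.log2(p_tm / p_null)
print(round(p_tm, 5), p_null, round(log_odds, 3))
```

A positive log-odds score favors the transmembrane model. Because both probabilities shrink with sequence length, the ratio sidesteps the length-dependent-threshold problem noted above: the per-position log-odds can simply be compared to zero.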