CMU BSC 03711 - Lectures

Course topics
• Pairwise sequence alignment (global and local)
• Multiple sequence alignment (local and global)
• Substitution matrices
• Database searching (BLAST)
• Sequence statistics
• Evolutionary tree reconstruction
• RNA structure prediction
• Gene finding
• Protein structure prediction
• Computational genomics

Local multiple alignment: Discovery, Modeling, Recognition
The motivating example is a set of protein sequences that share a common ungapped pattern:
... RLSKIISMIQAHIRGYLIRKAYKRGYQARCLLK ...
... RNKHAIAVIWAFWLVQSSFRGYQAGSKARRELK ...
... MKRSQVVKQEKAARKIQKFWRGHRVQHNQR ...
... QEEVSAIIIQRAYRRYLLKQKVKILRVQSS ...
– Discovery: find the common pattern shared by a set of sequences.
– Modeling: build a representation of the pattern.
– Recognition: find new instances of the pattern.

Tools for local multiple alignment:
• Position Specific Scoring Matrices (PSSMs) – Modeling, Recognition
• Gibbs sampler – Discovery
• Hidden Markov Models (HMMs) – Discovery, Modeling, Recognition; can represent gaps and positional dependencies

Position Specific Scoring Matrices
Given an ungapped alignment A with k sequences and w positions:
– n_ij = number of occurrences of amino acid i at position j
– F[i,j] = n_ij / k, the frequency of amino acid i at position j
– P[i,j] = F[i,j] / b_i, the propensity of amino acid i at position j, where b_i is the background frequency of amino acid i
– S[i,j] = log2 P[i,j], the log-odds scoring matrix
To score a new sequence t of length n, slide a window of width w along t and compute, for each offset l with 1 ≤ l ≤ n − w + 1:
  Score(l) = Σ_{i=1..w} S[t[l+i−1], i]

Discovery
• Input: k sequences containing a common ungapped pattern (e.g., a transcription factor binding site, a domain, ...)
• Output: a set of k subsequences that are "most similar" to each other.
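The PSSM construction and window scoring above can be sketched in a few lines of Python. The toy alignment, the uniform background b_i = 1/20, and the flat −10 penalty for residues never seen in a column are illustrative assumptions, not from the lecture (pseudocounts, discussed below, are the principled fix for unseen residues):

```python
import math

# Hypothetical ungapped alignment: k = 5 sequences, w = 5 positions.
alignment = ["WEIRD", "WEIQH", "WEIRD", "WEIQH", "WEIRD"]
k = len(alignment)
w = len(alignment[0])

# Assume a uniform background: b_i = 1/20 for all 20 amino acids.
background = 1 / 20

# n_ij: count of amino acid i at position j.
counts = [{} for _ in range(w)]
for seq in alignment:
    for j, aa in enumerate(seq):
        counts[j][aa] = counts[j].get(aa, 0) + 1

# F[i,j] = n_ij / k;  P[i,j] = F[i,j] / b_i;  S[i,j] = log2 P[i,j].
S = [{aa: math.log2((n / k) / background) for aa, n in col.items()}
     for col in counts]

def score_window(t, l):
    """Score(l) = sum over window positions of S[t[l+i], i].

    Residues never observed in a column get a flat -10 penalty here
    instead of log2(0); a real PSSM would use pseudocounts.
    """
    return sum(S[i].get(t[l + i], -10.0) for i in range(w))

# Slide the PSSM along a new sequence and take the best window.
t = "AAWEIRDAA"
best = max(range(len(t) - w + 1), key=lambda l: score_window(t, l))
print(best, round(score_window(t, best), 2))
```

The best window starts at offset 2 (the embedded WEIRD), with score 3·log2(20) + 2·log2(12) ≈ 20.14, since W, E, I appear in all five training sequences and R, D in three of five.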
• Approaches:
– Exhaustive enumeration
– Gibbs sampler
– Expectation maximization (using HMMs)

Gibbs sampler summary
Convergence (see optional reading for details):
– Model the sampling process as a Markov chain, where each state is a set of k subsequences.
– Show that
• the Markov chain has a stationary distribution, and
• the state corresponding to the most likely pattern has high probability in that distribution.
In practice, the sampler can get stuck in local optima.
• Randomness helps.
• Run the procedure several times with different starting configurations.

Other considerations:
– Problems can arise if a sequence has no copy of the pattern, or has more than one copy.
– You could find a biologically meaningful pattern that is not the pattern you are looking for.
– Use pseudocounts when building the PSSM to ensure all characters are represented.

Black magic (see Lawrence et al., optional reading):
– Pseudocounts
– Selecting the window size, w
– Selecting the starting configuration
– Termination condition

Problems with PSSMs: positional dependencies
PSSMs do not capture dependencies between positions. Suppose the training set contains only the patterns WEIRD and WEIQH, with WEIRD making up 60% of the sequences. The resulting PSSM is:
  position 1: W 1.00
  position 2: E 1.00
  position 3: I 1.00
  position 4: R 0.60, Q 0.40
  position 5: D 0.60, H 0.40
Note: we never see QD or RH, only RD and QH. But under the PSSM, P(RH) = P(QD) = 0.6 × 0.4 = 0.24, while P(QH) = 0.4 × 0.4 = 0.16 — the model assigns higher probability to two patterns that never occur than to one that does.

Problems with PSSMs: indels
PSSMs are hard-pressed to recognize pattern instances that contain insertions or deletions. Consider scoring WETIRD (an instance of the WEIRD motif with an inserted T) against a five-column PSSM for WEIRD: the insertion shifts every downstream residue out of register. On the slide's score matrix, the two ungapped alignments score 5.0 + 2.9 + 1.2 + 1.4 + 1.5 = 11 and 1.2 + 1.8 + 3.1 + 3.0 + 3.4 = 12.5, while skipping the inserted T recovers the full motif score 5.0 + 2.9 + 3.1 + 3.0 + 3.4 = 18.4 — but a plain PSSM has no mechanism for skipping a position.

Problems with PSSMs: variable-length motifs
Some motifs have variable length: the same pattern may occur both as WE_IRD and as WETIQH. Gaps can be represented by expanding Σ (adding a gap character to the alphabet),
but what size window should be used to score new instances of the motif?
  x x x W E T I R D x x x
  x x x W E I Q H x x x
A fixed-width window cannot fit both embedded instances.

Problems with PSSMs: boundary detection
PSSMs do not handle boundary detection problems well.
Goal: label every element in the sequence with a zero (not in pattern) or a one (in pattern):
  x x x x x x x x x x x y y y y y y y x x x x x x x x
  0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0

Examples of boundary detection problems:
• Recognition of regulatory motifs
• Recognition of protein domains
• Intron/exon boundaries
• Gene boundaries
• Transmembrane regions
• Secondary structures (α helices, β sheets)

Plan
• Review Markov chains
• Extend to Hidden Markov Models
– Boundary detection
– Scoring sequences
• HMM construction
• Biological applications: revisit gaps and dependencies

Markov chains
• States: S1, S2, ..., SN
• States visited: q0, q1, ..., qt, qt+1, ...
• Initial distribution of states: π(i) = P(q0 = Si)
• Transition probabilities: a_ij = P(qt = Sj | q_{t−1} = Si)
Questions we can ask:
– What is the probability of being in a particular state at a particular time?
– What is the probability of seeing a particular sequence of states?

An example: transmembrane regions
A membrane protein spans the extracellular space, the membrane, and the cytosol, and its transmembrane regions tend to be hydrophobic. Model each amino acid as hydrophobic (H) or hydrophilic (L), so a peptide sequence can be represented as a sequence of H's and L's:
  MLVKRFWKCE....
  HHHLLHLHHLHL...

Questions to ask:
– Which subsequences correspond to transmembrane regions?
– A simpler question first: is a given sequence a transmembrane sequence?

A Markov chain for recognizing transmembrane sequences
• States: S_H, S_L
• Σ = {H, L}
• π(H) = 0.7, π(L) = 0.3
• Transitions (as the calculation below implies): a_HH = a_LH = 0.7, a_HL = a_LL = 0.3

Is a given sequence, say HHLHH, a transmembrane sequence? Under the transmembrane model:
  P(HHLHH) = 0.7 × 0.7 × 0.3 × 0.7 × 0.7 ≈ 0.072

Is it a transmembrane protein? Problem: we need a threshold, and the threshold must be length dependent, since longer sequences always have smaller probabilities. The fix is to compare against a null model:
  Transmembrane model: π(H) = 0.7, π(L) = 0.3, a_HH = a_LH = 0.7, a_HL = a_LL = 0.3
  Null model: π(H) = π(L) = 0.5, all transition probabilities 0.5
and score the sequence by the ratio P(HHLHH | TM) / P(HHLHH | null).
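The two-model comparison can be sketched directly. This assumes, as the slide's arithmetic implies, that a_HH = a_LH = 0.7 and a_HL = a_LL = 0.3; taking the log of the ratio gives a log-odds score, where a score above 0 means the transmembrane model explains the sequence better than the null model regardless of sequence length:

```python
import math

# Transmembrane model from the lecture: pi(H) = 0.7, pi(L) = 0.3,
# with transitions inferred from the slide's P(HHLHH) calculation.
tm_pi = {"H": 0.7, "L": 0.3}
tm_a = {("H", "H"): 0.7, ("H", "L"): 0.3,
        ("L", "H"): 0.7, ("L", "L"): 0.3}

# Null model: all probabilities 0.5.
null_pi = {"H": 0.5, "L": 0.5}
null_a = {pair: 0.5 for pair in tm_a}

def chain_prob(seq, pi, a):
    """P(q0, q1, ..., qn) = pi(q0) * product of a[q_{t-1}, q_t]."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= a[(prev, cur)]
    return p

p_tm = chain_prob("HHLHH", tm_pi, tm_a)        # 0.7*0.7*0.3*0.7*0.7
p_null = chain_prob("HHLHH", null_pi, null_a)  # 0.5**5
log_odds = math.log2(p_tm / p_null)
print(round(p_tm, 4), p_null, round(log_odds, 2))
```

Here P(HHLHH | TM) ≈ 0.072 and P(HHLHH | null) = 0.03125, so the log-odds score is positive and HHLHH looks transmembrane-like under this model.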

