DOC PREVIEW
UCSD CSE 182 - Protein Sequence Analysis Patterns

This preview shows page 1-2-3-22-23-24-45-46-47 out of 47 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

November 09! CSE 182!CSE182-L8 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene FindingNovember 09! CSE 182!QUIZ! • Question: • your ‘friend’ likes to gamble. • He tosses a coin: HEADS, he gives you a dollar. TAILS, you give him a dollar. • Usually, he uses a fair coin, but ‘once in a while’, he uses a loaded coin. • Can you say if he is cheating? • What fraction of the times does he load the coin?Regular expressions as motifs • What is a regular expression? • Given a regular expression pattern and a database, find all sequences that match the pattern. • Given a sequence as query, and a database of r.e. patterns, find all of the patterns in the sequence. • http://ca.expasy.org/prosite/ November 09! CSE 182!Fa 07! CSE182!Regular Expressions • Concise representation of a set of strings over alphabet ∑. • Described by a string over • R is a r.e. if and only if € Σ,⋅,∗,+{ }€ R = {ε} Base caseR = {σ},σ∈ ΣR = R1+ R2 Union of stringsR = R1⋅ R2 ConcatenationR = R1* 0 or more repetitionsFa 07! CSE182!Regular Expression • Q: Let ∑={A,C,E} – Is (A+C)*EEC* a regular expression? – *(A+C)? – AC*..E? • Q: When is a string s in a regular expression?!– R =(A+C)*EEC*!– Is CEEC in R?!– AEC?!– ACEE?!Fa 07! CSE182!Regular Expression & Automata  Every R.E can be expressed by an automaton (a directed graph) with the following properties: – The automaton has a start and end node – Each edge is labeled with a symbol from ∑, or ε  Suppose R is described by automaton A! S ∈ R if and only if there is a path from start to end in A, labeled with s.!Fa 07! CSE182!Examples: Regular Expression & Automata • (A+C)*EEC* C!A!C!start! end!E! E!November 09! CSE 182!Constructing automata from R.E • R = {ε} • R = {σ}, σ ∈ ∑ • R = R1 + R2 • R = R1 · R2 • R = R1*November 09! CSE 182!Matching Regular expressions • A string s belongs to R if and only if, there is a path from START to END in RA, labeled by s. • Given a regular expression R (automaton RA), and a database D, is there a string D[b..c] that matches RA (D[b..c] ∈ R) • Simpler Q: Is D[1..c] accepted by the automaton of R?November 09! CSE 182!Alg. For matching R.E. • If D[1..c] is accepted by the automaton RA – There is a path labeled D[1]…D[c] that goes from START to END in RA D[1]!D[2]! D[c]!November 09! CSE 182!Alg. For matching R.E. • If D[1..c] is accepted by the automaton RA – There is a path labeled D[1]…D[c] that goes from START to END in RA – There is a path labeled D[1]..D[c-1] from START to node u, and a path labeled D[c] from u to the END D[1] .. D[c-1]!D[c]!u!November 09! CSE 182!D.P. to match regular expression • Define: – A[u,σ] = Automaton node reached from u after reading σ – Eps(u): set of all nodes reachable from node u using epsilon transitions. – N[c] = subset of nodes reachable from START node after reading D[1..c] – Q: when is v ∈ N[c] ε!November 09! CSE 182!• Q: when is v ∈ N[c]? • A: If for some u ∈ N[c-1], w = A[u,D[c]], • v ∈ {w}+ Eps(w) D.P. to match regular expressionNovember 09! CSE 182!AlgorithmNovember 09! CSE 182!The final step • We have answered the question: – Is D[1..c] accepted by R? – Yes, if END ∈ N[c] • We need to answer – Is D[l..c] (for some l, and some c) accepted by R € D[l..c] ∈ R ⇔ D[1..c] ∈ Σ∗RNovember 09! CSE 182!Regular expressions as Protein sequence motifs A Fam(B) C-X-[DE]-X{10,12}-C-X-C--[STYLV]!C! E! F!• Problem: if there is a mis-match, the sequence is not accepted.November 09! CSE 182!Representation 2: Profiles • Profiles versus regular expressions – Regular expressions are intolerant to an occasional mis-match. – The Union operation (I+V+L) does not quantify the relative importance of I,V,L. It could be that V occurs in 80% of the family members. – Profiles capture some of these ideas.November 09! CSE 182!Profiles • Start with an alignment of strings of length m, over an alphabet A, • Build an |A| X m matrix F=(fki) • Each entry fki represents the frequency of symbol k in position i 0.71!0.14!0.14!0.28!November 09! CSE 182!Profiles • Start with an alignment of strings of length m, over an alphabet A, • Build an |A| X m matrix F=(fki) • Each entry fki represents the frequency of symbol k in position i 0.71!0.14!0.14!0.28!November 09! CSE 182!Scoring matrices • Given a sequence s, does it belong to the family described by a profile? • We align the sequence to the profile, and score it • Let S(i,j) be the score of aligning position i of the profile to residue sj • The score of an alignment is the sum of column scores. s sj iNovember 09! CSE 182!Scoring Profiles € S(i, j) = fkik∑M rk,sj[ ]k i s fki Scoring MatrixNovember 09! CSE 182!Domain analysis via profiles • Given a database of profiles of known domains/families, we can query our sequence against each of them, and choose the high scoring ones to functionally characterize our sequences. • What if the sequence matches some other sequences weakly (using BLAST), but does not match any known profile?November 09! CSE 182!Psi-BLAST idea • Iterate: – Find homologs using Blast on query – Discard very similar homologs – Align, make a profile, search with profile. – Why is this more sensitive? Seq Db!--In the next iteration, the red sequence will be thrown out. --It matches the query in non-essential residuesNovember 09! CSE 182!Psi-BLAST speed • Two time consuming steps. 1. Multiple alignment of homologs 2. Searching with Profiles. 1. Does the keyword search idea work? • Multiple alignment: – Use ungapped multiple alignments only • Pigeonhole principle again: – If profile of length m must score >= T – Then, a sub-profile of length l must score >= lT|/m – Generate all l-mers that score at least lT|/M – Search using an automatonNovember 09! CSE 182!Representation 3: HMMs • Building good profiles relies upon good alignments. – Difficult if there are gaps in the alignment. – Psi-BLAST/BLOCKS etc. work with gapless alignments. • An


View Full Document

UCSD CSE 182 - Protein Sequence Analysis Patterns

Download Protein Sequence Analysis Patterns
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Protein Sequence Analysis Patterns and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Protein Sequence Analysis Patterns 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?