DOC PREVIEW
UCSD CSE 182 - Blast & variants I

This preview shows page 1-2-3-20-21-40-41-42 out of 42 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 42 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CSE 182-L2:Blast & variants I Dynamic ProgrammingSearching Sequence databasesQuery:Querying with BlastBlast OutputSlide 6Slide 7The technological questionThe biology questionComputing alignmentsOptimum scoring alignments, and score of optimum alignmentComputing Optimal Alignment scoreOptimum prefix alignments3 possibilities for rightmost cellOptimal score of an alignmentOptimal alignment scoreAn O(nm) algorithm for score computationInitializationA tableaux approachAn ExampleAlignment TableSlide 22Computing Optimum AlignmentComputing Optimal AlignmentsRetrieving Opt.AlignmentSlide 26Algorithm to retrieve optimal alignmentSummaryGlobal versus Local AlignmentBlast Outputs Local AlignmentLocal AlignmentSlide 32Slide 33Local Alignment Trick (Smith-Waterman algorithm)Generalizing Gap CostUsing affine gap penaltiesAffine gap penaltiesO(nm) solution for affine gap costsBlast variantsThe genetic codeSlide 41Slide 42FA05 CSE182CSE 182-L2:Blast & variants IDynamic Programmingwww.cse.ucsd.edu/classes/fa05/cse182www.cse.ucsd.edu/classes/fa05/cse182www.cse.ucsd.edu/~vbafnawww.cse.ucsd.edu/~vbafnaFA05 CSE182Searching Sequence databaseshttp://www.ncbi.nlm.nih.gov/BLAST/FA05 CSE182Query:>gi|26339572|dbj|BAC33457.1| unnamed protein product [Mus musculus]MSSTKLEDSLSRRNWSSASELNETQEPFLNPTDYDDEEFLRYLWREYLHPKEYEWVLIAGYIIVFVVALIGNVLVCVAVWKNHHMRTVTNYFIVNLSLADVLVTITCLPATLVVDITETWFFGQSLCKVIPYLQTVSVSVSVLTLSCIALDRWYAICHPLMFKSTAKRARNSIVVIWIVSCIIMIPQAIVMECSSMLPGLANKTTLFTVCDEHWGGEVYPKMYHICFFLVTYMAPLCLMILAYLQIFRKLWCRQIPGTSSVVQRKWKQQQPVSQPRGSGQQSKARISAVAAEIKQIRARRKTARMLMVVLLVFAICYLPISILNVLKRVFGMFTHTEDRETVYAWFTFSHWLVYANSAANPIIYNFLSGKFREEFKAAFSCCLGVHHRQGDRLARGRTSTESRKSLTTQISNFDNVSKLSEHVVLTSISTLPAANGAGPLQNWYLQQGVPSSLLSTWLEV •What is the function of this sequence?•Is there a human homolog?•Which organelle does it work in? (Secreted/membrane bound)•Idea: Search a database of known proteins to see if you can find similar sequences which have a known functionFA05 CSE182Querying with BlastFA05 CSE182Blast Output•The output (Blastp query) is a series of protein sequences, ranked according to similarity with the query•Each database hit is aligned to a subsequence of the queryFA05 CSE182Blast Outputquery2619 405422SchematicdbQ begS begQ endS endS IdFA05 CSE182Blast OutputQ begS begQ endS endS IdFA05 CSE182The technological question •How do we measure similarity between sequences?•Percent identity?–Hard to compute without indels.Number of sequence edit operations?Number of sequence edit operations?Implies a notion of alignment.Implies a notion of alignment.A T C A A C GT C A A T G G TA T C A A - C G -- T C A A T G G TFA05 CSE182The biology question•How do we interpret these results?–Similar sequence in the 3 species implies that the common ancestor of the 3 had that sequence.–The sequence accumulates mutations over time. These mutations may be indels, or substitutions.–Hum and mus diverged more recently and so the sequences are more likely to be similar.hummusdroshummus?FA05 CSE182Computing alignments•What is an alignment?•2Xm table. •Each sequence is a row, with interspersed gaps•Columns describe the edit operations•What is the score of an alignment?What is the score of an alignment?•Score every column, and sum up the scores. Let C be the score Score every column, and sum up the scores. Let C be the score function for the columnfunction for the column•How do we compute the alignment with the best score?How do we compute the alignment with the best score?A A - T C G G AA C T C G - AFA05 CSE182Optimum scoring alignments, and score of optimum alignment•Instead of computing an optimum scoring alignment, we attempt to compute the score of an optimal alignment.•Later, we will show that the two are equivalentFA05 CSE182Computing Optimal Alignment score•Observations: The optimum alignment has nice recursive properties:–The alignment score is the sum of the scores of columns. –If we break off at cell k, the left part and right part must be optimal sub-alignments.–The left part contains prefixes s[1..i], and t[1..j] for some i and some j (we don’t know the values of i and j). 121ktsFA05 CSE182Optimum prefix alignments•Consider an optimum alignment of the prefix s[1..i], and t[1..j]•Look at the last cell, indexed by k. It can only have 3 possibilities.1kstFA05 CSE1823 possibilities for rightmost cell1. s[i] is aligned to t[j]2. s[i] is aligned to ‘-’3. t[j] is aligned to ‘-’s[i]t[j]s[i]t[j]Optimum alignment of s[1..i-1], and t[1..j-1]Optimum alignment of s[1..i-1], and t[1..j]Optimum alignment of s[1..i], and t[1..j-1]FA05 CSE182Optimal score of an alignment•Let S[i,j] be the score of an optimal alignment of the prefix s[1..i], and t[1..j]. It must be one of 3 possibilities.s[i]t[j]Optimum alignment of s[1..i-1], and t[1..j-1]s[i]-Optimum alignment of s[1..i-1], and t[1..j]-Optimum alignment of s[1..i], and t[1..j-1]t[j]S[i,j] = C(si,tj)+S(i-1,j-1)S[i,j] = C(si,-)+S(i-1,j)S[i,j] = C(-,tj)+S(i,j-1)FA05 CSE182Optimal alignment score•Which prefix pairs (i,j) should we use? For now, simply use all.•If the strings are of length m, and n, respectively, what is the score of the optimal alignment?€ S[i, j] = maxS[i −1, j −1] + C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj) ⎧ ⎨ ⎪ ⎩ ⎪FA05 CSE182An O(nm) algorithm for score computation•The iteration ensures that all values on the right are computed in earlier steps.€ S[i, j] = maxS[i −1, j −1] + C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj) ⎧ ⎨ ⎪ ⎩ ⎪For i = 1 to n For j = 1 to mFA05 CSE182Initialization€ S[0,0] = 0S[i,0] = C(si,−) + S[i −1,0] ∀iS[0, j] = C(−,sj) + S[0, j −1] ∀jFA05 CSE182A tableaux approachsn1i1 j n € MS[i,j-1] S[i,j]S[i-1,j]S[i-1,j-1]tCell (i,j) contains the score S[i,j]. Each cell only looks at 3 neighboring cells€ S[i, j] = maxS[i −1, j −1]+ C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj) ⎧ ⎨ ⎪ ⎩ ⎪FA05 CSE182An Example•Align s=TCAT with t=TGCAA•Match Score = 1•Mismatch score = -1, Indel Score = -1•Score A1?, Score A2? T C A T -T G C A AT C A TT G C A AA1A2FA05 CSE182 0 -1 -2 -3 -4 -5 -1 1 0 -1 -2 -3 -2 0 0 1 0 -1 -3 -1 -1 0 2 1 -4 -2 -2 -1 1 1T G C A ATCATAlignment TableFA05 CSE182 1 1 -1 -2 -2 -4 1 2 0 -1 -1 -3 -1 0 1 0


View Full Document

UCSD CSE 182 - Blast & variants I

Download Blast & variants I
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Blast & variants I and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Blast & variants I 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?