DOC PREVIEW
UCSD CSE 182 - Lecture

This preview shows page 1-2-3-27-28-29 out of 29 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

L4 Linear space Scoring matrices October 09! CSE 182!October 09! CSE 182!Alignment (Linear Space) • Score computation € i2= i%2; i1= (i −1)%2;S[i2, j] = maxS[i1, j −1] + C(si,tj)S[i1, j] + C(si,−)S[i2, j −1] + C(−,tj)            For i = 1 to n! For j = 1 to m !€ S[i, j] = maxS[i −1, j −1] + C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj)    October 09! CSE 182!Linear Space Alignment • In Linear Space, we can do each row of the D.P. • We need to compute the optimum path from the origin (0,0) to (m,n)October 09! CSE 182!Linear Space (cont’d) • At i=n/2, we know scores of all the optimal paths ending at that row. • Define F[j] = S[n/2,j] • One of these j is on the true path. Which one?October 09! CSE 182!Backward alignment • Let Sb[i,j] be the optimal score of aligning s[i+1..n] with t[j+1..m] € Sb[i, j] = maxSb[i + 1, j + 1] + C(si,tj)Sb[i + 1, j] + C(si,−)Sb[i, j + 1] + C(−,tj)     • Boundary cases? • Sb[n,j]? Sb[m,j]?October 09! CSE 182!Backward alignment • Let Sb[i,j] be the optimal score of aligning s[i+1..n] with t[j+1..m] • Define B[j] = Sb[n/2,j] • One of these j is on the true path. Which one?October 09! CSE 182!Forward, Backward computation • At the optimal coordinate, j – F[j]+B[j]=S[n,m] • In O(nm) time, and O(m) space, we can compute one of the coordinates on the optimum path.October 09! CSE 182!Linear Space Alignment • Align(1..n,1..m) – For all 1<=j <= m • Compute F[j]=S(n/2,j) – For all 1<=j <= m • Compute B[j]=Sb(n/2,j) – j* = maxj {F[j]+B[j] } – X = Align(1..n/2,1..j*) – Y = Align(n/2+1..n,j*+1..m) – Return X,j*,YOctober 09! CSE 182!Linear Space complexity • T(nm) = c.nm + T(nm/2) = O(nm) • Space = O(m)October 09! CSE 182!Silly Quiz • Name a famous Bioinformatics Researcher • Name a famous Bioinformatics Researcher who is a womanScoring Matrices • We have seen that affine gap penalties help concentrate the gaps in small regions. • What about substitution errors. Are all substitutions alike? October 09! CSE 182!October 09! CSE 182!Scoring DNA • DNA has structure.October 09! CSE 182!DNA scoring matrices • So far, we considered a simple match/mismatch criterion. • The nucleotides can be grouped into Purines (A,G) and Pyrimidines. • Nucleotide substitutions within a group (transitions) are more likely than those across a group (transversions)The scoring problem • When comparing hum and dros., more mismatches are likely just by chance.. If we expect to see 70% of the residues to be mutated, then a 50% identity is great. • A different scoring function is needed for hum and drosphila October 09! CSE 182!hum!mus!dros!October 09! CSE 182!Scoring proteins • Scoring protein sequence alignments is a much more complex task than scoring DNA – Not all substitutions are equal • Problem was first worked on by Pauling and collaborators • In the 1970s, Margaret Dayhoff created the first similarity matrices. – “One size does not fit all” – Homologous proteins which are evolutionarily close should be scored differently than proteins that are evolutionarily distant – Different proteins might evolve at different rates and we need to normalize for thatFrequency based scoring • Our goal is to score each column in the alignment • Comparing against expectation: – Think about alignments of pairs of random sequences, and compute the probability that A and B appear together just by chance PR(A,B) – Compute the probability of A and B appearing together in the alignment of related sequences (orthologs) PO(A,B) • A good score function? October 09! CSE 182! A B € logPO(A,B)PR(A,B)     October 09! CSE 182!PAM 1 distance • Two sequences are 1 PAM apart if they differ in 1 % of the residues. • PAM1(a,b) = Pr[residue b substitutes residue a, when the sequences are 1 PAM apart] 1% mismatch!October 09! CSE 182!PAM1 matrix • Align many proteins that are very similar – Is this a problem? • 1 PAM evolutionary distance represents the time in which 1% of the residues have changed • Estimate the frequency Pb|a of residue a being substituted by residue b. • PAM1(a,b) = Pa|b = Pr(b will mutate to an a after 1 PAM evolutionary distance) • Scoring matrix – S(a,b) = log10(Pab/PaPb) = log10(Pb|a/Pb)October 09! CSE 182!PAM 1 • Top column shows original, and left column shows replacement residue = PAM1(a,b) = Pr(a|b)October 09! CSE 182!PAM distance • Two sequences are 1 PAM apart when they differ in 1% of the residues. • When are 2 sequences 2 PAMs apart? 1 PAM 1 PAM 2 PAMOctober 09! CSE 182!Generating Higher PAMs • PAM2(a,b) = ∑c PAM1(a,c). PAM1 (c,b) • PAM2 = PAM1 * PAM1 (Matrix multiplication) • PAM250 – = PAM1*PAM249 – = PAM1250 =!a!a!b!c!b!c!PAM2!PAM1!PAM1!October 09! CSE 182!Note: This is not the score matrix: !What happens as you keep increasing the power?!October 09! CSE 182!Scoring residues • A reasonable score function C(a,b) is given as follows: – Look at ‘high quality’ alignments – C(a,b) should be high when a,b are seen together more often than is expected by chance – C(a,b) should be low, otherwise. • How often would you expect to see a,b together just by chance? – Pa Pb • Let Pab be the probability that a and b are aligned in a high-quality alignment • A good scoring function is the log-odds score – C(a,b)= log10 (Pab/PaPb)October 09! CSE 182!Scoring alignments • To compute Pab, we need ‘high-quality’ alignments • How can you get quality alignments? – Use SW (But that needs the scoring function) – Build alignments manually – Use Dayhoff’s theory to extrapolate from high identity alignmentsOctober 09! CSE 182!Scoring using PAM matrices • Suppose we know that two sequences are 250 PAMs apart. • S(a,b) = log10(Pab/PaPb)= log10(Pa|b/Pa) = log10(PAM250(a,b)/Pa) • How does it help? – S250(A,V) >> S1(A,V) – Scoring of hum vs. Dros should be using a higher PAM matrix than scoring hum vs. mus. – An alignment with a smaller % identity could still have a higher score and be more significant hum!mus!dros!October 09! CSE


View Full Document

UCSD CSE 182 - Lecture

Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?