New version page

UMD CMSC 423 - Inexact alignment dynamic programming, gapped alignment

Documents in this Course
Midterm

Midterm

8 pages

Lecture 7

Lecture 7

15 pages

Load more
Upgrade to remove ads

This preview shows page 1-2-3-4-5 out of 16 pages.

Save
View Full Document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience

Upgrade to remove ads
Unformatted text preview:

CMSC423 Fall 2008 1CMSC423: Bioinformatic Algorithms, Databases and ToolsLecture 9inexact alignmentdynamic programming, gapped alignmentCMSC423 Fall 2008 2Recap3Global alignment recapC--AGACTGA G GATGCAGCGTAGGTCAGACValue(A,A) = 10Value(A,G) = -5Value(A,-) = -2Score[i,j] is the maximum of:1. Score[i-1, j-1] + Value[S1[i-1],S2[j-1]] (S1[i-1], S2[j-1] aligned)2. Score[i – 1, j] + Value[S1[i], -] (S1[i] aligned to gap)3. Score[i, j – 1] + Value[-, S2[j]] (S2[j] aligned to gap)4Global alignment recap1920910140-14-28C-28-24-20-16-12-8-40--24-20-16-12-8-4-2024131434-10AGACTG2410141848-61014378-6-2-13-9-5A-226G-134812048-31-14-10-6-22GATGCAG-C-GTAG-GTCAG-ACValue(A,A) = 10Value(A,G) = -5Value(A,-) = -4Score[i,j] is the maximum of:1. Score[i-1, j-1] + Value[S1[i-1],S2[j-1]] (S1[i-1], S2[j-1] aligned)2. Score[i – 1, j] + Value[S1[i], -] (S1[i] aligned to gap)3. Score[i, j – 1] + Value[-, S2[j]] (S2[j] aligned to gap)5Local alignment recapC--AGACTGA G GATGCAGCGTAGGTCAGACValue(A,A) = 10Value(A,G) = -5Value(A,-) = -2Score[i,j] is the maximum of:0. 01. Score[i-1, j-1] + Value[S1[i-1],S2[j-1]] (S1[i-1], S2[j-1] aligned)2. Score[i – 1, j] + Value[S1[i], -] (S1[i] aligned to gap)3. Score[i, j – 1] + Value[-, S2[j]] (S2[j] aligned to gap)CMSC423 Fall 2008 6Alignment scoresCMSC423 Fall 2008 7Where do the alignment scores come from?• PAM matrices– PAM1 – based on frequency of mutations between closely related proteins (within 1 "evolutionary step")– PAM 2 - ... within 2 evolutionary steps–... PAM 250 – commonly used•BLOSUM matrices–Frequency of mutations between proteins that are x% similar– BLOSUM100 – based on proteins that are exactly the same (e.g. score(A,A) is defined but not score(A,G) )– BLOSUM62 – commonly used• gap scores usually determined empiricallyCMSC423 Fall 2008 8BLOSUM62CMSC423 Fall 2008 9HeuristicsCMSC423 Fall 2008 10Heuristics• What if limit the # of differences allowed? E.g. we expect the sequences to be very similar.• Compute 'banded' alignment – stay within # of differences (k) from the diagonal.•Optimal alignment cannot stray too far from diagonal•What if we do not know k? Do binary search to find itkkO(km) runningtime and spaceCMSC423 Fall 2008 11Exclusion methods• Assume P must match T with at most k errors. Find places in T where P cannot match.•Split P into floor(n/k+1)-sized chunks.• If P matches T with less than k errors => at least one chunk matches with no errors•Use any exact matching algorithm to find places where a chunk matches T, then run dynamic programming in that vicinity. • Running time, on average O(m)CMSC423 Fall 2008 12Exclusion methodsExact matchPutative alignmentTextPatternCMSC423 Fall 2008 13"Famous" approaches• FASTA (Pearson et al.)– Take all k-mers (substrings of length k) from Pattern and identify whether and where they match in the Text– Assume the k-mer starting at pos'n i in Pattern matches at position j in Text, remember (j – i) – the diagonal on which the match occured–Identify "heavy" diagonals – diagonals where many k-mers match, then refine the diagonals with Smith Waterman–Also look for off-diagonal matches to account for gapsCMSC423 Fall 2008 14"Famous" approaches• BLAST (Altschul et al.)– Find short k-mer matches–Also search for possible inexact matches, e.g. all k-mers within 1 difference from current one.–Extend exact matches with Smith-Waterman algorithm– Assign probabilistic scores to matches: what is the probability of finding a match with the same S-W alignment score just by chance (e.g. matching a random string)?CMSC423 Fall 2008 15Chaining approach• Extends the FASTA idea• Search for exact matches•Find the longest consistent chain of exact matches•Fill in the gaps in the chain using Smith-Waterman• This is the approach used by MUMmer (Delcher et al.)•MUM – maximally unique match (see mummer.sourceforge.net)CMSC423 Fall 2008 16Chaining in 1-D • Input: multiple overlapping intervals on a line•Output: highest weight set of non-overlapping intervals• Weight could be length of interval, or Smith-Waterman score, etc.•Sort the endpoints (starts, ends) of the intervals• For every interval j, store V[j] – best score of a chain ending in j• MAX – store highest V[j] seen sofar•Process endpoints in increasing order of x coordinate• If we encounter left end (start) of interval j–V[j] = weight(j) + MAX• If we encounter right end (end) of interval j– MAX = max{V[j], MAX}• Running


View Full Document
Download Inexact alignment dynamic programming, gapped alignment
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Inexact alignment dynamic programming, gapped alignment and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Inexact alignment dynamic programming, gapped alignment 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?