DOC PREVIEW
UMD CMSC 423 - Alignment heuristics

This preview shows page 1-2-3-4-5 out of 16 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMSC423 Fall 2009 1CMSC423: Bioinformatic Algorithms, Databases and ToolsAlignment heuristicsCMSC423 Fall 2009 2Heuristics•What if limit the # of differences allowed? E.g. we expect the sequences to be very similar.• Compute 'banded' alignment – stay within # of differences (k) from the diagonal.• Optimal alignment cannot stray too far from diagonal•What if we do not know k? Do binary search to find itkkO(km) runningtime and spaceCMSC423 Fall 2009 3Exclusion methods•Assume P must match T with at most k errors. Find places in T where P cannot match.•Split P into floor(n/k+1)-sized chunks.• If P matches T with less than k errors => at least one chunk matches with no errors•Use any exact matching algorithm to find places where a chunk matches T, then run dynamic programming in that vicinity. • Running time, on average O(m)CMSC423 Fall 2009 4Exclusion methodsExact matchPutative alignmentTextPatternCMSC423 Fall 2009 5"Famous" approaches•FASTA (Pearson et al.)–Take all k-mers (substrings of length k) from Pattern and identify whether and where they match in the Text–Assume the k-mer starting at pos'n i in Pattern matches at position j in Text, remember (j – i) – the diagonal on which the match occured–Identify "heavy" diagonals – diagonals where many k-mers match, then refine the diagonals with Smith Waterman–Also look for off-diagonal matches to account for gapsCMSC423 Fall 2009 6"Famous" approaches•BLAST (Altschul et al.)–Find short k-mer matches–Also search for possible inexact matches, e.g. all k-mers within 1 difference from current one.–Extend exact matches with Smith-Waterman algorithm–Assign probabilistic scores to matches: what is the probability of finding a match with the same S-W alignment score just by chance (e.g. matching a random string)?Aside: viewing alignments with dot-plotsaxes – two sequences/genomes, 'dots' – regions that match in the two genomesCMSC423 Fall 2009 8Chaining approach•Extends the FASTA idea• Search for exact matches• Find the longest consistent chain of exact matches•Fill in the gaps in the chain using Smith-Waterman• This is the approach used by MUMmer (Delcher et al.)•MUM – maximally unique match (see mummer.sourceforge.net)CMSC423 Fall 2009 9Chaining in 1-D •Input: multiple overlapping intervals on a line•Output: highest weight set of non-overlapping intervals•Weight could be length of interval, or Smith-Waterman score, etc.CMSC423 Fall 2009 10Chaining in 1D•Basic idea – dynamic programming• V[j] – weight of best chain ending with interval j•V[j] = maxk < j, intervals k & j do not overlap(V[k] + weight(j))•i.e. find all possible ways of building a chain ending at j and pick the best one (the key to all dynamic programming algorithms)• Where do we find the answer?• How do we find the actual chain?•What is the running time?largest value in V arraybacktrackingO(n^2)CMSC423 Fall 2009 11Chaining in 1D•Sort the endpoints (starts, ends) of the intervals•For every interval j, store V[j] – best score of a chain ending in j•MAX – store highest V[j] seen sofar•Process endpoints in increasing order of x coordinate•If we encounter left end (start) of interval j–V[j] = weight(j) + MAX•If we encounter right end (end) of interval j–MAX = max{V[j], MAX}• Running time? O(n log n) – from sortingCMSC423 Fall 2009 12Chaining in 1-D •Input: multiple overlapping intervals on a line•Output: highest weight set of non-overlapping intervals•Weight could be length of interval, or Smith-Waterman score, etc.•Rationale? The pattern can have multiple inconsistent exact matches to the text – we want to pick a longest consistent setTPCMSC423 Fall 2009 13Path “planning” and dynamic programming•One intuitive way to think about dynamic programming–similar to finding shortest path between two points–at each “point” ask – what are all possible ways to get here?–pick the best (shortest, fastest, etc.)DCDCFrederickBaltimoreHarrisburgPhillyNYCCMSC423 Fall 2009 14Chaining in 2-D•Easy to do in O(n2) (n - # of intervals)•View alignments as "boxes"•All boxes in a chain must follow each other in a "diagonal" order, i.e. the range of the x coordinates and y coordinates of any two boxes in a chain cannot overlap•Similar to 1-D approach except at each step we must check if current box can extend any of the previously built chains•V[j] = maxall previous boxes k {V[k] + weight(j)}•More complex algorithm leads to O(n log n) running timeCMSC423 Fall 2009 15Suffix trees + dynamic programming•Idea: find inexact seeds (rather than exact matches)• Observation: if two sequences match within x% identity there must be some short subsequence that also matches with at least x% identity•Why is this useful? You can backtrack quickly if error rate exceeded (short sequences will have to be almost perfect).f(i)lkeCMSC423 Fall 2009 16suffix trees + dynamic programming? ? ? ? ? ?A 3 2 1 1 ?N 2 1 0 1 ?A 1 0 1 2 ?- 0 1 2 3 ?- A N N


View Full Document

UMD CMSC 423 - Alignment heuristics

Documents in this Course
Midterm

Midterm

8 pages

Lecture 7

Lecture 7

15 pages

Load more
Download Alignment heuristics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Alignment heuristics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Alignment heuristics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?