UMD CMSC 423 - Lecture 8 Sequence alignment


CMSC423 Fall 2008
CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 8
Sequence alignment: exact alignment, inexact alignment, dynamic programming, gapped alignment

Suffix trees for matching
• Suffix trees use O(n) space
• Suffix trees can be constructed in O(n) time
• Is CAT part of ATCATG?
  – Match from the root, character by character
  – If you run out of query characters, you have found a match
  – Otherwise, there is no match
  – Intuition: CAT is a prefix of some suffix of ATCATG (here, the suffix CATG starting at position 3)
[Figure: suffix tree of ATCATG$; edge labels given as (start,end) positions in the string, leaves 1–7 mark suffix start positions]

Suffix links – useful for substring matches
• Does any part of AGATG match the string AGCAGT?
[Figure: suffix tree of AGCAGT$ with suffix links; edge labels given as (start,end) positions in the string]

Other uses
• Finding repeats
  – internal nodes with multiple children correspond to DNA that occurs in multiple places in the genome
• Longest common substring of two strings
  – build a suffix tree of both strings and find the deepest internal node (greatest string depth) that has leaves from both strings
  – or: build a suffix tree on one string and use suffix links to find the longest match
• Note: the running time for matching is O(|Pattern|), not O(|Pattern| + |Text|) (though O(|Text|) time was spent in preprocessing)

Why do we care?
• Suffix trees are used for
  – mapping reads to a genome (e.g. personal genomics)
  – comparing genomes (comparative genomics)
  – finding repeats
  – identifying genome signatures
• Exact matching – what to expect on exams
  – build a suffix tree for a string
  – answer questions about one of the algorithms, e.g. for the Z algorithm: must j be the farthest-reaching Z-value, or is any Z-value extending past i enough?
  – do something with the help of some of the algorithms (e.g. look for repeats that occur exactly twice, etc.)

Suffix arrays
• Suffix trees are expensive: more than 20 bytes per base
• Suffix arrays: lexicographically sort all suffixes
• The suffixes that match a query can be found quickly by binary search
• Note: much less space, but a longer running time (an extra log(n) factor)
• Example: suffix array of ATCATG (suffix, start position)
    ATCATG  1
    ATG     4
    CATG    3
    G       6
    TCATG   2
    TG      5

Suffix arrays and compression
• Burrows-Wheeler transform (BWT)
  – write all rotations of BANANA$: BANANA$, ANANA$B, NANA$BA, ANA$BAN, NA$BANA, A$BANAN, $BANANA
  – sort the rotations:
      $BANANA
      A$BANAN
      ANA$BAN
      ANANA$B
      BANANA$
      NA$BANA
      NANA$BA
  – the last column (the character before each suffix) is the BWT, ANNB$AA, which is then compressed
• Note: characters in the last column occur in the same relative order as in the first column (the k-th A in the last column is the same text character as the k-th A in the first column)
• Useful for matching within the BWT

BWT – string matching
• Look for "BANA"
• Start at the end (match right to left)
• Find the character in the rightmost column
• Identify the corresponding range in the first column
• Switch back to the last column
• ...
• How do we know the first A in the pattern is the 2nd/3rd A from the top of the matrix?
• Note: additional data is needed: the number of times each letter appears before every position of the last column

    row  rotation  last   A  B  N  $
    0    $BANANA   A      0  0  0  0
    1    A$BANAN   N      1  0  0  0
    2    ANA$BAN   N      1  0  1  0
    3    ANANA$B   B      1  0  2  0
    4    BANANA$   $      1  1  2  0
    5    NA$BANA   A      1  1  2  1
    6    NANA$BA   A      2  1  2  1

• Running time? O(len(P)) operations; each may cost O(log(len(T)))

Exact alignment recap
• Exact matching can be done efficiently: O(|Text| + |Pattern|)
• Key idea: preprocess data to keep track of similar regions, then use information to "jump" over places where no match can
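The root-down matching on the "Suffix trees for matching" slide can be tried out with a naive suffix trie. This is a minimal sketch under stated assumptions: it inserts every suffix character by character (O(n²) time and space), so it is not the linear-size, linear-time suffix tree the slides describe, and the helper names build_suffix_trie and occurrences are hypothetical.

```python
def build_suffix_trie(text):
    """Naive suffix trie of text + '$' (O(n^2) time and space, for illustration).

    Each node is a dict mapping a character to a child node. The leaf reached
    by spelling suffix i stores its 1-based start position under key "pos".
    """
    text += "$"  # terminator so that no suffix is a prefix of another
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node["pos"] = i + 1
    return root


def occurrences(trie, pattern):
    """Match pattern from the root, char by char; return its start positions."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []  # fell off the trie: pattern is not a substring
        node = node[ch]
    # Ran out of query: every leaf below this node marks one occurrence.
    hits, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "pos":
                hits.append(child)
            else:
                stack.append(child)
    return sorted(hits)


trie = build_suffix_trie("ATCATG")
print(occurrences(trie, "CAT"))  # [3]
print(occurrences(trie, "AT"))   # [1, 4]
```

The result for CAT matches the slide's intuition: CAT is a prefix of the suffix starting at position 3. Counting leaves below a node is also how repeats show up: a node with leaves from several suffixes spells a substring that occurs several times.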

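The suffix array idea on the "Suffix arrays" slide (sort all suffixes, then binary-search) can be sketched as follows. Assumptions: suffixes are sorted naively with Python's sorted, which is fine for small examples but much slower than dedicated construction algorithms, and the names suffix_array and find are hypothetical.

```python
def suffix_array(text):
    """1-based start positions of all suffixes of text, in lexicographic order."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def find(text, sa, pattern):
    """Binary-search sa for the block of suffixes that start with pattern."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest suffix.
        return text[sa[k] - 1:sa[k] - 1 + m]

    # Leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Leftmost suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = suffix_array("ATCATG")
print(sa)                        # [1, 4, 3, 6, 2, 5]
print(find("ATCATG", sa, "AT"))  # [1, 4]
```

Truncating every sorted suffix to m characters keeps the sequence in non-decreasing order, which is what makes the two binary searches valid. Each search takes O(log n) steps and each step compares up to |Pattern| characters, which is where the extra log(n) factor comes from.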

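The sort-the-rotations construction on the "Suffix arrays and compression" slide, and the same-order property of the first and last columns, can be checked with a short sketch. Assumptions: '$' does not occur in the text and sorts before the letters (true for ASCII); bwt and inverse_bwt are hypothetical names, and building all rotations is O(n² log n), so this is for illustration only.

```python
def bwt(text):
    """Burrows-Wheeler transform of text + '$' via sorted rotations."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)  # last column


def inverse_bwt(last):
    """Recover the text using the last-to-first (LF) mapping: the k-th
    occurrence of a character in the last column is the k-th occurrence
    of that character in the first column."""
    first = sorted(last)
    first_index = {}
    for i, ch in enumerate(first):
        first_index.setdefault(ch, i)  # first row starting with ch
    seen, lf = {}, []
    for ch in last:
        lf.append(first_index[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    # Row 0 starts with '$', so its last character is the final text character.
    row, out = 0, []
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]  # move to the row starting with that character
    return "".join(reversed(out))


print(bwt("BANANA"))           # ANNB$AA
print(inverse_bwt("ANNB$AA"))  # BANANA
```

The inverse works only because of the same-order property noted on the slide; it walks the text backwards one character per step.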

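The right-to-left search on the "BWT – string matching" slide can be sketched using only the last column and the per-position letter counts. This is a minimal sketch, assuming a full count table (rows 0 to 6 of occ reproduce the slide's table, plus one final row for the whole column) and a hypothetical function name backward_search. With a full table each step is a constant number of lookups; the slide's O(log(len(T))) per-step bound leaves room for more compact count structures.

```python
def backward_search(last, pattern):
    """Count occurrences of pattern using only the BWT (last column)."""
    alphabet = sorted(set(last))
    # occ[i][c]: number of times c appears in last[:i] ("before position i").
    occ = [dict.fromkeys(alphabet, 0)]
    for ch in last:
        row = dict(occ[-1])
        row[ch] += 1
        occ.append(row)
    # c_before[c]: number of characters smaller than c, i.e. the first row
    # whose rotation starts with c.
    c_before, total = {}, 0
    for c in alphabet:
        c_before[c] = total
        total += occ[-1][c]
    lo, hi = 0, len(last)  # current range [lo, hi) of matrix rows
    for ch in reversed(pattern):  # match right to left
        if ch not in c_before:
            return 0
        # Rows in [lo, hi) whose last character is ch map to this range
        # of rows starting with ch (same relative order in both columns).
        lo = c_before[ch] + occ[lo][ch]
        hi = c_before[ch] + occ[hi][ch]
        if lo >= hi:
            return 0
    return hi - lo


print(backward_search("ANNB$AA", "BANA"))  # 1
print(backward_search("ANNB$AA", "ANA"))   # 2
```

After matching "ANA" the range is rows 2 to 3 (ANA$BAN and ANANA$B), the 2nd and 3rd A rows from the top. The counts table is what answers the slide's question about which A is which.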

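One linear-time method that fits the recap's key idea is the Z algorithm named on the "Why do we care?" slide: inside the farthest-reaching match window it reuses an earlier Z-value instead of comparing characters again. A minimal sketch, assuming '$' occurs in neither string; z_array and z_search are hypothetical names.

```python
def z_array(s):
    """z[i] = length of the longest substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    l, r = 0, 0  # s[l:r] is the prefix match reaching farthest right so far
    for i in range(1, n):
        if i < r:
            # Inside the window: reuse the value at the mirrored position i - l.
            z[i] = min(r - i, z[i - l])
        # Extend by explicit comparison past what is already known.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def z_search(text, pattern):
    """1-based start positions of pattern in text, O(|text| + |pattern|) time."""
    s = pattern + "$" + text  # '$' must not occur in either string
    z = z_array(s)
    m = len(pattern)
    # s index i corresponds to 1-based text position i - m.
    return [i - m for i in range(m + 1, len(s)) if z[i] >= m]


print(z_search("ATCATG", "AT"))   # [1, 4]
print(z_search("BANANA", "ANA"))  # [2, 4]
```

Because r only moves right and each successful comparison advances it, the total work is linear in the combined length.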