UMD CMSC 351 - Lecture 25: Longest Common Subsequence

This preview shows page 1 out of 3 pages.

Lecture Notes, CMSC 251
Lecture 25: Longest Common Subsequence (April 28, 1998)

Read: Section 16.3 in CLR.

Strings: One important area of algorithm design is the study of algorithms for character strings. There are a number of important problems here. Among the most important is efficiently searching for a substring, or more generally a pattern, in a large piece of text. (This is what text editors and functions like grep do when you perform a search.) In many instances you do not want to find a piece of text exactly, but rather something that is similar. This arises, for example, in genetics research. Genetic codes are stored as long DNA molecules. A DNA strand can be broken down into a long sequence of elements, each of which is one of four basic types: C, G, T, A.

But exact matches rarely occur in biology because of small changes in DNA replication, and exact substring search will only find exact matches. For this reason, it is of interest to compute similarities between strings that do not match exactly. The measure of string similarity should be insensitive to random insertions and deletions of characters from some originating string. There are a number of measures of similarity in strings. The first is the edit distance, that is, the minimum number of single-character insertions, deletions, or transpositions necessary to convert one string into another. The other, which we will study today, is the length of the longest common subsequence.

Longest Common Subsequence: Let us think of character strings as sequences of characters. Given two sequences $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Z = \langle z_1, z_2, \ldots, z_k \rangle$, we say that $Z$ is a subsequence of $X$ if there is a strictly increasing sequence of $k$ indices $\langle i_1, i_2, \ldots, i_k \rangle$ ($1 \le i_1 < i_2 < \cdots < i_k \le m$) such that $Z = \langle x_{i_1}, x_{i_2}, \ldots, x_{i_k} \rangle$. For example, let $X = \langle ABRACADABRA \rangle$ and let $Z = \langle AADAA \rangle$; then $Z$ is a subsequence of $X$.

Given two strings $X$ and $Y$, the longest common subsequence of $X$ and $Y$ is a longest sequence $Z$ that is a subsequence of both $X$ and $Y$. For example, let $X$ be as before and let $Y = \langle YABBADABBADOO \rangle$. Then the longest common subsequence is $Z = \langle ABADABA \rangle$.

The Longest Common Subsequence Problem (LCS) is the following: given two sequences $X = \langle x_1, \ldots, x_m \rangle$ and $Y = \langle y_1, \ldots, y_n \rangle$, determine a longest common subsequence. Note that it is not always unique. For example, the LCS of $\langle ABC \rangle$ and $\langle BAC \rangle$ is either $\langle AC \rangle$ or $\langle BC \rangle$.

Dynamic Programming Solution: The simple brute-force solution to the problem would be to try all possible subsequences from one string and search for matches in the other string, but this is hopelessly inefficient, since there are an exponential number of possible subsequences.

Instead, we will derive a dynamic programming solution. In typical DP fashion, we need to break the problem into smaller pieces. There are many ways to do this for strings, but it turns out for this problem that considering all pairs of prefixes will suffice. A prefix of a sequence is just an initial string of values, $X_i = \langle x_1, x_2, \ldots, x_i \rangle$. $X_0$ is the empty sequence.

The idea will be to compute the longest common subsequence for every possible pair of prefixes. Let $c[i,j]$ denote the length of the longest common subsequence of $X_i$ and $Y_j$. Eventually we are interested in $c[m,n]$, since this will be the length of the LCS of the two entire strings. The idea is to compute $c[i,j]$ assuming that we already know the values of $c[i',j']$ for $i' \le i$ and $j' \le j$ (but not both equal). We begin with some observations.

Basis: $c[i,0] = c[0,j] = 0$. If either sequence is empty, then the longest common subsequence is empty.

Last characters match: Suppose $x_i = y_j$. Example: let $X_i = \langle ABCA \rangle$ and let $Y_j = \langle DACA \rangle$. Since both end in A, we claim that the LCS must also end in A. (We will explain why later.) Since the A is part of the LCS, we may find the overall LCS by removing A from both sequences, taking the LCS of $X_{i-1} = \langle ABC \rangle$ and $Y_{j-1} = \langle DAC \rangle$, which is $\langle AC \rangle$, and then adding A to the end, giving $\langle ACA \rangle$ as the answer. (At first you might object: but how did you know that these two A's matched with each other? The answer is that we don't, but it will not make the LCS any smaller if we do.)

Thus, if $x_i = y_j$ then $c[i,j] = c[i-1,j-1] + 1$.

Last characters do not match: Suppose that $x_i \ne y_j$. In this case $x_i$ and $y_j$ cannot both be in the LCS, since each would have to be the last character of the LCS. Thus either $x_i$ is not part of the LCS, or $y_j$ is not part of the LCS (and possibly both are not part of the LCS). In the first case, the LCS of $X_i$ and $Y_j$ is the LCS of $X_{i-1}$ and $Y_j$, whose length is $c[i-1,j]$. In the second case, the LCS is the LCS of $X_i$ and $Y_{j-1}$, whose length is $c[i,j-1]$. We do not know which is the case, so we try both and take the one that gives us the longer LCS.

Thus, if $x_i \ne y_j$ then $c[i,j] = \max(c[i-1,j],\, c[i,j-1])$.

We left undone the business of showing that if both strings end in the same character, then the LCS must also end in this same character. To see this, suppose by contradiction that both strings end in A, and further suppose that the LCS ended in a different character B. Because A is the last character of both strings, it follows that this particular instance of the character A cannot be used anywhere else in the LCS. Thus, we can add it to the end of the LCS, creating a longer common subsequence. But this would contradict the definition of the LCS as being longest.

Combining these observations, we have the following rule:

$$
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max(c[i,j-1],\, c[i-1,j]) & \text{if } i, j > 0 \text{ and } x_i \ne y_j.
\end{cases}
$$

Implementing the Rule: The task now is to simply implement this rule. As with other DP solutions, we concentrate on computing the maximum length. We will store some helpful pointers in a parallel array, b[0..m, 0..n].

Longest Common Subsequence

    LCS(char x[1..m], char y[1..n])
        int c[0..m, 0..n]
        for i = 0 to m do
            c[i,0] = 0;  b[i,0] = SKIPX
        for j = 0 to n do
            c[0,j] = 0;  b[0,j] = SKIPY
        for i = 1 to m do
            for j = 1 to n do
                if (x[i] == y[j])
                    c[i,j] = c[i-1,j-1] + 1;  b[i,j] = ADDXY
                else if (c[i-1,j] >= c[i,j-1])
                    c[i …
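
Because the preview cuts off partway through the pseudocode, the following is a minimal Python sketch of the recurrence stated above, not a reconstruction of the rest of the notes' listing. The function names `lcs_table` and `lcs` are hypothetical, strings are 0-indexed (so `x[i - 1]` plays the role of x_i), and the tie-breaking choice in the traceback is an assumption. Instead of the pointer array b, the sketch recovers one LCS by walking back through the table c.

```python
def lcs_table(x: str, y: str) -> list[list[int]]:
    """Return c, where c[i][j] is the LCS length of the prefixes x[:i] and y[:j]."""
    m, n = len(x), len(y)
    # Basis: row 0 and column 0 stay 0, since an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # Last characters match: extend the LCS of the shorter prefixes.
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                # Last characters differ: drop one of them and keep the better result.
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y (it need not be unique)."""
    c = lcs_table(x, y)
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # this character is part of the LCS
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # assumption: on ties, skip the character of x
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("ABRACADABRA", "YABBADABBADOO"))  # ABADABA
print(lcs_table("ABCA", "DACA")[4][4])      # 3
```

Filling the table takes time and space proportional to m times n, and the traceback takes at most m + n further steps. For inputs such as ABC and BAC, which of the two valid answers is returned depends on the tie-breaking rule.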

