Unformatted text preview:

Longest Common SubsequenceLCS TheoremSlide 3Slide 4Direct Computation of LCS by Dynamic ProgrammingExample 3: Edit Distance = 6 + 8 – 2*5 = 4A Faster Algorithm for LCSSlide 8Slide 9Determine LIS and SC simultaneously in O(nlogn)ProofConstruction of a coverExampleAn Efficient Algorithm for Constructing the CoverO(n logn) AlgorithmSlide 16Example: π=(5,3,4,9,6,2,1,8,7,10)Reduction of LIS problem to LCS problemSlide 19Slide 20Slide 21Longest Common SubsequenceDefinition: The longest common subsequence or LCS of two strings S1 and S2 is the longest subsequence common between two strings. S1 : A -- A T -- G G C C -- A T A n=10S2: A T A T A A T T C T A T -- m=12The LCS is AATCAT. The length of the LCS is 6. The solution is not unique for all pair of strings. Consider the pair (ATTA, ATAT). The solutions are ATT, ATA. In general, for arbitrary pair of strings, there may exist many solutions.LCS TheoremThe LCS can be found by dynamic programming formulation. One can easily show:Theorem: With a score of 1 for each match and a zero for each mismatch or space , the matched characters in an alignment of maximum value for a LCS.Since it is using the general dynamic programming algorithm its complexity is O(nm) .A longest substring problem, on the other hand has a O(n+m) solution. Subsequences are much more complex than substrings. Can we do better for the LCS problem? We will see …The optimal alignment is shown above. Note the alignment shows three insert (dark), one delete green) and three substitution or replacement operations (blue), which gives an edit distance of 7. But, the 3 replacement operations can be realized by 3 insert and 3 delete operations because a replacement is equivalent to first delete the character and then insert a character in its place like:S1 : A -- A T -- G G C C -- A T A n=10S2: A T A T A A T T C T A T -- m=12G -- G -- C ---- A -- T -- Tif we give a cost of 2 for replace operation and cost of 1 for both insert and delete operations, the minimum edit distance D can be computed in terms of the length L of LCS as:For the above example, n=10, m=12, L=6. So, D=10 ( 6 insert and 4 delete).LnmD 2Direct Computation of LCS by Dynamic ProgrammingMore efficient although the asymptotic complexity remains the same, O(nm). Let L denote The equations are given below without proof (which is simple).Again, if we leave suitable back pointers in the matrix, trace(s) can be derived for the LCS.)()(..)]........,1(),1,(max[),()()(................).........1,1(1),(0),0(0)0,(0)0,0(2121jSiSjiLjiLjiLjSiSjiLjiLjLiLLExample 3: Edit Distance = 6 + 8 – 2*5 = 4 S1= A T C A TS2= T A A T C A T A↓ ↓ ↑A↓ji0 T1A2A3T4C5A6T7A80 0 0 0 0 0 0 0 0 0A 1 0 0 1 1 1 1 1 1 1T 2 0 1 1 1 2 2 2 2 2C 3 0 1 1 1 2 3 3 3 3A 4 0 1 2 2 2 3 4 4 4A 5 0 1 2 3 3 3 4 4 5T 6 0 1 2 3 4 4 4 5 5A Faster Algorithm for LCSAn algorithm that is asymptotically better than O(nm) for determining LCS. Implies that for special cases of edit distance, there exist more efficient algorithm.Definition: Let π be a set of n integers, not necessarily distinct.Definition: An increasing subsequence(IS) of π is a subsequence of π whose values are strictly increasing from left to right.Example: π=(5,3,4,4, 9,6,2,1,8,7,10). IS=(3,4,6,8,10), (5,9,10)Definition: A longest increasing subsequence(LIS) of π is an IS π of maximum length.Definition: A decreasing subsequence (DS) of π is a non-increasing subsequence f π.Example: DS=(5,4,4,2,1).Definition: A cover is a set of disjoint DS of π that covers or contains all elements of π. The size of the cover c equals the number of DS in the cover.Example: π=(5,3,4,9,6,2,1,8,7) Cover:{ (5,3,2,1),(4),(9,6),(8,7)}. C=#of DS=4.Definition: A smallest cover (SC) is a cover with a minimum value of c.Determine LIS and SC simultaneously in O(nlogn)Lemma: If I is an IS of π with length equal to the size of a cover C of π, then I is a LIS of π and C is the smallest cover of size c.ProofIf I is an increasing sequence, it cannot contain more than one element from a decreasing sequence. This means that no increasing subsequence can have size more than the size of any cover C, that is, if a maximum of one element from each can participate in any increasing sequence. Thus, an IS derived from this decomposition can have a maximum length of |C |=c. Conversely, C must be the smallest. If not, let c’ be the length of a cover C’ such that |C’|=< c i.e., if we derive IS from C, it must contain more than one element from one of the decreasing sequence of C’, which is not possible. Hence C has to be of smallest size.kCCCC  .........21Construction of a coverGreedy algorithm to derive a cover:Starting from the left of π, examine each successive number in π. Append the current number at the left-most subsequence derived so far if it is possible do that maintaining the decreasing sequence property. If not start a new decreasing subsequence beginning with the current element. Proceed until π is exhausted.Exampleπ=(5,3,4,9,6,2,1,8,7,10)D1=(5,3,2,1), D2=(4), D3=(9,6), D4=(8,7), D4=(10)The algorithm has O (n2) complexity. We will present an O (n logn) algorithm.An Efficient Algorithm for Constructing the CoverWe use a data structure which is a list containing the last number of each of the decreasing sequence that is being constructed. The list is always sorted in increasing order. An identifier indicating which list the number belongs to also included.Procedure Decreasing Sequence CoverInput: π= , the list of input numbers.Output: the set of decreasing sequences Di constituting the cover.).........,(,21 nxxxO(n logn) AlgorithmInitialize: i←1; Di=(x1); L=(x1, i) ; j←1;For i=2 to n doSearch the x-fields of L to find the first x-value such that xi < x. ….takes O( logn) time.If such a value exists, then insert x at the end in the list Di and set xi←x in L… This step takes constant time.If such a value does not exist in L, then set j←j+1. insert in L a new element (x,j) and start a new decreasing sequence Dj=(x)EndLemma: At any point in the execution of the


View Full Document

UCF CAP 5937 - Longest common subsequence

Documents in this Course
Load more
Download Longest common subsequence
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Longest common subsequence and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Longest common subsequence 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?