DOC PREVIEW
IUPUI CS 580 - String Matching

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

String Matching (Chap. 32)An example of string matchingNotation and terminologyGraphical Proof of Lemma 32.1Naïve string matchingProblem with naïve algorithmKnuth-Morris-Pratt (KMP) algorithmPrefix functionSlide 9Slide 10Slide 11Analysis of KMP algorithmString Matching (Chap. 32)•Given a pattern P[1..m] and a text T[1..n], find all occurrences of P in T. Both P and T belong to *.•P occurs with shift s (beginning at s+1): P[1]=T[s+1], P[2]=T[s+2],…,P[m]=T[s+m].•If so, call s is a valid shift, otherwise, an invalid shift.• Note: one occurrence begins within another one: P=abab, T=abcabababbc, P occurs at s=3 and s=5.An example of string matchingNotation and terminology•w is a prefix of x, if x=wy for some y*. Denoted as wx.•w is a suffix of x, if x=yw for some y*. Denoted as wx.•Lemma 32.1 (Overlapping shift lemma):–Suppose x,y,z and xz and yz, then if |x||y|, then xy; if |x|  |y|, then yx; if |x| = |y|, then x=y.Graphical Proof of Lemma 32.1Naïve string matchingRunning time: O((n-m+1)m).Problem with naïve algorithm•Problem with Naïve algorithm:–Suppose p=ababc, T=cabababcd.•T: c a b a b a b c d•P: a …•P: a b a b c•P: a…•P: a b a b c•Whenever a character mismatch occurs after matching of several characters, the comparison begins by going back in T from the character which follows the last beginning character.•Can we do better: not go back in T?Knuth-Morris-Pratt (KMP) algorithm•Idea: after some character (such as q) matches of P with T and then a mismatch, the matched q characters allows us to determine immediately that certain shifts are invalid. So directly go to the shift which is potentially valid.•The matched characters in T are in fact a prefix of P, so just from P, it is OK to determine whether a shift is invalid or not. •Define a prefix function , which encapsulates the knowledge about how the pattern P matches against shifts of itself. :{1,2,…,m}{0,1,…,m-1}[q]=max{k: k<q and Pk  Pq}, that is [q] is the length of the longest prefix of P that is a proper suffix of Pq.Prefix functionIf we precompute prefix function of P (against itself), then whenevera mismatch occurs, the prefix functioncan determine which shift(s) are invalidand directly ruled out. So move directlyto the shift which is potentially valid.However, there is no need to comparethese characters again since they areequal.Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Analysis of KMP algorithm•The running time of COMPUTE-PREFIX-FUNCTION is (m) and KMP-MATCHER (m)+ (n).•Using amortized analysis (potential method) (for COMPUTE-PREFIX-FUNCTION):–Associate a potential of k with the current state k of the algorithm:–Consider codes in Line 5 to 9. –Initial potential is 0, line 6 decreases k since [k]<k, k never becomes negative.–Line 8 increases k at most 1.–Amortized cost = actual-cost + potential-increase–=(repeat-times-of-Line-5+O(1))+(potential-decrease-at-least the repeat-times-of-Line-5+O(1) in line


View Full Document

IUPUI CS 580 - String Matching

Download String Matching
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view String Matching and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view String Matching 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?