String SearchingString SearchHistory of String SearchPowerPoint PresentationImproving String SearchImproved string search, continuedProblem Definition, terminologyStraightforward string searchingSimpleStringSearchSlide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Knuth-Morris-PrattSlide 19Knuth-Morris PrattSlide 21Slide 22Slide 23Slide 24Slide 25Slide 26The Boyer-Moore AlgorithmBoyer-Moore exampleSlide 29Slide 30Boyer-Moore : another exampleSlide 32Slide 33BMPSkipBoyer-MooreSlide 36Note:BMSearchThe Karp-Rabin Algorithm IdeaThe Karp-Rabin AlgorithmString SearchingCSCI 2720Spring 2007Eileen KraemerA common word processor facility is to search for a given word in a document. Generally, the problem is to search for occurrences of a short string in a long string. String SearchDo the first then do the other onetheHistory of String SearchThe brute force algorithm:invented in the dawn of computer historyre-invented many times, still commonKnuth & Pratt invented a better one in 1970invented independently by Morrispublished 1976 as “Knuth-Morris-Pratt”Boyer & Moore found a better one before 1976found independently by GosperKarp & Rabin found a “better” one in 1980The obvious algorithm is to try the word at each possible place, and compare all the characters:for i := 0 to n-m do (doc length n)for j := 0 to m-1 do(word length m)compare word[j] with doc[i+j]if not equal, exit the inner loopThe complexity is at worst O(m*n) and best O(n).Improving String SearchSurprisingly, there is a faster algorithm where you compare the last characters first:Do the first then do the other onethecompare ‘e’ with ‘ ‘, fail so move along 3 placesDo the first then do the other onethecan only move along 2 placesImproved string search, continuedIn every case where the document character is not one of the characters in the word, we can move along m places. Sometimes, it is less.Problem Definition, terminologyLet p be the pattern stringLet t be the target stringLet k be the index of the character in the target string that “lies over” the first character of the patternGiven two strings, p and t, over the alphabet , determine whether p occurs as the substring of tThat is, determine whether there exists k such that p=Substring(t,k,|p|).Straightforward string searchingfunction SimpleStringSearch(string p,t): i nteger{Find p in t; return its location or -1 if p is not a substring of t} for k from 0 to Length(t) – Length(p) doi <- 0 while i < Length(p) and p[i] = t[k+i] doi <- i+1if i == Length(p) then return k return -1SimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C Dp[0] p[1] p[2] p[3] Y Y Y NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] Y Y Y YStraightforward string searchingWorst case:Pattern string always matches completely except for last characterExample: search for XXXXXXY in target string of XXXXXXXXXXXXXXXXXXXXOuter loop executed once for every character in target stringInner loop executed once for every character in pattern(|p| * |t|)Okay if patterns are short, but better algorithms existKnuth-Morris-Pratt(|p| * |t|)Key idea: if pattern fails to match, slide pattern to right by as many boxes as possible without permitting a match to go unnoticedKnuth-Morris-Prattt[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]X Y X Y X Y c p[0] p[1] p[2] p[3] p[4] Y Y Y YX Y X Y Z NX Y X Y Z Y Y Y Y ?Knuth-Morris PrattCorrect motion of pattern depends on both location of mismatch and the mismatching characterIf c == X : move 2 boxes to rightIf c == E : move 5 boxes to rightIf c == Z : target found; alg terminatesKnuth-Morris-PrattGoal: determine d, number of boxes to right pattern should move; smallest d such that:p[0] = t[k+d]p[1] = t[k+d+1]p[2] = t[k+d+2]…p[i-d] = t[k+i]Knuth-Morris-PrattNote: can be stated largely in terms of pattern alone.Value of d depends only on:The patternThe value of iThe mismatching character c (at t[k+i])Knuth-Morris-PrattCan define a function KMPskip(p,i,c) to give correct d Return smallest integer d such that 0 <= d <=I, such that p[i-d] == c and p[j] == p[j+d] for each 0 <=j <= i-di1Return i+1 if no such d existsCalculate all values of KMPskip for pattern p and store it in KMPskiparraydo lookup at each mismatchKnuth-Morris-PrattFor pattern ABCD:0 1 2 31 0 3 41 2 0 41 2 3 01 2 3 4A B C D A B C D otherKnuth-Morris-PrattFor pattern XYXYZ:0 1 0 3 21 0 3 0 51 2 3 4 01 2 3 4 5 X Y X Y Z X Y Z otherKnuth-Morris-PrattFunction KMPSearch(string p, t): integer{Find p in t; return its location or -1 if p is not a substring of t}KMPskiparray <- ComputeKMPskiparray(p)k <- 0i <- 0While k < Length(t) – Length(p) doif i == Length(p) then return kd <- KMPskiparray[I,t[k+i]]k <- k + di <- I
View Full Document