DOC PREVIEW
UGA CSCI 2720 - stringsearch

This preview shows page 1-2-3-19-20-38-39-40 out of 40 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

String SearchingString SearchHistory of String SearchPowerPoint PresentationImproving String SearchImproved string search, continuedProblem Definition, terminologyStraightforward string searchingSimpleStringSearchSlide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Knuth-Morris-PrattSlide 19Knuth-Morris PrattSlide 21Slide 22Slide 23Slide 24Slide 25Slide 26The Boyer-Moore AlgorithmBoyer-Moore exampleSlide 29Slide 30Boyer-Moore : another exampleSlide 32Slide 33BMPSkipBoyer-MooreSlide 36Note:BMSearchThe Karp-Rabin Algorithm IdeaThe Karp-Rabin AlgorithmString SearchingCSCI 2720Spring 2007Eileen KraemerA common word processor facility is to search for a given word in a document. Generally, the problem is to search for occurrences of a short string in a long string. String SearchDo the first then do the other onetheHistory of String SearchThe brute force algorithm:invented in the dawn of computer historyre-invented many times, still commonKnuth & Pratt invented a better one in 1970invented independently by Morrispublished 1976 as “Knuth-Morris-Pratt”Boyer & Moore found a better one before 1976found independently by GosperKarp & Rabin found a “better” one in 1980The obvious algorithm is to try the word at each possible place, and compare all the characters:for i := 0 to n-m do (doc length n)for j := 0 to m-1 do(word length m)compare word[j] with doc[i+j]if not equal, exit the inner loopThe complexity is at worst O(m*n) and best O(n).Improving String SearchSurprisingly, there is a faster algorithm where you compare the last characters first:Do the first then do the other onethecompare ‘e’ with ‘ ‘, fail so move along 3 placesDo the first then do the other onethecan only move along 2 placesImproved string search, continuedIn every case where the document character is not one of the characters in the word, we can move along m places. Sometimes, it is less.Problem Definition, terminologyLet p be the pattern stringLet t be the target stringLet k be the index of the character in the target string that “lies over” the first character of the patternGiven two strings, p and t, over the alphabet , determine whether p occurs as the substring of tThat is, determine whether there exists k such that p=Substring(t,k,|p|).Straightforward string searchingfunction SimpleStringSearch(string p,t): i nteger{Find p in t; return its location or -1 if p is not a substring of t} for k from 0 to Length(t) – Length(p) doi <- 0 while i < Length(p) and p[i] = t[k+i] doi <- i+1if i == Length(p) then return k return -1SimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C Dp[0] p[1] p[2] p[3] Y Y Y NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] NSimpleStringSearcht[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]A B C E F G A B C D E A B C D p[0] p[1] p[2] p[3] Y Y Y YStraightforward string searchingWorst case:Pattern string always matches completely except for last characterExample: search for XXXXXXY in target string of XXXXXXXXXXXXXXXXXXXXOuter loop executed once for every character in target stringInner loop executed once for every character in pattern(|p| * |t|)Okay if patterns are short, but better algorithms existKnuth-Morris-Pratt(|p| * |t|)Key idea: if pattern fails to match, slide pattern to right by as many boxes as possible without permitting a match to go unnoticedKnuth-Morris-Prattt[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t10]X Y X Y X Y c p[0] p[1] p[2] p[3] p[4] Y Y Y YX Y X Y Z NX Y X Y Z Y Y Y Y ?Knuth-Morris PrattCorrect motion of pattern depends on both location of mismatch and the mismatching characterIf c == X : move 2 boxes to rightIf c == E : move 5 boxes to rightIf c == Z : target found; alg terminatesKnuth-Morris-PrattGoal: determine d, number of boxes to right pattern should move; smallest d such that:p[0] = t[k+d]p[1] = t[k+d+1]p[2] = t[k+d+2]…p[i-d] = t[k+i]Knuth-Morris-PrattNote: can be stated largely in terms of pattern alone.Value of d depends only on:The patternThe value of iThe mismatching character c (at t[k+i])Knuth-Morris-PrattCan define a function KMPskip(p,i,c) to give correct d Return smallest integer d such that 0 <= d <=I, such that p[i-d] == c and p[j] == p[j+d] for each 0 <=j <= i-di1Return i+1 if no such d existsCalculate all values of KMPskip for pattern p and store it in KMPskiparraydo lookup at each mismatchKnuth-Morris-PrattFor pattern ABCD:0 1 2 31 0 3 41 2 0 41 2 3 01 2 3 4A B C D A B C D otherKnuth-Morris-PrattFor pattern XYXYZ:0 1 0 3 21 0 3 0 51 2 3 4 01 2 3 4 5 X Y X Y Z X Y Z otherKnuth-Morris-PrattFunction KMPSearch(string p, t): integer{Find p in t; return its location or -1 if p is not a substring of t}KMPskiparray <- ComputeKMPskiparray(p)k <- 0i <- 0While k < Length(t) – Length(p) doif i == Length(p) then return kd <- KMPskiparray[I,t[k+i]]k <- k + di <- I


View Full Document

UGA CSCI 2720 - stringsearch

Documents in this Course
Load more
Download stringsearch
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view stringsearch and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view stringsearch 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?