DOC PREVIEW
UT CS 337 - String Matching- Boyer-Moore Algorithm

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

String Matching: Boyer-Moore AlgorithmGreg PlaxtonTheory in Programming Practice, Fall 2005Department of Computer ScienceUniversity of Texas at AustinThe (Exact) String Matching Problem• Given a text string t and a pattern string p, find all occurrences of p intTheory in Programming Practice, Plaxton, Fall 2005Three Efficient String Matching Algorithms• Rabin-Karp– This is a simple randomized algorithm that tends to run in lineartime in most scenarios of practical interest– The worst case running time is as bad as that of the naive algorithm,i.e., Θ(p · t)• Knuth-Morris-Pratt– The worst case running time of this algorithm is linear, i.e., O(p + t)• Boyer-Moore (this lecture and the next)– This algorithm tends to have the best performance in practice, as itoften runs in sublinear time– The worst case running time is as bad as that of the naive algorithmTheory in Programming Practice, Plaxton, Fall 2005Boyer-Moore String Matching Algorithm• At any moment, imagine that the pattern is aligned with a portion ofthe text of the same length, though only a part of the aligned text mayhave been matched with the pattern• Henceforth, alignment refers to the substring of t that is aligned withp and l is the index of the left end of the alignment; i.e., p[0] is alignedwith t[l] and, in general, p[i], 0 ≤ i < m, with t[l + i]• Whenever there is a mismatch, the pattern is shifted to the right, i.e.,l is increased, as explained in the following sectionsTheory in Programming Practice, Plaxton, Fall 2005Algorithm Outline• The overall structure of the program is a loop that has the invariantQ1: Every occurrence of p in t that starts before l has been recorded• The following loop records every occurrence of p in t eventuallyl := 0;{ Q1 }loop{ Q1 }“increase l while preserving Q1”endloopTheory in Programming Practice, Plaxton, Fall 2005The Variable j• Next, we show how to increase l while preserving Q1• We introduce variable j, 0 ≤ j < m, with the meaning that the suffixof p starting at position j matches the corresponding portion of thealignmentQ2: 0 ≤ j ≤ m, p[j..m] = t[l + j..l + m]• Thus, the whole pattern is matched when j = 0, and no part has beenmatched when j = mTheory in Programming Practice, Plaxton, Fall 2005A Refinement of the Previous Algorithm• We establish Q2 by setting j to m• We match the symbols from right to left of the pattern until we find amismatch or the whole pattern is matchedj := m;{ Q2 }while j > 0 ∧ p[j − 1] = t[l + j − 1] do j := j − 1 endwhile{ Q1 ∧ Q2 ∧ (j = 0 ∨ p[j − 1] 6= t[l + j − 1]) }if j = 0then { Q1 ∧ Q2 ∧ j = 0 } record a match at l; l := l0{ Q1 }else { Q1 ∧ Q2 ∧ j > 0 ∧ p[j − 1] 6= t[l + j − 1] } l := l00{ Q1 }endif{ Q1 }• How do we compute l0and l00?Theory in Programming Practice, Plaxton, Fall 2005Computation of l0• This turns out to be essentially a special case of the computation of l00• So we focus primarily on the computation of l00in the presentation thatfollowsTheory in Programming Practice, Plaxton, Fall 2005Computation of l00• The precondition for the computation of l00is,Q1 ∧ Q2 ∧ j > 0 ∧ p[j − 1] 6= t[l + j − 1].• We consider two heuristics, each of which can be used to calculate avalue of l00; the greater value is assigned to l– The first heuristic, called the bad symbol heuristic, exploits the factthat we have a mismatch at position j − 1 of the pattern– The second heuristic, called the good suffix heuristic, uses the factthat we have matched a (possibly empty) suffix of p with the suffixof the alignment, i.e., p[j..m] = t[l + j..l + m].Theory in Programming Practice, Plaxton, Fall 2005The Bad Symbol Heuristic: Easy Case• Suppose we have the pattern “attendance” that we have aligned againsta portion of the text whose suffix is “hce”, as shown belowtext - - - - - - - h c epattern a t t e n d a n c ealign a t t e n d a n c e• The suffix “ce” has been matched; the symbols ’h’ and ’n’ do notmatch• There is no ’h’ in the pattern, so no match can include this ’h’ of thetext• Hence, the pattern may be shifted to the symbol following ’h’ in thetext, as shown by align aboveTheory in Programming Practice, Plaxton, Fall 2005The Bad Symbol Heuristic: The More Interesting Case• Next, suppose the mismatched symbol in the text is ’t’, as shown belowtext - - - - - - - t c epattern a t t e n d a n c e• There are two ways to align the ’t’ in the pattern with a ’t’ in the texttext - - t c e - - - - - -align1 a t t e n d a n c ealign2 a t t e n d a n c e• Which alignment should we choose?Theory in Programming Practice, Plaxton, Fall 2005Minimum Shift Rule• Rule: Shift the pattern by the minimum allowable amount• Justification: Preservation of Q1– We never skip over a possible match following this rule, because nosmaller shift yields a match at the given position, and, hence no fullmatch• So, in the example of the previous slide, we should use align1Theory in Programming Practice, Plaxton, Fall 2005Motivation for the Minimum Shift Rule: Example• In this example, the leftmost symbol ’y’ of the pattern “xxy” fails tomatch the text symbol ’x’text - - x - -pattern x x yalign1 x x yalign2 x x y• If we were to advance to alignment align2 , we might skip a match inposition in align1 , violating invariant Q1Theory in Programming Practice, Plaxton, Fall 2005Bad Symbol Heuristic: Implementation• For each symbol in the alphabet, we precalculate its rightmost positionin the pattern• if the mismatched symbol’s rightmost occurrence in the pattern is atp[k], then p[0] is aligned with t[l − k + j − 1], or l is increased by−k + j − 1• For a nonexistent symbol in the pattern, like ’h’, we set its rightmostoccurrence to −1 so that l is increased to l + j, as required• Note that the shift −k + j − 1 is negative if k > j − 1, which can easilyoccur– But the good suffix heuristic always yields a positive increment for l,so the maximum of these two increments is positiveTheory in Programming Practice, Plaxton, Fall 2005The Good Suffix Heuristic• Suppose we have a pattern “abxabyab” of which we have alreadymatched the suffix “ab”, but there is a mismatch with the precedingsymbol ’y’, as shown belowtext - - - - - z a b - -pattern a b x a b y a b• Then, we shift the pattern to the right so that the matched part isoccupied by the same symbols, “ab”; this


View Full Document

UT CS 337 - String Matching- Boyer-Moore Algorithm

Documents in this Course
Load more
Download String Matching- Boyer-Moore Algorithm
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view String Matching- Boyer-Moore Algorithm and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view String Matching- Boyer-Moore Algorithm 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?