UT CS 337 - String Matching: Boyer-Moore Algorithm - D105436

Home> Schools> University of Texas at Austin> Computer Science (CS) > CS 337> String Matching: Boyer-Moore Algorithm

DOC PREVIEW

UT CS 337 - String Matching: Boyer-Moore Algorithm

School name University of Texas at Austin

Course Cs 337- Theory in Programming Practice

Pages 16

This preview shows page 1-2-3-4-5 out of 16 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 16 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 16 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 16 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 16 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 16 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 16 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

String Matching: Boyer-Moore AlgorithmGreg PlaxtonTheory in Programming Practice, Spring 2005Department of Computer ScienceUniversity of Texas at AustinNotation• We abbreviate min{p − r | r ∈ R} as min(p − R)• In general, if S is a set of strings and e(S) an expression that includes Sas a term, then min(e(S)) = min{e(i) | i ∈ S}, where e(i) is obtainedfrom e by replacing S by i• We adopt the convention that the minimum of the empty set is ∞Theory in Programming Practice, Plaxton, Spring 2005Basic Definitions• Let R denote R0∪ R00, where R0is{r is a proper prefix of p ∧ r is a suffix of s}and R00is{r is a proper prefix of p ∧ s is a suffix of r}• Recall thatb(s) = min{p − r | r ∈ R}• Thusb(s) = min(min(p − R0), min(p − R00))Theory in Programming Practice, Plaxton, Spring 2005Properties of b(s)• P1: c(p) ∈ R• P2: min(p − R0) ≥ p − c(p)• P3: IfV = {v | v is a suffix of p ∧ c(v) = s}then min(p − R00) = min(V − s)Theory in Programming Practice, Plaxton, Spring 2005Proof of Property P1• P1: c(p) ∈ R• From the definition of core, c(p) ≺ p• Hence, c(p) is a proper prefix of p• Also, c(p) is a suffix of p, and, since s is a suffix of p, they are totallyordered, i.e., either c(p) is a suffix of s or s is a suffix of c(p)• Hence, c(p) ∈ RTheory in Programming Practice, Plaxton, Spring 2005Proof of Property P2• P2: min(p − R0) ≥ p − c(p)• Consider any r in R0• Since r is a suffix of s and s is a suffix of p, r is a suffix of p• Also, r is a proper prefix of p, so r ≺ p• From the definition of core, r ¹ c(p), and hence p − r ≥ p − c(p) forevery r in R0Theory in Programming Practice, Plaxton, Spring 2005Proof of Property P3• P3: IfV = {v | v is a suffix of p ∧ c(v) = s}then min(p − R00) = min(V − s)• We split the proof into two parts:– First, we show that min(p − R00) ≤ min(V − s)– Then, we show that min(p − R00) ≥ min(V − s)Theory in Programming Practice, Plaxton, Spring 2005Proof that min(p − R00) ≤ min(V − s)• If V is empty, the inequality holds since the RHS is ∞; in what follows,assume that V is nonempty and let v be an arbitrary element of V• It is sufficient to exhibit an r in R00such that p − r = v − s• Let r be the length-(p − v + s) prefix of p– Note that r is a proper prefix of p since c(v) = s implies v > s– Furthermore, s is a suffix of r since c(v) = s implies that s is a prefixof v– So r belongs to R00, as requiredTheory in Programming Practice, Plaxton, Spring 2005Proof that min(p − R00) ≥ min(V − s)• If R00is empty, the inequality holds since the LHS is ∞; in what follows,assume that R00is nonempty and let r be the string in R00minimizingthe LHS• It is sufficient to exhibit a v in V such that p − r = v − s• Let v denote the length-(p − r + s) suffix of p– Note that v > s since r is a proper prefix of p– Furthermore, s ≺ v, so s ¹ c(v)– If s ≺ c(v), then we obtain a contradiction to the definition of rsince the length-(r + c(v) − s) prefix r0of p also belongs to R00andyields a smaller value for the LHS– Thus s = c(v) and hence v belongs to V , as requiredTheory in Programming Practice, Plaxton, Spring 2005A Formula for b(s)• We now derive a formula for b(s), whereV = {v | v is a suffix of p ∧ c(v) = s}b(s)= {definition of b(s)}min(p − R)= {from (P1): c(p) ∈ R }min(p − c(p), min(p − R))= {R = R0∪ R00}min(p − c(p), min(p − R0), min(p − R00))= {from (P2): min(p − R0) ≥ p − c(p)}min(p − c(p), min(p − R00))= {from (P3): min(p − R00) = min(V − s)}min(p − c(p), min(V − s))Theory in Programming Practice, Plaxton, Spring 2005Computation of b: Towards An Abstract Program• We now develop an abstract program to compute b(s), for all suffixess of p• We employ an array b where b[s] ultimately holds the value of b(s),though it is assigned different values during the computation• Initially, we set b[s] to p − c(p)• Next, for each suffix v of p (in arbitrary order)– Let s = c(v)– Update b[s] to min(b[s], v − s)Theory in Programming Practice, Plaxton, Spring 2005Computation of b: An Abstract Program• Here is our abstract program for computing b(s) for all suffixes s of passign p − c(p) to all elements of b;for all suffixes v of p dos := c(v);if b[s] > v − s then b[s] := v − s endifendforTheory in Programming Practice, Plaxton, Spring 2005Computation of b: Towards a Concrete Program• The goal of the concrete program is to compute an array e, where e[j]is the amount by which the pattern is to be shifted when the matchedsuffix is p[j..p], 0 ≤ j ≤ p– e[j] = b[s], where j + s = p, or– e[p − s] = b[s], for any suffix s of p• We have no need to keep explicit prefixes and suffixes; instead, we keeptheir lengths, s in i and v in j• Let array f hold the lengths of the cores of all suffixes of p suffixes vof p, i.e., f[v] = c(v)Theory in Programming Practice, Plaxton, Spring 2005Computation of b: A Concrete Program• Here is our concrete program for computing b(s) for all suffixes s of passign p − c(p) to all elements of e;for j, 0 ≤ j ≤ p, doi := f[j];if e[p − i] > j − i then e[p − i] := j − i endifendfor• It remains to compute fTheory in Programming Practice, Plaxton, Spring 2005Computation of f• Here we are asked to compute the (length of the) core of every suffixof p• Recall that the preprocessing phase of the KMP algorithm computesthe core of every prefix of p in O(p) time• A symmetric approach can be used to compute the core of every suffixof p in O(p) timeTheory in Programming Practice, Plaxton, Spring 2005Computation of b: Time Complexity• The computation of b(s), for all suffixes s of p, requires computingarray f and executing the concrete program presented earlier– Note that c(p) = f[p]• As we have indicated on the previous slide, the array f can be computedin O(p) time• Given f, the concrete program runs in O(p) time since the loop iteratesO(p) times, and each execution of the loop body takes constant timeTheory in Programming Practice, Plaxton, Spring

View Full Document