UT CS 337 - String Matching: Rabin-Karp Algorithm

Unformatted text preview:

String Matching: Rabin-Karp AlgorithmGreg PlaxtonTheory in Programming Practice, Fall 2005Department of Computer ScienceUniversity of Texas at AustinThe (Exact) String Matching Problem• The (exact) string matching problem: Given a text string t and apattern string p, find all occurrences of p in t• A naive algorithm for this problem simply considers all possible startingpositions i of a matching string within t, and compares p to thesubstring of t beginning at each such position i– The worst-case complexity of this algorithm is Θ(mn), where mdenotes the length of p and n denotes the length of t– Can we do better?Theory in Programming Practice, Plaxton, Fall 2005Three Efficient String Matching Algorithms• Rabin-Karp (today)– This is a simple randomized algorithm that tends to run in lineartime in most scenarios of practical interest– The worst case running time is as bad as that of the naive algorithm,i.e., Θ(mn)• Knuth-Morris-Pratt– The worst case running time of this algorithm is linear, i.e., O(m+n)• Boyer-Moore– This algorithm tends to have the best performance in practice, as itoften runs in sublinear time– The worst case running time is as bad as that of the naive algorithmTheory in Programming Practice, Plaxton, Fall 2005The Rabin-Karp String Matching Algorithm• Assume the text string t is of length m and the pattern string p is oflength n• Let sidenote the length-n contiguous substring of t beginning at offseti ≥ 0– So, for example, s0is the length-n prefix of t• The main idea is to use a hash function h to map each sito a good-sized set such as the set of the first k nonnegative integers, for somesuitable k– Initially, we compute h(p)– Whenever we encounter an i for which h(si) = h(p), we check for amatch as in the naive algorithm– If h(si) 6= h(p), we don’t need to check for a matchTheory in Programming Practice, Plaxton, Fall 2005The Choice of Hash Function• It should be easy to compare two hash values– For example, if the range of the hash function is a set of sufficientlysmall nonnegative integers, then two hash values can be comparedwith a single machine instruction• The number of false positives induced by the hash function should besimilar to that achieved by a “random” function– If the range of the hash function is of size k, we’d like each hashvalue to be achieved by approximately the same number of n-symbolstrings (where n is the length of the pattern)• It should be easy (e.g., a constant number of machine instructions) tocompute h(si+1) given h(si)Theory in Programming Practice, Plaxton, Fall 2005A Possible Choice for the Hash Function• Suppose we hash each string to the XOR of the ASCII values of itscharacters– Is this a good choice of hash function with respect to the criteriamentioned on the previous slide?• What if we hash each string to the sum of the ASCII values of itscharacters?• What if we view each string as a nonnegative number?– For example, an ASCII string may be viewed as a base 256 number– Alternatively, an n-symbol ASCII string may be viewed as an (8n)-bitnumberTheory in Programming Practice, Plaxton, Fall 2005A Good Choice for the Hash Function• View each string as a nonnegative number, but take the result modulok for some suitable modulus k• For example, we might take k to be 232, to ensure that the hash valuescan be stored in a 32-bit integer• In practice the modulus k is generally taken to be a prime (e.g., a32-bit prime) in order to better destroy any structure in the input data– For example, note that the 8-bit ASCII codes for printable charactersall begin with a 0– So if we use k = 232, bits 7, 15, 23, and 31 of the hash of a printablestring are guaranteed to be zero• But can we still compute h(si+1) from h(si) efficiently?Theory in Programming Practice, Plaxton, Fall


View Full Document

UT CS 337 - String Matching: Rabin-Karp Algorithm

Documents in this Course
Load more
Download String Matching: Rabin-Karp Algorithm
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view String Matching: Rabin-Karp Algorithm and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view String Matching: Rabin-Karp Algorithm 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?