DOC PREVIEW
Princeton COS 226 - Substring search

This preview shows page 1-2-3-24-25-26-27-48-49-50 out of 50 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Algorithms, 4th Edition·Robert Sedgewick and Kevin Wayne·Copyright © 2002–2012·April 18, 2012 6:10:29 AMAlgorithmsFOUR T H EDIT IONR O B E R T S EDG EWICK K EVIN W A Y N E5.3 SUBSTRING SEARCH‣brute force‣Knuth-Morris-Pratt‣Boyer-Moore‣Rabin-Karp2Substring searchGoal. Find pattern of length M in a text of length N.typically N >> MSubstring search N E E D L EI N A H A Y S T A C K N E E D L E I N Amatchpatter ntext3Substring search applicationsGoal. Find pattern of length M in a text of length N.Computer forensics. !Search memory or disk for signatures,e.g., all URLs or RSA keys that the user has entered.typically N >> Mhttp://citp.princeton.edu/memorySubstring search N E E D L EI N A H A Y S T A C K N E E D L E I N Amatchpatter ntext4Substring search applicationsGoal. Find pattern of length M in a text of length N.Identify patterns indicative of spam. • PROFITS • L0SE WE1GHT • herbal Viagra • There is no catch. • This is a one-time mailing. • This message is sent in compliance with spam regulations. typically N >> MSubstring search N E E D L EI N A H A Y S T A C K N E E D L E I N Amatchpatter ntext5Substring search applicationsElectronic surveillance.Need to monitor all internet traffic.(security)No way!(privacy)Well, we’re mainlyinterested in“ATTACK AT DAWN”OK. Build amachine that just looks for that.“ATTACK AT DAWN”substring searchmachinefound6Substring search applicationsScreen scraping. Extract relevant data from web page.Ex. Find string delimited by <b> and </b> after first occurrence ofpattern Last Trade:.http://finance.yahoo.com/q?s=goog...<tr><td class= "yfnc_tablehead1"width= "48%">Last Trade:</td><td class= "yfnc_tabledata1"><big><b>452.92</b></big></td></tr><td class= "yfnc_tablehead1"width= "48%">Trade Time:</td><td class= "yfnc_tabledata1">...7Screen scraping: Java implementationJava library. The indexOf() method in Java's string library returns the index of the first occurrence of a given string, starting at a given offset.public class StockQuote{ public static void main(String[] args) { String name = "http://finance.yahoo.com/q?s="; In in = new In(name + args[0]); String text = in.readAll(); int start = text.indexOf("Last Trade:", 0); int from = text.indexOf("<b>", start); int to = text.indexOf("</b>", from); String price = text.substring(from + 3, to); StdOut.println(price); } }% java StockQuote goog582.93% java StockQuote msft24.848‣brute force‣Knuth-Morris-Pratt‣Boyer-Moore‣Rabin-KarpCheck for pattern starting at each text position.9Brute-force substring searchBrute-force substring search i j i+j 0 1 2 3 4 5 6 7 8 9 10 A B A C A D A B R A C 0 2 2 A B R A 1 0 1 A B R A 2 1 3 A B R A 3 0 3 A B R A 4 1 5 A B R A 5 0 5 A B R A 6 4 10 A B R A entries in gray arefor reference onlyentries in blackmatch the textreturn i when j is Mentries in red aremismatchestxtpatmatchCheck for pattern starting at each text position.public static int search(String pat, String txt){ int M = pat.length(); int N = txt.length(); for (int i = 0; i <= N - M; i++) { int j; for (j = 0; j < M; j++) if (txt.charAt(i+j) != pat.charAt(j)) break; if (j == M) return i; } return N;}10Brute-force substring search: Java implementationindex in text wherepattern startsnot foundi j i+j 0 1 2 3 4 5 6 7 8 9 10 A B A C A D A B R A C4 3 7 A D A C R5 0 5 A D A C RBrute-force algorithm can be slow if text and pattern are repetitive.Worst case. ~ M N char compares.11Brute-force substring search: worst caseBrute-force substring search (worst case) i j i+j 0 1 2 3 4 5 6 7 8 9 A A A A A A A A A B 0 4 4 A A A A B 1 4 5 A A A A B 2 4 6 A A A A B 3 4 7 A A A A B 4 4 8 A A A A B 5 5 10 A A A A B txtpatBrute-force substring search i j i+j 0 1 2 3 4 5 6 7 8 9 10 A B A C A D A B R A C 0 2 2 A B R A 1 0 1 A B R A 2 1 3 A B R A 3 0 3 A B R A 4 1 5 A B R A 5 0 5 A B R A 6 4 10 A B R A entries in gray arefor reference onlyentries in blackmatch the textreturn i when j is Mentries in red aremismatchestxtpatmatchIn many applications, we want to avoid backup in text stream.•Treat input as stream of data.•Abstract model: standard input.Brute-force algorithm needs backup for every mismatch.Approach 1. Maintain buffer of last M characters.Approach 2. Stay tuned.A A A A A A A A A A A A A A A A A A A A A B A A A A A BBackup12“ATTACK AT DAWN”substring searchmachinefoundA A A A A A A A A A A A A A A A A A A A A B A A A A A Bmatched charsmismatchshift pattern right one positionbackupSame sequence of char compares as previous implementation.• i points to end of sequence of already-matched chars in text.• j stores number of already-matched chars (end of sequence in pattern).public static int search(String pat, String txt){ int i, N = txt.length(); int j, M = pat.length(); for (i = 0, j = 0; i < N && j < M; i++) { if (txt.charAt(i) == pat.charAt(j)) j++; else { i -= j; j = 0; } } if (j == M) return i - M; else return N;}13Brute-force substring search: alternate implementationbackupi j 0 1 2 3 4 5 6 7 8 9 10 A B A C A D A B R A C7 3 A D A C R5 0 A D A C R14Algorithmic challenges in substring searchBrute-force is not always good enough.Theoretical challenge. Linear-time guarantee.Practical challenge. Avoid backup in text stream.often no room or time to save textfundamental algorithmic problemNow is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for each good person to come to the aid of


View Full Document

Princeton COS 226 - Substring search

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download Substring search
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Substring search and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Substring search 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?