DOC PREVIEW
Princeton COS 226 - Pattern Matching

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Algorithms in Java, 4th Edition·Robert Sedgewick and Kevin Wayne·Copyright © 2008·April 14, 2008 9:38:41 AMPattern MatchingReferences: Algorithms in C (2nd edition), Chapter 19 (pdf online) http://www.cs.princeton.edu/algs4/63long http://www.cs.princeton.edu/algs4/72regular‣exact pattern matching‣Knuth-Morris-Pratt‣RE pattern matching‣grep2‣exact pattern matching‣Knuth-Morris-Pratt‣RE pattern matching‣grep3Exact pattern matchingGoal. Find pattern of length M in a text stream of length N.Computer forensics. !Search memory or disk for signatures,e.g., all URLs or RSA keys that the user has entered.typically N >> Mpatterntextneedleinahaystackaneedleinahttp://citp.princeton.edu/memory4Applications•Parsers.•Spam filters. •Digital libraries.•Screen scrapers.•Word processors.•Web search engines.•Natural language processing.•Computational molecular biology.•Feature detection in digitized images.5Spam filteringIdentify patterns indicative of spam.•PROFITS •AMAZING •GUARANTEE •L0SE WE1GHT •herbal Viagra •There is no catch. •L0W M0RTGAGE RATES •This is a one-time mailing. •This message is sent in compliance with spam regulations. •You're getting this message because you registered with one of our marketing partners.6Screen scrapingGoal. Extract relevant data from web page.Ex. Find string delimited by <b> and </b> after first occurrence ofpattern Last Trade:.http://finance.yahoo.com/q?s=goog...<tr><td class= "yfnc_tablehead1"width= "48%">Last Trade:</td><td class= "yfnc_tabledata1"><big><b>452.92</b></big></td></tr><td class= "yfnc_tablehead1"width= "48%">Trade Time:</td><td class= "yfnc_tabledata1">...7Exact pattern matching in JavaThe method s.indexOf(pattern, offset) in Java's String library returns the index of the first occurrence of pattern in string s, starting at given offset.public class StockQuote{ public static void main(String[] args) { String name = "http://finance.yahoo.com/q?s="; In in = new In(name + args[0]); String input = in.readAll(); int start = input.indexOf("Last Trade:", 0); int from = input.indexOf("<b>", start); int to = input.indexOf("</b>", from); String price = input.substring(from + 3, to); StdOut.println(price); } }% java StockQuote goog452.92% java StockQuote msft28.152Check for pattern starting at each text position.8Brute-force exact pattern matchh a y n e e d s a n n e e d l e xn e e d l en e e d l en e e d l en e e d l en e e d l en e e d l en e e d l en e e d l en e e d l en e e d l en e e d l eCheck for pattern starting at each text position.public static int search(String pattern, String text){ int M = pattern.length(); int N = text.length(); for (int i = 0; i < N - M; i++) { int j; for (j = 0; j < M; j++) if (text.charAt(i+j) != pattern.charAt(j)) break; if (j == M) return i; } return -1; }9Brute-force exact pattern match: Java implementationindex in text where pattern startsnot foundBrute-force algorithm can be slow if text and pattern are repetitive.Worst case. ~ MN char compares.10Brute-force exact pattern match: worst casea a a a a a a a a a a a a a a a ba a a a a ba a a a a ba a a a a ba a a a a ba a a a a ba a a a a ba a a a a ba a a a a ba a a a a ba a a a a ba a a a a ba a a a a b11Algorithmic challenges in pattern matching Brute-force is not good enough for all applications.Theoretical challenge. Linear-time guarantee.Practical challenge. Avoid backup in text stream.often no room or time to save textfundamental algorithmic problemNow is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for each good person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their attack at dawn party. Now is the time for each person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party.12‣exact pattern matching‣Knuth-Morris-Pratt‣RE pattern matching‣grepKnuth-Morris-Pratt exact pattern-matching algorithmKMP. Classic algorithm that meets both challenges.•Linear-time guarantee.•No backup in text stream.Basic plan (for binary alphabet).•Build DFA from pattern.•Simulate DFA with text as input.13Don KnuthVaughan PrattJim MorrisDFAforpatterna a b a a acceptpattern in textrejectpattern NOT in texttexta a a b a a b a a a bDFA review.•Finite number of states (including start and accept).•Exactly one transition for each input symbol.•Accept if sequence of transitions leads to accept state.Q. Which bitstrings does this DFA accept?14Deterministic finite-state automataOne state for each pattern character.•Match input character: move from i to i+1.•Mismatch: move to previous state.Knuth-Morris-Pratt


View Full Document

Princeton COS 226 - Pattern Matching

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download Pattern Matching
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Pattern Matching and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Pattern Matching 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?