DOC PREVIEW
Princeton COS 226 - String Search

This preview shows page 1-2-3-25-26-27 out of 27 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Brute forceRabin-KarpKnuth-Morris-PrattRight-Left scanString Search2Text with N charactersPattern with M characters•match existence: any occurence of pattern in text?•enumerate: how many occurences?•match: return index of any occurence•all matches: return indices of all occurences••••••••Assume that N >> M >> number of occurencesEx: N = 100,000; M = 100; five occurencesString Searchingfocus of this lectureSample problem: find avoctdfytvv in kvjlixapejrbxeenpphkhthbkwyrwamnugzhppfxiyjyanhapfwbghx mshrlyujfjhrsovkvveylnbxnawavgizyvmfohigeabgksfnbkmffxj ffqbualeytqrphyrbjqdjqavctgxjifqgfgydhoiwhrvwqbxgrixydz bpajnhopvlamhhfavoctdfytvvggikngkwzixgjtlxkozjlefilbrboi gnbzsudssvqymnapbpqvlubdoyxkkwhcoudvtkmikansgsutdjythzl apawlvliygjkmxorzeoafeoffbfxuhkzukeftnrfmocylculksedgrd ivayjpgkrtedehwhrvvbbltdkctq3Find M-char pattern in N-char textApplications•word processors•virus scanning•text information retrieval (ex: Lexis/Nexis)•digital libraries•computational biology•web search enginesTheoretical challenge: linear-time guarantee •suffix-trie index costs ~NlgNPractical challenge: avoid BACKUP •often no room or time to save input charsFundamental algorithmic problemString searching contextNow is the time for all people to come to the aid of their party.Now is the time for all good people to come to the aid of theirparty.Now is the time for manygood people to come to the aid of their party.Now is the time for allgood people to come to the aid of theirparty.Now is the time for a lot ofgood people to come to the aid of theirparty.Now is the time for all of thegood people to come to the aid of theirparty.Now is the time for all good people to come to the aid of theirparty. Now is the time for each goodperson to come to the aid of theirparty.Now is the time for all good people to come to the aid of theirparty. Now is the time for allgood Republicans to come to the aid of their party.Now is the time for allgood people to come to the aid of theirparty. Now is the time for many or allgood people to come to the aid of theirparty. Now is the time for all good people to come to the aid of theirparty.Now is the time for all good Democrats to come to the aid of theirparty. Now is the time for all people to come to the aid of their party.Now is the time for all good people to come to the aid of theirparty.Now is the time for manygood people to come to the aid of their party.Now is the time for allgood people to come to the aid of theirparty.Now is the time for a lot ofgood people to come to the aid of theirparty.Now is the time for all of thegood people to come to the aid of theirparty.Now is the time for all good people to come to the aid of theirattack at dawn party. Now is the time for eachperson to come to the aid of theirparty.Now is the time for all good people to come to the aid of theirparty. Now is the time for allgood Republicans to come to the aid of their party.Now is the time for allgood people to come to the aid of theirparty. Now is the time for many or allgood people to come to the aid of theirparty. Now is the time for all good people to come to the aid of theirparty.Now is the time for all good Democrats to come to the aid of theirparty.4Random pattern or text??••Simple, effective algorithm: return “NOT FOUND”•probability of match is less than N/(alphabet size)MBetter to model fixed pattern in fixed text at random position•swap patterns in sample problems 1 and 2 makes both OK•use random perturbations to test mismatchModelling String SearchingSample problem 1: find unwillingly in kvjlixapejrbxeenpphkhthbkwyrwamnugzhppfxiyjyanhapfwbghx mshrlyujfjhrsovkvveylnbxnawavgizyvmfohigeabgksfnbkmffxj ffqbualeytqrphyrbjqdjqavctgxjifqgfgydhoiwhrvwqbxgrixydz bpajnhopvlamhhfavoctdfytvvggikngkwzixgjtlxkozjlefilbrboi gnbzsudssvqymnapbpqvlubdoyxkkwhcoudvtkmikansgsutdjythzlSample problem 2: find avoctdfytvv in all the world’s a stage and all the men and women merely players. They have their exits and their entrances, and one man in his time plays many parts. At first, the infant, mewling and puking in the nurse’s arms. Then the whining schoolboy, with his satchel and shining morning face, creeping like snail unwillingly to school. And then the lover, sighing.0000000000000084 for sample problemsrandom textrandom pattern5Brute-force string searchingtext loopusing array to simplify alg descriptionsonline apps need getchar()-based implementationpattern loopmatchmismatchCheck for pattern at every text positionint brutesearch(char p[], char a[]) { int i, j; for (i = 0; i < strlen(a); i++) for (j = 0; j < strlen(p); j++) if (a[i+j] != p[j]) break; if (j == strlen(p)) return i; return strlen(a); }•returns i if leftmost pattern occurence starts at a[i]•returns N if no matchDO NOT USE THIS PROGRAM!6 for (i = 0; i < strlen(a); i++)In C, strlen takes time proportional to string length•evaluated every time through loop• running time is at least N2•same problem for simpler programs (ex: count the blanks)PERFORMANCE BUGTextbook example: Performance matters in ADT designExercise: implement string ADT with fast strlen•need space to store length•need to update length when changing string•might slow down some other simple programsProblem with brute-force implementation7Brute-force string searching (bug fixed)text looppattern looplengths won’t change; precompute themCheck for pattern at every text positionint brutesearch(char p[], char a[]) { int M = strlen(p), N = strlen(a); int i, j; for (i = 0; i < N; i++) for (j = 0; j < M; j++) if (a[i+j] != p[j]) break; if (j == M) return i; return N; }•returns i if leftmost pattern occurence starts at a[i]•returns N if no match8Brute-force typical casepattern: xkhthbkwytext: kvjlixkpejrbxeenppxkhthbkwy * * * * * xk* * * * * * * x* * * * * * xkhthbkwycharacter compares: N+3Can we guarantee performance?9Brute-force worst casepattern: 000000001text: 000000000000000000000000001 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 00000000* 000000001character compares: M*NToo slow when M and N are largeBacks up in text: need to retain M-char buffer10Idea 1: Use hashing•compute hash function for each text position•NO TABLE needed: just compare with pattern


View Full Document

Princeton COS 226 - String Search

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download String Search
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view String Search and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view String Search 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?