DOC PREVIEW
Princeton COS 226 - Strings

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Robert Sedgewick and Kevin Wayne • Copyright © 2005 • http://www.Princeton.EDU/~cos2266. Strings2StringsString. Ordered list of characters.Ex. Natural languages, Java programs, genomic sequences, …."The digital information that underlies biochemistry, cell biology, anddevelopment can be represented by a simple string G's, A's, T's and C's.This string is the root data structure of an organism's biology." -M. V. Olson3Using Strings in JavaString concatenation. Append one string to end of another string.Substring. Extract a contiguous list of characters from a string.String s = "strings"; // s = "STRINGS"char c = s.charAt(2); // c = 'R'String t = s.substring(2, 7); // t = "RINGS"String u = s + t; // u = "STRINGSRINGS"S T R I N G S0 1 2 3 4 5 64String Implementation in JavaMemory. 28 + 2N bytes for virgin string!public final class String implements Comparable<String> { private char[] value; // characters private int offset; // index of first char into array private int count; // length of string private int hash; // cache of hashCode private String(int offset, int count, char[] value) { this.offset = offset; this.count = count; this.value = value; } public String substring(int from, int to) { return new String(offset + from, to - from, value); } . . .}could use byte array instead of String to save space5String vs. StringBuilderString. [immutable] Fast substring, slow concatenation.StringBuilder. [mutable] Slow substring, fast append.public static String reverse(String s) { String r = ""; for (int i = s.length() - 1; i >= 0; i--) r += s.charAt(i); return r;}quadratic timepublic static String reverse(String s) { StringBuilder r = new StringBuilder(""); for (int i = s.length() - 1; i >= 0; i--) r.append(s.charAt(i)); return r.toString();}linear timeRobert Sedgewick and Kevin Wayne • Copyright © 2005 • http://www.Princeton.EDU/~cos226Radix SortingReference: Chapter 13, Algorithms in Java, 3rd Edition, Robert Sedgewick.7Radix SortingRadix sorting.! Specialized sorting solution for strings.! Same ideas for bits, digits, etc.Applications.! Sorting strings.! Full text indexing.! Plagiarism detection.! Burrows-Wheeler transform. [see data compression]! Computational molecular biology.8An Application: Redundancy DetectorLongest repeated substring.! Given a string of N characters, find the longest repeated substring.! Ex: a a c a a g t t t a c a a g c! Application: computational molecular biology.Dumb brute force.! Try all indices i and j, and all match lengths k, and check.! O(W N3 ) time, where W is length of longest match.a a c a a g t t t a c a a g cijk k9An Application: Redundancy DetectorLongest repeated substring.! Given a string of N characters, find the longest repeated substring.! Ex: a a c a a g t t t a c a a g c! Application: computational molecular biology.Brute force.! Try all indices i and j for start of possible match, and check.! O(W N2 ) time, where W is length of longest match.ija a c a a g t t t a c a a g c10A Sorting SolutionSuffix sort.! Form N suffixes of original string.! Sort to bring longest repeated substrings together.a a c a a g t t t a c a a g ca c a a g t t t a c a a g cc a a g t t t a c a a g ca a g t t t a c a a g ca g t t t a c a a g cg t t t a c a a g ct t t a c a a g ct t a c a a g ct a c a a g ca c a a g cc a a g ca a g ca g cg cca a c a a g t t t a c a a g ca c a a g t t t a c a a g cc a a g t t t a c a a g ca a g t t t a c a a g ca g t t t a c a a g cg t t t a c a a g ct t t a c a a g ct t a c a a g ct a c a a g ca c a a g cc a a g ca a g ca g cg cc11String SortingNotation.! String = variable length sequence of characters.! W = max # characters per string.! N = # input strings.! R = radix.Java syntax.! Array of strings: String[] a! Number of strings: N = a.length! The ith string: a[i]! The dth character of the ith string: a[i].charAt(d)! Strings to be sorted: a[0], …, a[N-1]256 for extended ASCII, 65,536 for original UNICODE 12Suffix Sorting: Java ImplementationJava implementation.public class LRS { public static void main(String[] args) { String s = StdIn.readAll(); int N = s.length(); String[] suffixes = new String[N]; for (int i = 0; i < N; i++) suffixes[i] = s.substring(i, N); Arrays.sort(suffixes); System.out.println(lcp(suffixes)); }}read inputcreate suffixes(linear time)sort and find longest match(bottleneck)longest common prefix of adjacent strings% java LRS < mobydick.txt,- Such a funny, sporty, gamy, jesty, joky, hoky-poky lad, is the Ocean, oh! Th13String Sorting PerformanceQuicksort W N log N †9.5Worst Case Moby DickBrute W N236,000 §Suffix (sec)String SortW = max length of string.N = number of strings.1.2 million for Moby Dick§ estimate† probabilistic guarantee15Key Indexed CountingKey indexed counting.! Count frequencies of each letter. [0th character]d a ba d dc a bf a df e eb a dd a db e ef e db e de b ba c e01234567891011a 0b 2c 3d 1e 2f 1g 3int N = a.length;int[] count = new int[256+1];for (int i = 0; i < N; i++) { char c = a[i].charAt(d); count[c+1]++;}frequenciescountad = 016Key Indexed CountingKey indexed counting.! Count frequencies of each letter. [0th character]! Compute cumulative frequencies.for (int i = 1; i < 256; i++) count[i] += count[i-1]; cumulative countsa 0b 2c 5d 6e 8f 9gd a ba d dc a bf a df e eb a dd a db e ef e db e de b ba c e0123456789101112a counta 0b 2c 3d 1e 2f 1g 317Key Indexed CountingKey indexed counting.! Count frequencies of each letter. [0th character]! Compute cumulative frequencies.! Use cumulative frequencies to rearrange strings.a 0b 2c 5d 6e 8f 9a d da c eb a db e eb e dc a bd a bd a de b bf a df e ef e d01234567891011gd a ba d dc a bf a df e eb a dd a db e ef e db e de b ba c e0123456789101112for (int i = 0; i < N; i++) { char c = a[i].charAt(d); temp[count[c]++] = a[i];} rearrangetempa count30Key Indexed CountingKey indexed counting.! Count frequencies of each letter. [0th character]! Compute cumulative frequencies.! Use cumulative frequencies to rearrange strings.a 2b 5c 6d 8e 9f 1201234567891011a d da c eb a db e eb e dc a bd a bd a de b bf a df e ef e d01234567891011g 12for (int i = 0; i < N; i++) a[i] = temp[i]; copy backtempa counta d da c eb a db e eb e dc a bd a bd a de b bf a df e ef e d31LSD Radix SortLeast significant digit radix sort. Ancient method used for card-sorting.Lysergic Acid Diethylamide, Circa 1960 Card Sorter, Circa 196032LSD Radix


View Full Document

Princeton COS 226 - Strings

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download Strings
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Strings and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Strings 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?