DOC PREVIEW
Princeton COS 226 - STRINGS

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Algorithms, 4th Edition·Robert Sedgewick and Kevin Wayne·Copyright © 2002–2012·April 11, 2012 5:38:10 AMAlgorithmsFOUR T H EDIT IONR O B E R T S EDG EWICK K EVIN W A Y N E5. STRINGS‣5.1 Strings Sorts‣5.2 Tries‣5.3 Substring Search‣5.4 Regular Expressions‣5.5 Data Compression2String processingString. Sequence of characters.Important fundamental abstraction.•Information processing.•Genomic sequences.•Communication systems (e.g., email).•Programming systems (e.g., Java programs).•…“ The digital information that underlies biochemistry, cell biology, and development can be represented by a simple string of G's, A's, T's and C's. This string is the root data structure of an organism's biology. ” — M. V. Olson3The char data typeC char data type. Typically an 8-bit integer.•Supports 7-bit ASCII.•Need more bits to represent certain characters.Java char data type. A 16-bit unsigned integer.•Supports original 16-bit Unicode.•Supports 21-bit Unicode 3.0 (awkwardly).6676.5  Data CompressionASCII encoding. When you HexDump a bit-stream that contains ASCII-encoded charac-ters, the table at right is useful for reference. Given a 2-digit hex number, use the first hex digit as a row index and the second hex digit as a column reference to find the character that it encodes. For example, 31 encodes the digit 1, 4A encodes the letter J, and so forth. This table is for 7-bit ASCII, so the first hex digit must be 7 or less. Hex numbers starting with 0 and 1 (and the numbers 20 and 7F) correspond to non-printing control charac-ters. Many of the control characters are left over from the days when physical devices like typewriters were controlled by ASCII input; the table highlights a few that you might see in dumps. For example SP is the space character, NUL is the null character, LF is line-feed, and CR is carriage-return. In summary, working with data compression requires us to reorient our thinking about standard input and standard output to include binary encoding of data. BinaryStdIn and BinaryStdOut provide the methods that we need. They provide a way for you to make a clear distinction in your client programs between writing out information in-tended for file storage and data transmission (that will be read by programs) and print-ing information (that is likely to be read by humans). 0 1 2 3 4 5 6 7 8 9 A B C D E F0NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI1DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US2SP! “ # $ % & ‘ ( ) * + , - . /3 0 1 2 3 4 5 6 7 8 9 : ; < = > ?4 @ A B C D E F G H I J K L M N O5 P Q R S T U V W X Y Z [ \ ] ^ _6 ` a b c d e f g h i j k l m n o7 p q r s t u v w x y z { | } ~DELHexadecimal to ASCII conversion tableString data type. Sequence of characters (immutable).Indexing. Get the ith character.Substring extraction. Get a contiguous sequence of characters from a string. String concatenation. Append one character to end of another string.4The String data typeFundamental constant-time String operations0 1 2 3 4 5 6 7 8 9 10 11 12A T T A C K A T D A W N ss.charAt(3)s.length()s.substring(7, 11)5The String data type: Java implementationpublic final class String implements Comparable<String>{ private char[] val; // characters private int offset; // index of first char in array private int length; // length of string private int hash; // cache of hashCode() public int length() { return length; } public char charAt(int i) { return value[i + offset]; } private String(int offset, int length, char[] val) { this.offset = offset; this.length = length; this.val = val; } public String substring(int from, int to) { return new String(offset + from, to - from, val); } …xxstringx012345678val[]offsetlengthcopy of reference tooriginal char array6The String data type: performanceString data type. Sequence of characters (immutable).Underlying implementation. Immutable char[] array, offset, and length.Memory. 40 + 2N bytes for a virgin String of length N.can use byte[] or char[] instead of String to save space(but lose convenience of String data type)StringStringoperationguaranteeextra spacecharAt()11length()11substring()11concat()NN7The StringBuilder data typeStringBuilder data type. Sequence of characters (mutable).Underlying implementation. Resizing char[] array and length.Remark. StringBuffer data type is similar, but thread safe (and slower).StringStringStringBuilderStringBuilderoperationguaranteeextra spaceguaranteeextra spacecharAt()1111length()1111substring()11NNconcat()NN1 *1 ** amortized8String vs. StringBuilderQ. How to efficiently reverse a string?A.B. public static String reverse(String s) { String rev = ""; for (int i = s.length() - 1; i >= 0; i--) rev += s.charAt(i); return rev; } public static String reverse(String s) { StringBuilder rev = new StringBuilder(); for (int i = s.length() - 1; i >= 0; i--) rev.append(s.charAt(i)); return rev.toString(); }quadratic timelinear time9String challenge: array of suffixesQ. How to efficiently form array of suffixes?aacaagtttacaagc01234567891011121314input string0aacaagtttacaagc1acaagtttacaagc2caagtttacaagc3aagtttacaagc4agtttacaagc5gtttacaagc6tttacaagc7ttacaagc8tacaagc9acaagc10caagc11aagc12agc13gc14csuffixes10String vs. StringBuilderQ. How to efficiently form array of suffixes?A.B. public static String[] suffixes(String s) { int N = s.length(); String[] suffixes = new String[N]; for (int i = 0; i < N; i++) suffixes[i] = s.substring(i, N); return suffixes; } public static String[] suffixes(String s) { int N = s.length(); StringBuilder sb = new StringBuilder(s); String[] suffixes = new String[N]; for (int i = 0; i < N; i++) suffixes[i] = sb.substring(i, N); return suffixes; }linear time andlinear spacequadratic time andquadratic space11Longest common prefixQ. How long to compute length of longest common prefix?Running time. Proportional to length D of longest common prefix.Remark. Also can compute compareTo() in sublinear time. public static int lcp(String s, String t) { int N = Math.min(s.length(), t.length()); for (int i = 0; i < N; i++) if (s.charAt(i) != t.charAt(i)) return i; return N; }prefixprefetch01234567linear time (worst case)sublinear time (typical case)Digital key. Sequence of digits over fixed


View Full Document

Princeton COS 226 - STRINGS

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download STRINGS
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view STRINGS and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view STRINGS 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?