Stanford CS 106A - Characters and Strings


Slides in this lecture:
• Characters and Strings
• Once upon a time . . .
• Early Character Encodings
• The Victorian Internet
• The Principle of Enumeration
• Enumerated Types in Java
• Characters
• The ASCII Subset of Unicode
• Notes on Character Representation
• Special Characters
• Useful Methods in the Character Class
• Character Arithmetic
• Exercise: Character Arithmetic
• Strings as an Abstract Idea
• Using Methods in the String Class
• Strings vs. Characters
• Selecting Characters from a String
• Concatenation
• Extracting Substrings
• Checking Strings for Equality
• Comparing Characters and Strings
• Searching in a String
• Other Methods in the String Class
• Simple String Idioms
• Exercises: String Processing
• The reverseString Method
• The End

Characters and Strings
Eric Roberts
CS 106A
February 1, 2010

Once upon a time . . .

Early Character Encodings
• The idea of using codes to represent letters dates from before the time of Herman Hollerith, whose contribution is described in the introduction to Chapter 8.
• Most of you are probably familiar with the work of Samuel F. B. Morse, inventor of the telegraph, who devised a code consisting of dots and dashes. This scheme made it easier to transmit messages and led the way for later developments.
Samuel Morse (1791-1872)

Alphabetic Characters in Morse Code

The Victorian Internet
What you probably don’t know is that the invention of the telegraph also gave rise to many of the social phenomena we tend to associate with the modern Internet, including chat rooms, online romances, hackers, and entrepreneurs, all of which are described in Tom Standage’s 1998 book, The Victorian Internet.

The Principle of Enumeration
• Computers tend to be good at working with numeric data. When you declare a variable of type int, for example, the Java virtual machine reserves a location in memory designed to hold an integer in the defined range.
• The ability to represent an integer value, however, also makes it easy to work with other data types as long as it is possible to represent those types using integers.
  For types consisting of a finite set of values, the easiest approach is simply to number the elements of the collection.
• For example, if you want to work with data representing months of the year, you can simply assign integer codes to the names of each month, much as we do ourselves. Thus, January is month 1, February is month 2, and so on.
• Types that are identified by counting off the elements are called enumerated types.

Enumerated Types in Java
• Java offers two strategies for representing enumerated types:
  – Defining named constants to represent the values in the enumeration
  – Using the enum facility introduced in Java 5.0
• Although I cover the enum syntax briefly in the book, I remain convinced that it is easier for beginning programmers to use the older strategy of defining integer constants to represent the elements of the type and then using variables of type int to store the values.
• For example, you can define names for the major compass points as follows:

    public static final int NORTH = 0;
    public static final int EAST = 1;
    public static final int SOUTH = 2;
    public static final int WEST = 3;

Characters
• Computers use the principle of enumeration to represent character data inside the memory of the machine. There are, after all, a finite number of characters on the keyboard. If you assign an integer to each character, you can use that integer as a code for the character it represents.
• Character codes, however, are not particularly useful unless they are standardized. If different computer manufacturers use different coding sequences (as was indeed the case in the early years), it is harder to share such data across machines.
• The first widely adopted character encoding was ASCII (American Standard Code for Information Interchange).
• With only 128 characters (256 in its later 8-bit extensions), the ASCII system proved inadequate to represent the many alphabets in use throughout the world.
  It has therefore been superseded by Unicode, which allows for a much larger number of characters.

The ASCII Subset of Unicode
The following table shows the first 128 characters in the Unicode character set, which are the same as in the older ASCII scheme:

         0     1     2     3     4     5     6     7
    00x  \000  \001  \002  \003  \004  \005  \006  \007
    01x  \b    \t    \n    \013  \f    \r    \016  \017
    02x  \020  \021  \022  \023  \024  \025  \026  \027
    03x  \030  \031  \032  \033  \034  \035  \036  \037
    04x  space !     "     #     $     %     &     '
    05x  (     )     *     +     ,     -     .     /
    06x  0     1     2     3     4     5     6     7
    07x  8     9     :     ;     <     =     >     ?
    10x  @     A     B     C     D     E     F     G
    11x  H     I     J     K     L     M     N     O
    12x  P     Q     R     S     T     U     V     W
    13x  X     Y     Z     [     \     ]     ^     _
    14x  `     a     b     c     d     e     f     g
    15x  h     i     j     k     l     m     n     o
    16x  p     q     r     s     t     u     v     w
    17x  x     y     z     {     |     }     ~     \177

The Unicode value for any character in the table is the sum of the octal numbers at the beginning of that row and column. The letter A, for example, has the Unicode value 101₈ (65 in decimal), which is the sum of its row label (10x, standing for 100₈) and its column label (1).

Notes on Character Representation
• The first thing to remember about the Unicode table from the previous slide is that you don’t actually have to learn the numeric codes for the characters. What matters is that a character has a numeric representation, not what that representation happens to be.
• To specify a character in a Java program, you need to use a character constant, which consists of the desired character enclosed in single quotation marks. Thus, the constant 'A' in a program indicates the Unicode representation for an uppercase A.
  That it has the value 101₈ is an irrelevant detail.
• Two properties of the Unicode table are worth special notice:
  – The character codes for the digits are consecutive.
  – The letters in the alphabet are divided into two ranges, one for the uppercase letters and one for the lowercase letters. Within each range, the Unicode values are consecutive.

Special Characters
• Most of the characters in the Unicode table are the familiar ones that appear on the keyboard. These characters are called printing characters. The table also includes several special characters that are typically . . .
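The two strategies from "Enumerated Types in Java" can be compared side by side. This is a minimal sketch: the class names `CompassConstants`, `Direction`, and `EnumerationDemo` and the `turnRight` helpers are hypothetical, not taken from the slides. Both versions number the compass points 0 through 3 clockwise, so turning right amounts to adding one and wrapping around with the remainder operator.

```java
// Hypothetical illustration of the two enumeration strategies.
class CompassConstants {
    // Strategy 1: named integer constants, as on the slide.
    public static final int NORTH = 0;
    public static final int EAST = 1;
    public static final int SOUTH = 2;
    public static final int WEST = 3;

    // The directions are numbered clockwise, so turning right adds 1
    // and % 4 wraps WEST (3) back around to NORTH (0).
    static int turnRight(int dir) {
        return (dir + 1) % 4;
    }
}

// Strategy 2: the enum facility introduced in Java 5.0.
enum Direction {
    NORTH, EAST, SOUTH, WEST;

    // ordinal() is the position in the declaration (NORTH is 0), and
    // values() returns the constants in that same order.
    Direction turnRight() {
        Direction[] all = values();
        return all[(ordinal() + 1) % all.length];
    }
}

class EnumerationDemo {
    public static void main(String[] args) {
        System.out.println(CompassConstants.turnRight(CompassConstants.WEST)); // 0
        System.out.println(Direction.WEST.turnRight());                        // NORTH
    }
}
```

The arithmetic is identical in both versions. The practical difference is that an `int` variable will accept any integer, such as 17, without complaint from the compiler, whereas a `Direction` variable can only hold one of the four declared constants.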
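Because a `char` in Java is stored as its Unicode value, the octal row and column labels in the ASCII table can be recovered by ordinary integer arithmetic. The sketch below is illustrative: `TableLookup`, `rowLabel`, and `columnLabel` are hypothetical names. It relies only on `Integer.toOctalString` and `String.format` from the standard library.

```java
// Hypothetical helper class; the method names are illustrative, not from the slides.
class TableLookup {
    // Row label as printed in the table, e.g. "10x" for 'A'.
    // code / 8 drops the last octal digit; %02o prints what is left as two octal digits.
    static String rowLabel(char ch) {
        int code = ch;  // a char widens to int, giving its Unicode value
        return String.format("%02ox", code / 8);
    }

    // Column label: the last octal digit of the code.
    static int columnLabel(char ch) {
        return ch % 8;
    }

    public static void main(String[] args) {
        char ch = 'A';
        System.out.println((int) ch);                  // 65
        System.out.println(Integer.toOctalString(ch)); // 101
        System.out.println(rowLabel(ch) + ", column " + columnLabel(ch)); // 10x, column 1
    }
}
```

As the slides stress, you rarely need these numbers in practice. The point of the sketch is to show that the octal sums in the table are simply the character's integer code written in base 8.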
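The two properties singled out in "Notes on Character Representation" (consecutive digit codes and consecutive ranges for each case of letter) are what make it possible to convert characters by addition and subtraction without knowing any actual codes. The following minimal sketch assumes plain ASCII input. `CharArithmetic` and its methods are hypothetical helpers, not part of the slides.

```java
// Hypothetical helpers; they assume ASCII letters and digits only.
class CharArithmetic {
    // '0'..'9' -> 0..9: works because the digit codes are consecutive.
    static int digitValue(char ch) {
        return ch - '0';
    }

    // 0..9 -> '0'..'9': char + int is an int, so a cast back to char is required.
    static char digitChar(int n) {
        return (char) ('0' + n);
    }

    // Uppercase -> lowercase: shift by the distance between the two letter ranges.
    // Anything outside 'A'..'Z' is returned unchanged.
    static char toLower(char ch) {
        if (ch >= 'A' && ch <= 'Z') {
            return (char) (ch - 'A' + 'a');
        }
        return ch;
    }

    public static void main(String[] args) {
        System.out.println(digitValue('7')); // 7
        System.out.println(digitChar(3));    // 3
        System.out.println(toLower('Q'));    // q
    }
}
```

In production code, `Character.toLowerCase` and `Character.isDigit` are usually the better choice, because they also handle letters and digits outside ASCII. The hand-written versions are useful mainly because they make the consecutive-range property visible.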
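The ASCII table writes its nonprinting characters as escape sequences. Some use a backslash followed by a letter (`\b`, `\t`, `\n`, `\f`, `\r`). The rest use a backslash followed by three octal digits. Java accepts both forms in character literals. The sketch below is a hypothetical helper, `EscapeDemo.tableCell`, reconstructed from the conventions visible in the table. It maps a code from 0 to 127 to the label the table uses for it.

```java
// Hypothetical helper that mimics how the ASCII table labels codes 0 to 127.
class EscapeDemo {
    static String tableCell(int code) {
        switch (code) {
            // Characters that have a named Java escape sequence.
            case '\b': return "\\b";
            case '\t': return "\\t";
            case '\n': return "\\n";
            case '\f': return "\\f";
            case '\r': return "\\r";
            case ' ':  return "space";
            default:
                if (code < 32 || code == 127) {
                    // Other nonprinting characters: a backslash and three octal digits.
                    return String.format("\\%03o", code);
                }
                // Printing characters stand for themselves.
                return String.valueOf((char) code);
        }
    }

    public static void main(String[] args) {
        // Rebuild row 01x; in Java a leading 0 makes 010 and 017 octal literals (8 and 15).
        StringBuilder row = new StringBuilder();
        for (int code = 010; code <= 017; code++) {
            row.append(tableCell(code)).append(' ');
        }
        System.out.println(row.toString().trim()); // \b \t \n \013 \f \r \016 \017
        System.out.println((int) '\177');          // 127, the last entry in the table
    }
}
```

Java has no named escape for characters such as vertical tab (013 octal), which is why they appear in octal form.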

