Stanford CS 106A - 31-characters-and-strings


Eric Roberts                                Handout #31
CS 106A                                     February 1, 2010

Characters and Strings

Early Character Encodings
• The idea of using codes to represent letters dates from before the time of Herman Hollerith, whose contribution is described in the introduction to Chapter 8.
• Most of you are probably familiar with the work of Samuel F. B. Morse, inventor of the telegraph, who devised a code consisting of dots and dashes. This scheme made it easier to transmit messages and led the way for later developments.

[Figure: Samuel Morse (1791-1872)]
[Figure: Alphabetic Characters in Morse Code]

The Victorian Internet
What you probably don’t know is that the invention of the telegraph also gave rise to many of the social phenomena we tend to associate with the modern Internet, including chat rooms, online romances, hackers, and entrepreneurs, all of which are described in Tom Standage’s 1998 book, The Victorian Internet.

The Principle of Enumeration
• Computers tend to be good at working with numeric data. When you declare a variable of type int, for example, the Java virtual machine reserves a location in memory designed to hold an integer in the defined range.
• The ability to represent an integer value, however, also makes it easy to work with other data types as long as it is possible to represent those types using integers. For types consisting of a finite set of values, the easiest approach is simply to number the elements of the collection.
• For example, if you want to work with data representing months of the year, you can simply assign integer codes to the names of each month, much as we do ourselves. Thus, January is month 1, February is month 2, and so on.
• Types that are identified by counting off the elements are called enumerated types.

Enumerated Types in Java
• Java offers two strategies for representing enumerated types:
  – Defining named constants to represent the values in the enumeration
  – Using the enum facility introduced in Java 5.0
• Although I cover the enum syntax briefly in the book, I remain convinced that it is easier for beginning programmers to use the older strategy of defining integer constants to represent the elements of the type and then using variables of type int to store the values.
• For example, you can define names for the major compass points as follows:

    public static final int NORTH = 0;
    public static final int EAST = 1;
    public static final int SOUTH = 2;
    public static final int WEST = 3;

Characters
• Computers use the principle of enumeration to represent character data inside the memory of the machine. There are, after all, a finite number of characters on the keyboard. If you assign an integer to each character, you can use that integer as a code for the character it represents.
• Character codes, however, are not particularly useful unless they are standardized. If different computer manufacturers use different coding sequences (as was indeed the case in the early years), it is harder to share such data across machines.
• The first widely adopted character encoding was ASCII (American Standard Code for Information Interchange).
• With room for at most 256 characters (standard ASCII defines only 128), the ASCII system proved inadequate to represent the many alphabets in use throughout the world. It has therefore been superseded by Unicode, which allows for a much larger number of characters.

The ASCII Subset of Unicode
The following table shows the first 128 characters in the Unicode character set, which are the same as in the older ASCII scheme. Each row label gives the first two octal digits of the code, and the column heading gives the last digit:

        0      1      2      3      4      5      6      7
00x     \000   \001   \002   \003   \004   \005   \006   \007
01x     \b     \t     \n     \013   \f     \r     \016   \017
02x     \020   \021   \022   \023   \024   \025   \026   \027
03x     \030   \031   \032   \033   \034   \035   \036   \037
04x     space  !      "      #      $      %      &      '
05x     (      )      *      +      ,      -      .      /
06x     0      1      2      3      4      5      6      7
07x     8      9      :      ;      <      =      >      ?
10x     @      A      B      C      D      E      F      G
11x     H      I      J      K      L      M      N      O
12x     P      Q      R      S      T      U      V      W
13x     X      Y      Z      [      \      ]      ^      _
14x     `      a      b      c      d      e      f      g
15x     h      i      j      k      l      m      n      o
16x     p      q      r      s      t      u      v      w
17x     x      y      z      {      |      }      ~      \177

Notes on Character Representation
• The first thing to remember about the Unicode table from the previous slide is that you don’t actually have to learn the numeric codes for the characters. The important observation is that a character has a numeric representation, and not what that representation happens to be.
• To specify a character in a Java program, you need to use a character constant, which consists of the desired character enclosed in single quotation marks. Thus, the constant 'A' in a program indicates the Unicode representation for an uppercase A. That it has the octal value 101 (65 in decimal) is an irrelevant detail.
• Two properties of the Unicode table are worth special notice:
  – The character codes for the digits are consecutive.
  – The letters in the alphabet are divided into two ranges, one for the uppercase letters and one for the lowercase letters. Within each range, the Unicode values are consecutive.

Special Characters
• Most of the characters in the Unicode table are the familiar ones that appear on the keyboard. These characters are called printing characters. The table also includes several special characters that are typically used to control formatting.
• Special characters are indicated in the Unicode table by an escape sequence, which consists of a backslash followed by a character or sequence of digits. The most common ones are:

    \b      Backspace
    \f      Form feed (starts a new page)
    \n      Newline (moves to the next line)
    \r      Return (moves to the beginning of the current line without advancing)
    \t      Tab (moves horizontally to the next tab stop)
    \\      The backslash character itself
    \'      The character ' (required only in character constants)
    \"      The character " (required only in string constants)
    \ddd    The character whose Unicode value is the octal number ddd

Useful Methods in the Character Class
static boolean isDigit(char ch)
    Determines if the specified character is a digit.
static boolean isLetter(char ch)
    Determines if the specified character is a letter.
static boolean isLetterOrDigit(char ch)
    Determines if the specified character is a letter or a digit.
static boolean isLowerCase(char ch)
    Determines if the specified character is a lowercase letter.
static boolean isUpperCase(char ch)
    Determines if the specified character is an uppercase letter.
static boolean isWhitespace(char ch)
    Determines if the specified character is whitespace (spaces and
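
The points above about consecutive character codes, octal escape sequences, and the Character class methods can be tried out in a short program. The following is a minimal sketch rather than code from the handout: the class name CharacterDemo and the helpers digitValue, nthUppercaseLetter, and describe are hypothetical names introduced only for illustration. It assumes nothing beyond the standard java.lang.Character and java.lang.Integer classes.

```java
// Illustrative sketch only: CharacterDemo and its helper methods are
// hypothetical names, not part of the handout.
class CharacterDemo {

    // The digit codes are consecutive, so subtracting '0' gives the
    // numeric value of a digit character in the range '0' to '9'.
    static int digitValue(char ch) {
        if (ch < '0' || ch > '9') {
            throw new IllegalArgumentException("Not an ASCII digit: " + ch);
        }
        return ch - '0';
    }

    // The uppercase letters are consecutive, so adding an offset to 'A'
    // gives the letter at that position (0 for A, 25 for Z).
    static char nthUppercaseLetter(int n) {
        if (n < 0 || n > 25) {
            throw new IllegalArgumentException("Offset out of range: " + n);
        }
        return (char) ('A' + n);
    }

    // Classifies a character using the static methods of the Character class.
    static String describe(char ch) {
        if (Character.isDigit(ch)) {
            return "digit";
        } else if (Character.isUpperCase(ch)) {
            return "uppercase letter";
        } else if (Character.isLowerCase(ch)) {
            return "lowercase letter";
        } else if (Character.isWhitespace(ch)) {
            return "whitespace";
        } else {
            return "other";
        }
    }

    public static void main(String[] args) {
        // A char converts to its numeric code when cast to int.
        System.out.println((int) 'A');                   // 65
        System.out.println(Integer.toOctalString('A'));  // 101

        // The octal escape \101 names the same character as 'A'.
        System.out.println('\101' == 'A');               // true

        System.out.println(digitValue('7'));             // 7
        System.out.println(nthUppercaseLetter(2));       // C
        System.out.println(describe('\t'));              // whitespace
        System.out.println(describe('q'));               // lowercase letter
        System.out.println(describe('\\'));              // other
    }
}
```

Note that digitValue checks the range '0' to '9' explicitly instead of calling Character.isDigit, because isDigit also accepts digits from other scripts, for which subtracting '0' would not give the right value.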

