MIT 6 050J - Lecture Notes - D2802716

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 050J> Lecture Notes

MIT 6 050J - Lecture Notes

School name Massachusetts Institute of Technology

Course 6 050j- Information and Entropy

Pages 20

Download Save

Unformatted text preview:

2 Codes2.1 Symbol Space Size2.2 Use of Spare Capacity2.2.1 Binary Coded Decimal (BCD)2.2.2 Genetic Code2.2.3 Telephone Area Codes2.2.4 IP Addresses2.2.5 ASCII2.3 Extension of Codes2.4 Fixed-Length and Variable-Length Codes2.4.1 Morse Code2.5 Detail: ASCII2.6 Detail: Integer Codes2.7 Detail: The Genetic Code*2.8 Detail: IP Addresses2.9 Detail: Morse CodeChapter 2CodesIn the previous chapter we examined the fundamental unit of information, the bit, and its various abstractrepresentations: the mathematical bit, the control bit, the classical bit, and the quantum bit.A single bit is useful if exactly two answers to a question are possible. Examples include the result of acoin toss (heads or tails), the gender of a person (male or female), the verdict of a jury (guilty or not guilty),and the truth of an assertion (true or false). Most situations in life are more complicated. This chapterconcerns ways in which complex objects can be represented not by a single bit, but by arrays of bits.It is convenient to focus on a very simple model of a s ystem, shown in Figure 2.1, in which the input isone of a predetermined set of objects, or “symbols,” the identity of the particular symb ol chosen is encodedin an array of bits, these bits are transmitted through space or time, and then are decoded at a later timeor in a different place to determine which symbol was originally chosen. In later chapters we will build onthis model to deal with issues of robustness and efficiency.CoderChannelDecoder- - - -(Arrays of Bits) (Arrays of Bits)Input(Symb ols)Output(Symb ols)Figure 2.1: Simple model of a communication systemIn this chapter we will look into several aspects of the design of codes, and show s ome examples in whichthese aspects were either done well or not so well. Individual sections will describe co des that illustrate theimportant points. Some objects for which codes may be needed include:• Letters: BCD, EBCDIC, ASCII, Unicode, Morse Code• Integers: Binary, Gray, 2’s complement• Numbers: Floating-PointAuthor: Paul Penfield, Jr.This document: http://www-mtl.mit.edu/Courses/6.050/2005/notes/chapter2.pdfVersion 1.2, January 28, 2005. Copyrightc 2005 Massachusetts Institute of TechnologyStart of notes · back · next | 6.050J/2.110J home page | Site map | Search | About this document | Comments and inquiries82.1 Symbol Space Size 9• Proteins: Genetic Code• Telephones: NANP, International co des• Hosts: Ethernet, IP Addresses, Domain names• Images: TIFF, GIF, and JPEG• Audio: MP3• Video: MPEG2.1 Symbol Space SizeDifferent considerations are important depending on the number of symbols that need to be encoded.The number of symbols in a code is c alled the symbol space size. (As a special case, if there is only onesymbol, no bits are required to specify it.) We will consider symbol spaces of different sizes:• 2• Integral power of 2• Finite• Infinite, Countable• Infinite, UncountableIf the number of symbols is 2, then the se lection can be encoded in a single bit. If the number of possiblesymbols is 4, 8, 16, 32, 64, or another integral power of 2, then the selection may be coded in the numberof bits equal to the logarithm, base 2, of the symbol space size. Thus 2 bits can designate the suit (clubs,diamonds, hearts, or spades) of a playing card, and 5 bits can encode the selection of one student in a class of32. As a s pecial case, if there is only one symbol, no bits are required to sp ecify it. A dreidel is a four-sidedtoy marked with Hebrew letters, and spun like a top in a children’s game, especially at Hanukkah. Theresult of each spin could b e encoded in 2 bits.If the number of symbols is finite but not an integral power of 2, then the number of bits that would workfor the next higher integral power of 2 can be used to encode the selection, but there will be some unusedbit patterns. Examples include the 10 digits, the six faces of a cubic die, the 13 denominations of a playingcard, and the 26 letters of the English alphabet. In each case, there is spare capacity (6 unused patterns inthe 4-bit representation of digits, 2 unused patterns in the 3-bit represe ntation of a die, etc.) What to dowith this spare capacity is an important design issue that will be treated in the next section.If the number of symbols is infinite but countable (able to be put into a one-to-one relation with theintegers) then a bit string of a given length can only denote a finite number of items from this infinite set.Thus, a 4-bit code for non-negative integers might designate integers from 0 through 15, but would not beable to handle integers outside this range. If, as a result of some computation, it were necessary to representlarger numbers, then this “overflow” condition would have to be handled in some way.If the numbe r of symbols is infinite and uncountable (such as the value of a physical quantity like voltageor acoustic pressure) then some technique of “discretization” must be used to replace possible values by afinite number of se lecte d values that are approximately the same. For example, if the numbers between 0and 1 were the symbols and if 2 bits were available for the coded representation, one approach might beto approximate all numbers between 0 and 0.25 by the number 0.125, all numbers between 0.25 and 0.5 by0.375, and so on. Whether such an approximation is adequate depends on how the decoded data is used.The approximation is not reversible, in that there is no decoder which will recover the original symbol givenjust the code for the approximate value. However, if the number of bits available is large enough, then formany purposes a decoder could provide a number that is close enough. Floating-point representation of realnumbe rs in computers is based on this philosophy.2.2 Use of Spare Capacity 102.2 Use of Spare CapacityIn many situations there are some unused code patterns, because the number of symbols is not an integralpower of 2. There are several strategies possible to deal with this.• Ignore• Map to other values• Reserve for future expansion• Add control codes• Use for common abbreviationsThese approaches will be illustrated with examples of common codes.2.2.1 Binary Coded Decimal (BCD)A common way to represent the digits 0 - 9 is by the ten four-bit patterns shown in Table 2.1. Thereare six bit patterns that are not used, and the question is what to do with them. Here are a few ideas thatcome to mind.First, the unused bit patterns

View Full Document


School:
Email:
New Password:
Confirm Password:

MIT 6 050J - Lecture Notes

Sign up for free to view:

Please select your school