Berkeley COMPSCI 70 - n8 - D39632

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 70> n8

DOC PREVIEW

Berkeley COMPSCI 70 - n8

School name University of California, Berkeley

Course Compsci 70- Discrete Mathematics and Probability Theory

Pages 7

This preview shows page 1-2 out of 7 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

EECS 70 Discrete Mathematics and Probability TheorySpring 2014 Anant Sahai Note 8Error Correcting CodesWe will consider two situations in which we wish to convey information on an unreliable channel. The first isexemplified by the internet, where the information (say a file) is broken up into packets, and the unreliabilityis manifest in the fact that some of the packets are lost (or erased) during transmission. Moreover the packetsare labeled with headers so that the recipient knows exactly which packets were received and which weredropped. We will refer to such errors as erasure errors. See the figure below:In the second situation, some of the packets are corrupted during transmission due to channel noise. Nowthe recipient has no idea which packets were corrupted and which were received unmodified:In the above example, packets 1 and 4 are corrupted. These types of errors are called general errors. Wewill discuss methods of encoding messages, called error correcting codes, which are capable of correctingboth erasure and general errors. The principle is to embed a message into a codeword (a process calledencoding), transmit the codeword, and then recover the message from the damaged/corrupted symbols thatare received (a process called decoding).This area of study is a part of “Information Theory,” one of the core computer sciences1along with the theoryof computation, control theory, communication theory, and estimation/learning theory. At the NationalScience Foundation, these are (largely) currently in the Computer and Information Science and Engineering(CISE) directorate in the Division of Computing and Communication Foundations (CCF). In particular,there is a rich area of coding theory and here in this note, we will simply touch a little part of it. If you wantto learn more, this material is built upon in 121, 229b, 229a, and other courses.On the practical side, this material is intensely useful. Every time you use your cellphone, satellite TV, DSL,cable-modem, hard-disk drive, solid-state drive, CD-ROM, DVD, Blu-ray, etc., error correcting codes arecrucial. Related ideas are being deployed in the Internet for streaming and in modern data-centers to makecloud computing and distributed storage possible. Perhaps most surprisingly, the actual (polynomial-based)codes that you are studying in this lecture notes are called Reed-Solomon2codes and are essentially what areused in many applications. These are representative of what are called algebraic-geometry codes, because1This is also reflected in an interesting history. The IEEE was formed by the merger of two societies — the Institute of RadioEngineers and the American Institute of Electrical Engineers. The Radio Engineers were intimately involved with communication,cryptography, radar, etc. The IRE and AIEE merged to form the IEEE in 1963, but the IRE was already bigger than the AIEE by1957. This merger was natural because electronics was becoming an important implementation substrate for both sets of engineersand the modern theory of electrical systems used the same mathematics as radio and signal processing (as well as control). This iswhy the word “EE” is often used to describe these computer sciences. At Berkeley, of course, this is all just a historical footnotesince you simply experience EECS as a single entity, but the history remains in some of the course numbers.2Like RSA, these carry the names of their inventors Reed and Solomon. The modern use of these codes is largely related to whatare called BCH codes, named after Bose, Chaudhuri, and Hocquenghem. We cannot get into these differences but the WikipediaEECS 70, Spring 2014, Note 8 1of their connections to algebraic geometry — the branch of mathematics that studies the roots of polynomialequations.Returning to the problem at hand. Assume that the information consists of n packets3. We can assumewithout loss of generality that the contents of each packet is a number modulo q (denoted by GF(q)), whereq is a prime. For example, the contents of the packet might be a 32-bit string and can therefore be regardedas a number between 0 and 232− 1; then we could choose q to be any prime4larger than 232. The propertiesof polynomials over GF(q) (i.e., with coefficients and values reduced modulo q) are the backbone of botherror-correcting schemes. To see this, let us denote the message to be sent by m1, . . . , mnand make thefollowing crucial observations:1) There is a unique polynomial P(x) of degree n − 1 such that P(i) = mifor 1 ≤ i ≤ n (i.e., P(x) containsall of the information about the message, and evaluating P(i) gives the intended contents of the i-th packet).2) The message to be sent is now m1= P(1), . . . , mn= P(n). We can generate additional packets by evaluat-ing P(x) at additional points n + 1, n + 2, . . . , n + j (remember, our transmitted codeword must be redundant,i.e., it must contain more packets than the original message to account for the lost or corrupted packets).Thus the transmitted codeword is c1= P(1), c2= P(2), . . . , cn+ j= P(n + j). (Alternatively, we can use then numbers defining the message as the coefficients of a degree n − 1 polynomial directly. Either way, thetransmitted codeword will be evaluations of this polynomial.) Since we are working modulo q, we mustmake sure that n + j ≤ q, but this condition does not impose a serious constraint since q is presumed to bevery large.Erasure ErrorsHere we consider the setting of packets being sent over the internet. In this setting, the packets are la-beled and so the recipient knows exactly which packets were dropped during transmission. One additionalobservation will be useful:3) By Property 2 in Note 7, we can uniquely reconstruct P(x) from its values at any n distinct points, sinceit has degree n − 1. This means that P(x) can be reconstructed from any n of the transmitted packets.Evaluating this reconstructed polynomial P(x) at x = 1, . . . , n yields the original message m1, . . . , mn. (Oralternatively, if the message was encoded into the polynomial’s coefficients, we could just read the messageout by simplifying the polynomial.)Recall that in our scheme, the transmitted codeword is c1= P(1), c2= P(2), . . . , cn+ j= P(n + j). Thus, ifwe hope to be able to correct k errors, we simply need to set j = k. The encoded codeword will then consistof n + k packets.ExampleSuppose Alice wants to send Bob a message of n = 4 packets and she wants to guard against k = 2 lostpackets.

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 7 pages.

Berkeley COMPSCI 70 - n8

Sign up for free to view:

Please select your school