Coding and Applications in Sensor Networks Jie Gao Computer Science Department Stony Brook University Paper Dimakis05 A G Dimakis V Prabhakaran and K Ramchandran Ubiquitous Access to Distributed Data in Large Scale Sensor Networks through Decentralized Erasure Codes Symposium on Information Processing in Sensor Networks IPSN 05 April 2005 Why coding Information compression Robustness to errors error correction codes Source coding Compression What is the minimum number of bits to represent certain information What is a measure of information Entropy Information theory Channel coding Achieve fault tolerance Transmit information through a noisy channel Storage on a disk Certain bits may be flipped Goal recover the original information How duplicate information Source coding and Channel coding Source coding and channel coding can be separated without hurting the performance 01100011 Source Coding 0110 Channel 01100 Coding 01100011 11100 0110 Decompress Decode Noisy Channel Coding in sensor networks Compression Sensors generate too much data Nearby sensor readings are correlated Fault tolerance Communication failures Corrupted messages by a noisy channel Node failures fault tolerance storage Adversary inject false information Channels The media through which information is passed from a sender to a receiver Binary symmetric channel each symbol is flipped with probability p Erasure channel each symbol is replaced by a with probability p We first focus on binary symmetric channel Encoding and decoding Encoding Input a string of length k data Output a string of length n k codeword Decoding Input some string of length n might be corrupted Output the original data of length k Error detection and correction Error detection detect whether a string is a valid codeword Error correction correct it to a valid codeword Maximum likelihood Decoding find the codeword that is closest in Hamming distance I e with minimum flips How to find it For small size code store a codebook Do table lookup NP hard in general Scheme 1 repetition Simplest coding scheme one can come up with Input data 0110010 Repeat each bit 11 times Now we have 00000000000111111111111111111111100000000 000000000000001111111111100000000000 Decoding do majority vote Detection when the 10 bits don t agree with each other Correction 5 bits of error Scheme 2 Parity check Add one bit to do parity check Sum up the number of 1 s in the string If it is even then set the parity check bit to 0 otherwise set the parity check bit to 1 Eg 001011010 111011111 Sum of 1 s in the codeword is even 1 bit parity check can detect 1 bit error If one bit is flipped then the sum of 1s is odd But can not detect 2 bits error nor can correct 1 bit error More on parity check Encode a piece of data into codeword Not every string is a codeword After 1 bit parity check only strings with even 1s are valid codeword Thus we can detect error Minimum Hamming distance between any two codewords is 2 Suppose we make the min Hamming distance larger then we can detect more errors and also correct errors Scheme 3 Hamming code Intuition generalize the parity bit and organize them in a nice way so that we can detect and correct more errors Lower bound If the minimum Hamming distance between two code words is k then we can detect at most k 1 bits error and correct at most k 2 bits error Hamming code 7 4 adds three additional check bits to every four data bits of the message to correct any single bit error and detect all two bit errors Hamming code 7 4 Coding multiply the data with the encoding matrix Decoding multiply the codeword with the decoding matrix An example encoding Input data Codeword Systematic code the first k bits is the data Original data is preserved An example decoding Decode Now suppose there is an error at the ith bit We received Now decode This picks up the ith column of the decoding vector An example decoding Suppose Second bit is wrong Decode Data more than 4 bits Break it into chunks and encode each chunk Linear code Most common category Succinct specification efficient encoding and errordetecting algorithms simply matrix multiplication Code space a linear space with dimension k By linear algebra we find a set of basis Code space Generator matrix Linear code Null space of dimension n k Parity check matrix Error detection check Hamming code is a linear code on alphabet 0 1 It corrects 1 bit and detects 2 bits error Linear code A linear code is called systematic if the first k bits is the data Generation matrix G Ik k Pk n k If n 2k and P is invertible then the code is called invertible m Pm A message m maps to Parity bits can be used to recover m Parity bits Detect more errors Bursty errors Reed Solomon codes Most commonly used code in CDs DVDs Handles bursty errors Use a large alphabet and algebra Take an alphabet of size q n and n distinct elements Input message of length k Define the polynomial The codeword is Reed Solomon codes Rephrase the encoding scheme Unknowns variables the message of length k What we know some equations on the unknowns Each of the coded bit gives a linear equation on the k unknowns A linear system How many equations do we need to solve it We only need length k coded information to solve all the unknowns Reed Solomon codes Write the linear system by matrix form 1 1 1 2 1 k 12 1k 1 c0 C 1 22 2k 1 c1 C 2 k2 kk 1 ck 1 C k This is the Van de Ment matrix So it s invertible This code can tolerate n k errors Any k bits can recover the original message This property is called erasure code Use coding for fault tolerance If a sensor die we lose the data For fault tolerance we have to duplicate data s t we can recover the data from other sensors Straight forward solution duplicate it at other places Storage size goes up Use coding to keep storage size as the same What we pay decoding cost Problem setup Setup we have k data nodes and n k storage nodes data nodes may also be storage nodes Each data node generates one piece of data Each storage node only stores one piece of coded data We want to recover data by using any k storage nodes Sounds familiar Reed Solomon code But it is centralized we need all the k inputs to generate the coded information Distributed random linear code Each node sends its data to m O lnk random storage nodes A storage node may receive multiple pieces of data c1 c2 ck but it stores a random combination of them E g a1c1 a2c2 akck where a s are random coefficients Coding and decoding Storage size keeps almost the same
View Full Document
Unlocking...