CS 536 Park

Digital vs. Analog Data

Digital data: bits.
→ discrete signal
→ both in time and amplitude

Analog data: audio/voice, video/image
→ continuous signal
→ both in time and amplitude

Both forms are used in today's network environment.
→ burning CDs
→ audio/video playback

In broadband networks:
→ use analog signals to carry digital data

Important task: analog data is often digitized
→ useful: why?
→ basics: digital signal processing

How to digitize such that the digital representation is faithful?
→ sampling
→ interface between the analog & digital worlds

Intuition behind sampling:
→ slowly vs. rapidly varying signal

[Figure: two signals over time t, one sampled at interval T1, the other at a shorter interval T2]

If a signal varies quickly, more samples are needed so as not to miss details/changes.
→ ν1 = 1/T1 < ν2 = 1/T2

Are regularly spaced samples at a fixed interval the best?
→ perhaps "savings" possible
→ the more samples, the more bits

Application: network probing
→ goal: ascertain network state
→ send a sequence of probe packets
→ from the arriving probes, infer/estimate congestion

But we do not want to disturb Schrödinger's cat ...
→ networking: minimize overhead
→ irregular probing
→ assurance: probabilistic

Sampling criterion for guaranteed faithfulness:

Sampling Theorem (Nyquist): Given a continuous bandlimited signal s(t) with S(ω) = 0 for |ω| > W, s(t) can be reconstructed from its samples if

  ν > 2W

where ν is the sampling rate.
→ ν: samples per second

The issue of digitizing amplitude/magnitude is ignored here
→ problem of quantization
→ possible source of information loss
→ exploit limitations of human perception
→ logarithmic scale

Compression

Information transmission over a noiseless medium
→ medium or "channel"

Sender wants to communicate information to receiver over a noiseless channel.
→ receive exactly what is sent
→ idealized scenario

Set-up:
→ take a system perspective
→ e.g., modem manufacturer

Need to specify two parts: the property of the source and how compression is done.

Part I. 
What does the source look like:
• source s emits symbols from a finite alphabet Σ
  → e.g., Σ = {0, 1}; Σ = ASCII character set
• symbol a ∈ Σ is generated with probability p_a > 0
  → e.g., books have a known distribution for 'e', 'x', ...
  → let's play "Wheel of Fortune"

Part II. Compression machinery:
• code book F assigns a code word w_a = F(a) to each symbol a ∈ Σ
  → w_a is a binary string of length |w_a|
  → F could be just a table
• F is invertible
  → receiver can recover a from w_a
  → F⁻¹ is the same table, different look-up

Ex.: Σ = {A, C, G, T}; need at least two bits
• F1: w_A = 00, w_C = 01, w_G = 10, w_T = 11
• F2: w_A = 0, w_C = 10, w_G = 110, w_T = 1110
→ pros & cons?

Note: the code book F is not unique
→ find a "good" code book
→ when is a code book good?

Performance (i.e., "goodness") measure: average code length L

  L = Σ_{a∈Σ} p_a |w_a|

→ average number of bits consumed by a given F

Ex.: If a DNA sequence is 10000 letters long, then on average 10000 · L bits must be transmitted.
→ good to have a code book with small L

Optimization problem: Given a source ⟨Σ, p⟩ where p is a probability vector, find a code book F with least L.

A fundamental result on how small L can be made:

The entropy H of a source ⟨Σ, p⟩ is defined as

  H = Σ_{a∈Σ} p_a log(1/p_a)

Ex.: Σ = {A, C, G, T}; H is maximum if p_A = p_C = p_G = p_T = 1/4.
→ when is it minimum?

Source Coding Theorem (Shannon): For all F, H ≤ L. Moreover, L can be made to approach H.

Remark:
• To approach the minimum H, use blocks of k symbols
  → extension code
• entropy is an innate property of the source s
• Ensemble limitation
  → e.g., sending the number π = 3.1415927 ...
  → better
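The definitions above can be checked numerically. The following is a minimal sketch (not part of the course materials): it computes H and L for the two DNA code books F1 and F2 from the example, first under the uniform distribution and then under an assumed skewed distribution, to illustrate the "pros & cons" question. The function names `entropy` and `avg_code_length` are our own.

```python
import math

# Code books F1 and F2 from the example above.
F1 = {"A": "00", "C": "01", "G": "10", "T": "11"}
F2 = {"A": "0", "C": "10", "G": "110", "T": "1110"}

def entropy(p):
    """H = sum over a of p_a * log2(1/p_a), in bits per symbol."""
    return sum(pa * math.log2(1.0 / pa) for pa in p.values())

def avg_code_length(p, F):
    """L = sum over a of p_a * |w_a|, in bits per symbol."""
    return sum(p[a] * len(F[a]) for a in p)

# Uniform source: H is at its maximum (2 bits); fixed-length F1 matches it.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
H_u = entropy(uniform)                # 2.0
L1_u = avg_code_length(uniform, F1)   # 2.0
L2_u = avg_code_length(uniform, F2)   # 2.5

# A skewed source (assumed for illustration): variable-length F2 now wins.
skewed = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
H_s = entropy(skewed)                 # 1.75
L1_s = avg_code_length(skewed, F1)    # 2.0
L2_s = avg_code_length(skewed, F2)    # 1.875

# Source Coding Theorem: H <= L holds for every invertible code book.
assert H_u <= L1_u and H_u <= L2_u
assert H_s <= L1_s and H_s <= L2_s
```

Note that neither code book attains H = 1.75 on the skewed source; per the remark above, L can be pushed toward H by coding blocks of k symbols (an extension code).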