Optimal Merging Of RunsWeighted External Path LengthSlide 3Other ApplicationsMessage Coding & DecodingExampleSlide 7Slide 8Another ExampleLossless Data CompressionSlide 11Slide 12Huffman TreesGreedy Algorithm For Binary TreesSlide 15Slide 16Slide 17Slide 18Slide 19Data Structure For Tree CollectionHigher Order TreesCause Of FailureHow Many Length 0 Runs To Add?Slide 24ExamplesOptimal Merging Of Runs4 3 6 94 3697 152271322Weighted External Path Length4 3 6 9WEPL(T) = (weight of external node i) * (distance of node i from root of T)WEPL(T) = 4 * 2 + 3*2 + 6*2 + 9*2 = 447 1522Weighted External Path LengthWEPL(T) = (weight of external node i) * (distance of node i from root of T)WEPL(T) = 4 * 3 + 3*3 + 6*2 + 9*1 = 424 36971322Other Applications•Message coding and decoding.•Lossless data compression.Message Coding & Decoding•Messages M0, M1, M2, …, Mn-1 are to be transmitted.•The messages do not change.•Both sender and receiver know the messages.•So, it is adequate to transmit a code that identifies the message (e.g., message index).•Mi is sent with frequency fi.•Select message codes so as to minimize transmission and decoding times.Example•n = 4 messages.•The frequencies are [2, 4, 8, 100].•Use 2-bit codes [00, 01, 10, 11].•Transmission cost = 2*2 + 4*2 + 8*2 + 100*2 = 228.•Decoding is done using a binary tree.Example•Decoding cost = 2*2 + 4*2 + 8*2 + 100*2 = 228 = transmission cost = WEPL2 4 8 1000 110M3M0M1M2Example•Every binary tree with n external nodes defines a code set for n messages.2 481000 110M3M0M1M210•Decoding cost = 2*3 + 4*3 + 8*2 + 100*1 = 134 = transmission cost = WEPLAnother ExampleNo code is a prefix of another!00000011 111 1M0M1M2M3M4M5M8M9M6M7Lossless Data Compression•Alphabet = {a, b, c, d}.•String with 10 as, 5 bs, 100 cs, and 900 ds.•Use a 2-bit code.a = 00, b = 01, c = 10, d = 11.Size of string = 10*2 + 5*2 + 100*2 + 900*2 = 2030 bits.Plus size of code table.Lossless Data Compression•Use a variable length code that satisfies prefix property (no code is a prefix of another).a = 000, b = 001, c = 01, d = 1.Size of string = 10*3 + 5*3 + 100*2 + 900*1 = 1145 bits.Plus size of code table.Compression ratio is approx. 2030/1145 = 1.8.Lossless Data Compression•Decode 0001100101…•addbc…•Compression ratio is maximized when the decode tree has minimum WEPL.0 110da bc10Huffman Trees•Trees that have minimum WEPL.•Binary trees with minimum WEPL may be constructed using a greedy algorithm.•For higher order trees with minimum WEPL, a preprocessing step followed by the greedy algorithm may be used.•Huffman codes: codes defined by minimum WEPL trees.Greedy Algorithm For Binary Trees•Start with a collection of external nodes, each with one of the given weights. Each external node defines a different tree.•Reduce number of trees by 1.Select 2 trees with minimum weight.Combine them by making them children of a new root node.The weight of the new tree is the sum of the weights of the individual trees.Add new tree to tree collection.•Repeat reduce step until only 1 tree remains.Example•n = 5, w[0:4] = [2, 5, 4, 7, 9].92 5 4 7 995 75 7 9Example2•n = 5, w[0:4] = [2, 5, 4, 7, 9].467 9Example•n = 5, w[0:4] = [2, 5, 4, 7, 9].61152 45Example•n = 5, w[0:4] = [2, 5, 4, 7, 9].2 46117 9165Example•n = 5, w[0:4] = [2, 5, 4, 7, 9].2 46117 91627Data Structure For Tree Collection•Operations are:Initialize with n trees.Remove 2 trees with least weight.Insert new tree.•Use a min heap.•Initialize … O(n).•2(n – 1) remove min operations … O(n log n).•n – 1 insert operations … O(n log n).•Total time is O(n log n).•Or, (n – 1) remove mins and (n – 1) change mins.Higher Order Trees•Greedy scheme doesn’t work!•3-way tree with weights [3, 6, 1, 9].9193 6 110Greedy Tree Cost = 293614 919Optimal Tree Cost = 23Cause Of Failure•One node is not a 3-way node.•A 2-way node is like a 3-way node, one of whose children has a weight of 0.3 6 110 919Greedy Tree Cost = 290•Must start with enough runs/weights of length 0 so that all nodes are 3-way nodes.How Many Length 0 Runs To Add?•k-way tree, k > 1.•Initial number of runs is r.•Add least q >= 0 runs of length 0.•Each k-way merge reduces the number of runs by k – 1.•Number of runs after s k-way merges is r + q – s(k – 1)•For some positive integer s, the number of remaining runs must become 1.How Many Length 0 Runs To Add?•So, we want r + q – s(k–1) = 1 for some positive integer s. •So, r + q – 1 = s(k – 1).•Or, (r + q – 1) mod (k – 1) = 0.•Or, r + q – 1 is divisible by k – 1.This implies that q < k – 1.(r – 1) mod (k – 1) = 0 => q = 0.(r – 1) mod (k – 1) != 0 => q = k –1 – (r – 1) mod (k – 1).•Or, q = (1 – r) mod (k – 1).Examples•k = 2.q = (1 – r) mod (k – 1) = (1 – r) mod 1 = 0.So, no runs of length 0 are to be added.•k = 4, r = 6.q = (1 – r) mod (k – 1) = (1 – 6) mod 3 = (–5)mod 3 = (6 – 5) mod 3 = 1.So, must start with 7 runs, and then apply greedy
View Full Document