Unformatted text preview:

Optimal Merging Of RunsWeighted External Path LengthSlide 3Other ApplicationsMessage Coding & DecodingExampleSlide 7Slide 8Another ExampleLossless Data CompressionSlide 11Slide 12Huffman TreesGreedy Algorithm For Binary TreesSlide 15Slide 16Slide 17Slide 18Slide 19Data Structure For Tree CollectionHigher Order TreesCause Of FailureHow Many Length 0 Runs To Add?Slide 24ExamplesOptimal Merging Of Runs4 3 6 94 3697 152271322Weighted External Path Length4 3 6 9WEPL(T) = (weight of external node i) * (distance of node i from root of T)WEPL(T) = 4 * 2 + 3*2 + 6*2 + 9*2 = 447 1522Weighted External Path LengthWEPL(T) = (weight of external node i) * (distance of node i from root of T)WEPL(T) = 4 * 3 + 3*3 + 6*2 + 9*1 = 424 36971322Other Applications•Message coding and decoding.•Lossless data compression.Message Coding & Decoding•Messages M0, M1, M2, …, Mn-1 are to be transmitted.•The messages do not change.•Both sender and receiver know the messages.•So, it is adequate to transmit a code that identifies the message (e.g., message index).•Mi is sent with frequency fi.•Select message codes so as to minimize transmission and decoding times.Example•n = 4 messages.•The frequencies are [2, 4, 8, 100].•Use 2-bit codes [00, 01, 10, 11].•Transmission cost = 2*2 + 4*2 + 8*2 + 100*2 = 228.•Decoding is done using a binary tree.Example•Decoding cost = 2*2 + 4*2 + 8*2 + 100*2 = 228 = transmission cost = WEPL2 4 8 1000 110M3M0M1M2Example•Every binary tree with n external nodes defines a code set for n messages.2 481000 110M3M0M1M210•Decoding cost = 2*3 + 4*3 + 8*2 + 100*1 = 134 = transmission cost = WEPLAnother ExampleNo code is a prefix of another!00000011 111 1M0M1M2M3M4M5M8M9M6M7Lossless Data Compression•Alphabet = {a, b, c, d}.•String with 10 as, 5 bs, 100 cs, and 900 ds.•Use a 2-bit code.a = 00, b = 01, c = 10, d = 11.Size of string = 10*2 + 5*2 + 100*2 + 900*2 = 2030 bits.Plus size of code table.Lossless Data Compression•Use a variable length code that satisfies prefix property (no code is a prefix of another).a = 000, b = 001, c = 01, d = 1.Size of string = 10*3 + 5*3 + 100*2 + 900*1 = 1145 bits.Plus size of code table.Compression ratio is approx. 2030/1145 = 1.8.Lossless Data Compression•Decode 0001100101…•addbc…•Compression ratio is maximized when the decode tree has minimum WEPL.0 110da bc10Huffman Trees•Trees that have minimum WEPL.•Binary trees with minimum WEPL may be constructed using a greedy algorithm.•For higher order trees with minimum WEPL, a preprocessing step followed by the greedy algorithm may be used.•Huffman codes: codes defined by minimum WEPL trees.Greedy Algorithm For Binary Trees•Start with a collection of external nodes, each with one of the given weights. Each external node defines a different tree.•Reduce number of trees by 1.Select 2 trees with minimum weight.Combine them by making them children of a new root node.The weight of the new tree is the sum of the weights of the individual trees.Add new tree to tree collection.•Repeat reduce step until only 1 tree remains.Example•n = 5, w[0:4] = [2, 5, 4, 7, 9].92 5 4 7 995 75 7 9Example2•n = 5, w[0:4] = [2, 5, 4, 7, 9].467 9Example•n = 5, w[0:4] = [2, 5, 4, 7, 9].61152 45Example•n = 5, w[0:4] = [2, 5, 4, 7, 9].2 46117 9165Example•n = 5, w[0:4] = [2, 5, 4, 7, 9].2 46117 91627Data Structure For Tree Collection•Operations are:Initialize with n trees.Remove 2 trees with least weight.Insert new tree.•Use a min heap.•Initialize … O(n).•2(n – 1) remove min operations … O(n log n).•n – 1 insert operations … O(n log n).•Total time is O(n log n).•Or, (n – 1) remove mins and (n – 1) change mins.Higher Order Trees•Greedy scheme doesn’t work!•3-way tree with weights [3, 6, 1, 9].9193 6 110Greedy Tree Cost = 293614 919Optimal Tree Cost = 23Cause Of Failure•One node is not a 3-way node.•A 2-way node is like a 3-way node, one of whose children has a weight of 0.3 6 110 919Greedy Tree Cost = 290•Must start with enough runs/weights of length 0 so that all nodes are 3-way nodes.How Many Length 0 Runs To Add?•k-way tree, k > 1.•Initial number of runs is r.•Add least q >= 0 runs of length 0.•Each k-way merge reduces the number of runs by k – 1.•Number of runs after s k-way merges is r + q – s(k – 1)•For some positive integer s, the number of remaining runs must become 1.How Many Length 0 Runs To Add?•So, we want r + q – s(k–1) = 1 for some positive integer s. •So, r + q – 1 = s(k – 1).•Or, (r + q – 1) mod (k – 1) = 0.•Or, r + q – 1 is divisible by k – 1.This implies that q < k – 1.(r – 1) mod (k – 1) = 0 => q = 0.(r – 1) mod (k – 1) != 0 => q = k –1 – (r – 1) mod (k – 1).•Or, q = (1 – r) mod (k – 1).Examples•k = 2.q = (1 – r) mod (k – 1) = (1 – r) mod 1 = 0.So, no runs of length 0 are to be added.•k = 4, r = 6.q = (1 – r) mod (k – 1) = (1 – 6) mod 3 = (–5)mod 3 = (6 – 5) mod 3 = 1.So, must start with 7 runs, and then apply greedy


View Full Document

UF COP 5536 - Optimal Merging Of Runs

Download Optimal Merging Of Runs
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Optimal Merging Of Runs and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Optimal Merging Of Runs 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?