MIT 6 454 - A Block-sorting Lossless Data Compression Algorithm - D659848

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 454> A Block-sorting Lossless Data Compression Algorithm

DOC PREVIEW

MIT 6 454 - A Block-sorting Lossless Data Compression Algorithm

School name Massachusetts Institute of Technology

Course 6 454- Graduate Seminar in Area I

Pages 24

This preview shows page 1-2-23-24 out of 24 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

May 10, 1994SRCResearchReport 124A Block-sorting LosslessData Compression AlgorithmM. Burrows and D.J. Wheelerd i g i t a lSystems Research Center130 Lytton AvenuePalo Alto, California 94301Systems Research CenterThe charter of SRC is to advance both the state of knowledge and the state of theart in computer systems. From our establishmentin 1984, we have performed basicand applied research to support Digital’s business objectives. Our current workincludes exploring distributedpersonal computing on multiple platforms, network-ing, programming technology, system modelling and management techniques, andselected applications.Our strategy is to test the technical and practical value of our ideas by buildinghardware and software prototypesandusingthem asdailytools. Interesting systemsare too complex to be evaluated solely in the abstract; extended use allows us toinvestigate their properties in depth. This experience is useful in the short term inrefining our designs, and invaluable in the long term in advancing our knowledge.Most of the major advancesin information systems have come through thisstrategy,including personal computing, distributed systems, and the Internet.We also perform complementary work of a more mathematical flavor. Some ofit is in established fields of theoretical computer science, such as the analysisof algorithms, computational geometry, and logics of programming. Other workexplores new ground motivated by problems that arise in our systems research.We have a strong commitment to communicating our results; exposing and testingour ideas in the research and development communities leads to improved un-derstanding. Our research report series supplements publication in professionaljournals and conferences. We seek users for our prototype systems among thosewith whom we have common interests, and we encourage collaboration with uni-versity researchers.Robert W. Taylor, DirectorA Block-sorting LosslessData Compression AlgorithmM. Burrows and D.J. WheelerMay 10, 1994David Wheeler is a Professor of Computer Science at the University of Cambridge,U.K. His electronic mail address is: [email protected]Digital Equipment Corporation 1994.This work may not be copied or reproduced in whole or in part for any commercialpurpose. Permission to copy in whole or in part without payment of fee is grantedfor nonprofit educational and research purposes provided that all such whole orpartial copies include the following: a notice that such copying is by permissionof the Systems Research Center of Digital Equipment Corporation in Palo Alto,California; an acknowledgment of the authors and individual contributors to thework; and all applicable portions of the copyright notice. Copying, reproducing,or republishingfor any other purpose shall require a license with payment of fee tothe Systems Research Center. All rights reserved.Authors’ abstractWe describe a block-sorting, lossless data compression algorithm, and our imple-mentation of that algorithm. We compare the performance of our implementationwith widely available data compressors running on the same hardware.The algorithm works by applyinga reversibletransformationtoa block of inputtext. The transformation does not itself compress the data, but reorders it to makeit easy to compress with simple algorithms such as move-to-front coding.Ouralgorithm achievesspeed comparable toalgorithmsbased on thetechniquesof Lempel and Ziv, but obtains compression close to the best statistical modellingtechniques. The size of the input block must be large (a few kilobytes) to achievegood compression.Contents1 Introduction 12 The reversible transformation 23 Why the transformed string compresses well 54 An efficient implementation 84.1 Compression : : : : : : : : : : : : : : : : : : : : : : : : : : : : 84.2 Decompression : : : : : : : : : : : : : : : : : : : : : : : : : : : 125 Algorithm variants 136 Performance of implementation 157 Conclusions 161 IntroductionThe most widely used data compression algorithms are based on the sequentialdata compressors of Lempel and Ziv [1, 2]. Statistical modelling techniques mayproduce superior compression [3], but are significantly slower.In this paper, we present a technique that achievescompression within a percentor so of that achieved by statistical modelling techniques, but at speeds comparableto those of algorithms based on Lempel and Ziv’s.Our algorithm does not process its input sequentially, but instead processes ablock of text as a single unit. The idea is to apply a reversible transformation to ablock of text to form a new block that contains the same characters, but is easierto compress by simple compression algorithms. The transformation tends to groupcharacters together so that the probability of finding a character close to anotherinstance of the same character is increased substantially. Text of this kind caneasily be compressed with fast locally-adaptive algorithms, such as move-to-frontcoding [4] in combination with Huffman or arithmetic coding.Briefly, our algorithm transforms a string S of N characters by forming the Nrotations (cyclic shifts) of S, sorting them lexicographically, and extracting the lastcharacter of each of the rotations. A string L is formed from thesecharacters, wherethe ith character of L is the last character of the ith sorted rotation. In addition toL, the algorithm computes the index I of the original string S in the sorted list ofrotations. Surprisingly, there is an efficient algorithm to compute the original stringS given only L and I.The sorting operation brings together rotations with the same initialcharacters.Since the initial characters of the rotations are adjacent to the final characters,consecutive characters in L are adjacent to similar strings in S. If the context of acharacter is a good predictor for the character, L will be easy to compress with asimple locally-adaptive compression algorithm.In the following sections, we describe the transformation in more detail, andshow that it can be inverted. We explain more carefully why this transformationtends to group characters to allow a simple compression algorithm to work moreeffectively. We then describe efficient techniques for implementing the transfor-mation and its inverse, allowing this algorithm to be competitive in speed withLempel-Ziv-based algorithms, but achieving better compression. Finally, we givethe performance of our implementation of this algorithm, and compare it withwell-known

View Full Document