Princeton ELE 572 - Subword Permutation Instructions for Two-Dimensional Multimedia

Unformatted text preview:

Subword Permutation Instructions for Two-Dimensional MultimediaProcessing in MicroSIMD ArchitecturesRuby B. LeePrinceton [email protected] architectures incorporating subword parallelism are very efficient forapplication-specific media processors as well as for fast multimedia information processing ingeneral-purpose processors. This paper addresses the unsolved problem of the need topermute the subwords packed in registers for maximum parallelism performance, especiallyfor two-dimensional (2-D) multimedia algorithms. We propose a new systematic approach foridentifying the fundamental data rearrangement needs in current and future 2-D pixelprocessing programs based on the hierarchical decomposition of frames and objects intoatomic 2-D structures. We define new subword permutation instructions, Check, Excheck,Exchange, and Permset, that achieve these data rearrangements across multiple registers. Wealso define an alphabet of subword permutation primitives, including these new instructionsand the Mix instruction defined for PA-RISC MAX-2 and IA-64, which supports the datarearrangement needs of 2-D frames and objects. We show the sufficiency and efficiency of thisalphabet for achieving all possible permutations of hierarchical 2-D blocks.1. IntroductionMultimedia information processing can be considered an increasing part of the general-purpose workload or a special-purpose application area. In this paper, we consider newinstructions for accelerating multimedia processing in any programmable processor, whethergeneral-purpose or application-specific. The focus is on simple, single-cycle instructions,which can be used to construct any type of permutations needed in two-dimensional (2-D)multimedia processing.Multimedia extensions have been added to general-purpose processors to accelerate theprocessing of different media types [1-7,15,16]. The types of application-specific processorswe target in this paper are designed to execute various multimedia programs, rather than justone. They include digital signal processors [8], video signal processors [9, 10], andmediaprocessors [11, 12].Subword parallelism [1,4] is now widely deployed by multimedia instructions inmicroprocessor architectures [1-7, 15,16] and in media processors [12] to accelerate theprocessing of lower-precision data, like 16-bit audio samples or 8-bit pixel components. Wealso call this microSIMD architecture [13], since it applies SIMD (Single Instruction MultipleData) parallel processor techniques [14] within a single processor. A subword-parallel (ormicroSIMD) instruction performs the same operation in parallel on multiple pairs of subwordspacked into two registers, which are typically 32 to 128 bits wide in today’s microprocessorsand mediaprocessors (see figure 1). For example, a 64-bit word-oriented datapath can be0-7695-0716-6/00 $10.00 ã 2000 IEEEpartitioned into eight 8-bit subwords, or four 16-bit subwords, or two 32-bit subwords.Substantial performance improvements have been realized using subword parallel instructions,at very low cost compared to other forms of parallelism, like superscalar, VLIW or parallelprocessor organizations, for the same degree of operation parallelism [13].SubwordArith.UnitRegisterFileShift/PermuteUnitMultiple Subword Arithmetic and Shift/Permutefunctional units can be implementedFigure 1: microSIMD subword parallelism leveraging word datapathsWith packed subwords in registers, we now need to be able to re-arrange subwords within aregister, and between registers. This is necessary to achieve the maximum parallelism forsubsequent processing. Unfortunately, subword permutation operations are not understood asclearly as subword arithmetic operations. They require moving several fields (subwords) inparallel. Conventional shift and rotate instructions move all the bits in a register by the sameamount. Extract and deposit instructions, found in instruction-set architectures like PA-RISC[17], move one field using one or two instructions. Early subword permutation instructionslike mix and permute [4] in the PA-RISC MAX-2 multimedia instructions are a first attemptto find efficient and general-purpose subword permutation primitives. However, thesufficiency or efficiency of these permutation primitives in achieving any arbitrary permutationhas not been demonstrated. The problem is further complicated by the fact that image, videoor graphics processing require mapping two-dimensional objects onto subwords in multipleregisters, and then permuting these subwords between registers. In addition, sincepermutations have not been easily achieved by programmable processors, algorithm designersmay not have optimized algorithms using permutations. Hence, one cannot just comb throughall the common multimedia algorithms to determine what permutations are used and theperformance impact they have: one would often need to re-think algorithms to see if efficientpermutations would help improve the performance. Furthermore, in designing subwordpermutation primitives, we need to project the permutation needs of future, yet-to-be definedmultimedia algorithms, and this seems to be an intractable problem. In this paper, we proposea systematic solution to this unsolved problem of finding generic subword permutationprimitives for both current and future algorithms for processing two-dimensional multimediadata. We also define a small set of subword permutation primitives, and show that this is botha sufficient and an efficient set.In section 2, we describe how 2-dimensional frames can be mapped into the packedsubwords of microSIMD architectures. We also show that two-dimensional objects can bedecomposed into smaller blocks or polygons, and ultimately into atomic 2x2 matrices andtriangles. In section 3, we review the subword permutation instructions that have been definedin the multimedia instructions MAX-2 for PA-RISC processors [4] and for IA-64 EPICprocessors [15], especially the mix instruction. We show an example of how a permutation ona 2-D object can be decomposed into hierarchical permutations on 2x2 matrices. In section 4,we investigate the subword permutation needs of atomic 2-D structures, and postulate that0-7695-0716-6/00 $10.00 ã 2000 IEEEthese are generic primitives since all 2-D frames and objects can be decomposed into theseatomic 2-D structures. In section 5, we propose small subsets of subword permutationprimitives that are sufficient and also efficient for different


View Full Document
Download Subword Permutation Instructions for Two-Dimensional Multimedia
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Subword Permutation Instructions for Two-Dimensional Multimedia and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Subword Permutation Instructions for Two-Dimensional Multimedia 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?