Princeton ELE 572 - PLX: A FULLY SUBWORD-PARALLEL INSTRUCTION


PLX: A FULLY SUBWORD-PARALLEL INSTRUCTION SET ARCHITECTURE FOR FAST SCALABLE MULTIMEDIA PROCESSING

Ruby B. Lee and A. Murat Fiskiran
Princeton Architecture Laboratory for Multimedia and Security (PALMS)
Princeton University
{rblee, fiskiran}@princeton.edu

PALMS research is supported in part by HP, NSF, and Kodak.

ABSTRACT

PLX is a small, fully subword-parallel instruction set architecture designed for very fast multimedia processing, especially in constrained environments requiring low cost and power, such as handheld multimedia information appliances. In PLX, we select the most useful multimedia instructions added previously to microprocessors. We also introduce a few novel features: a new definition of predication requiring very few bits in each predicated instruction, and datapath scalability from 32-bit to 128-bit words, which allows different degrees of subword parallelism without any changes to the ISA. Performance results from basic multimedia kernels testify to PLX's superiority for multimedia processing.

1. INTRODUCTION

Multimedia processing involves compute-intensive operations and constitutes an increasingly greater fraction of the general-purpose processor's workload [1]. To achieve better multimedia performance, instruction set architectures (ISAs) have added multimedia extensions [2, 3], such as MAX-2 [4] to PA-RISC processors [5], MMX [6] to IA-32 processors, and a superset of these to IA-64 [7] processors. These ISAs exploit the following two properties of multimedia applications:

- Huge amounts of data parallelism
- Extensive use of low-precision data

These two properties are exploited well by the use of subword parallelism, also called microSIMD parallelism [2, 8]. In a subword-parallel architecture, the processor's datapath is partitioned into multiple lower-precision segments called the subwords, and the instructions operate in parallel on these subwords (Figure 1).

[Figure 1: Parallel add instruction operating simultaneously on multiple subwords. Source registers Rs1 and Rs2, destination register Rd.]

PLX is a fully subword-parallel ISA designed for very fast media processing [9]. We introduce the PLX architecture, along with some examples that highlight some of its features, such as low-cost multiplication, a new definition of predication, and datapath scalability.

2. PLX INSTRUCTIONS

PLX instructions can be classified into three major groups based on the functional unit responsible for their execution: ALU instructions, shift and permute instructions, and multiply instructions (Figure 2). All instructions are 32 bits long, and subword sizes are 1, 2, 4, and 8 bytes.

[Figure 2: PLX processor with three functional units: ALU, Shift and Permute Unit (SPU), and an optional pipelined multiplier (stages M1, M2, M3), all connected to the register file.]

Basic ALU instructions, shown in Table 1, include parallel add and subtract with modular or saturation arithmetic, parallel shift and add, parallel average, parallel maximum and minimum, logical, and compare instructions (Section 3).

Table 1: ALU instructions

Instruction                            Description
-------------------------------------  ------------------------------------------
padd                                   c_i = a_i + b_i
padd w/ saturation                     c_i = a_i + b_i, with c_i limited to [L, H]
psubtract                              c_i = a_i - b_i
psubtract w/ saturation                c_i = a_i - b_i, with c_i limited to [L, H]
paverage                               c_i = average(a_i, b_i)
psubtract average                      c_i = average(a_i, -b_i)
pshift left add                        c_i = (a_i << n) + b_i
pshift right add                       c_i = (a_i >> n) + b_i
pmaximum                               c_i = max(a_i, b_i)
pminimum                               c_i = min(a_i, b_i)
logical operations                     c = a op b, where op is one of the logical
  (and, or, not, xor, and complement)    operations
compare (cmp)                          Pd1 = rel(a, b), Pd2 = not Pd1
compare parallel write one (cmp.pw1)   see Section 3

Variables c_i, a_i, and b_i correspond to the subwords in the destination and source registers, respectively. If no subscript is given, the entire register is used as source or destination. L and H represent the low and high saturation limits when saturation arithmetic is used. If used, n represents an immediate value given in the instruction word. The function rel(a, b) compares a and b for a relation specified in the instruction word. If this relation is true, rel(a, b) returns 1; otherwise it returns 0. Pd1 and Pd2 are destination predicate registers in compare instructions.

2.1 Low-cost multiplication

Pshift left/right add instructions allow low-cost integer and fixed-point multiplication in the ALU without requiring a separate multiplier. Since the shift amounts are limited to 1, 2, or 3 bits to the right or left, they are realized by a small pre-shifter added to the ALU [8, 10]. Because multiplications can be performed efficiently and inexpensively in the ALU, a separate integer multiplier becomes optional for very low-cost and low-power PLX implementations, as indicated by the dotted lines in Figure 2.

2.2 Full multiplication

While they are low-cost and effective, the pshift left/right instructions only allow multiplication by constants. Therefore, PLX also includes instructions to multiply two registers (Table 2). These instructions are handled by a separate, optional multiplier unit. Pmultiply shift right right-shifts the products before writing the lower-order half of the bits to the destination register. This allows selection of the desired 16 bits of each product. Pmultiply odd and pmultiply even only multiply the odd- or even-indexed subwords of the source registers, producing full-length products.

Table 2: Multiply instructions

Instruction              Description
-----------------------  ---------------------------------------
pmultiply shift right    c_i = ((a_i * b_i) >> n)_lowerhalf
pmultiply even           [c_2i, c_2i+1] = a_2i * b_2i
pmultiply odd            [c_2i, c_2i+1] = a_2i+1 * b_2i+1

2.3 Shift and permute instructions

PLX has parallel shift and subword permute instructions implemented in the shift and permute unit (Table 3). The parallel shift instructions shift the subwords in a register to the left or to the right by any amount, specified either in an immediate field or in a register. The shift right pair instruction, first introduced in PA-RISC processors, is very useful for bit fields spanning two registers [5, 7]. This instruction concatenates two source registers and shifts this value to the right. The lower half of the shifted value is placed in the destination register. Rotation is achieved when both source operands are the same register.

Table 3: Shift and permute instructions

Instruction              Description
-----------------------  ---------------------------------------
pshift left              c_i = a_i << n
pshift left variable     c_i = a_i << b
pshift right             c_i = a_i >> n
pshift right variable    c_i = a_i >> b
shift right pair         c = ([a, b] >> n)_lowerhalf
mix left/right           see text
permute                  see text
permute variable         see text

Subword permutation instructions are used to reorder the subwords in a register. Mix instructions, described in [2, 4, 7], are very useful for performing matrix transposition of subwords packed into multiple registers. The permute instruction works on 1-byte and 2-byte subwords and performs a small set of carefully selected
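
As a rough illustration of the ALU instructions in Table 1 and the shift-and-add multiplication of Section 2.1, the following Python sketch models a few subword-parallel operations in software. It is not PLX reference code: the 64-bit register width, the 16-bit subword size, the signed saturation limits, the least-significant-first subword numbering, and all function names (unpack, pack, padd, padd_sat, pshift_left_add, pmul_by_10) are assumptions chosen for illustration.

```python
# Illustrative software model of subword-parallel ALU operations.
# Assumptions (not from the paper): 64-bit register, 16-bit subwords,
# signed saturation limits, subword 0 is the least significant.

WORD_BITS = 64
SUB_BITS = 16
N_SUB = WORD_BITS // SUB_BITS
MASK = (1 << SUB_BITS) - 1


def unpack(reg):
    """Split a register value into subwords, least significant first."""
    return [(reg >> (i * SUB_BITS)) & MASK for i in range(N_SUB)]


def pack(subwords):
    """Combine subwords (least significant first) into one register value.

    Each subword is truncated to SUB_BITS, which gives modular arithmetic.
    """
    reg = 0
    for i, s in enumerate(subwords):
        reg |= (s & MASK) << (i * SUB_BITS)
    return reg


def to_signed(x):
    """Interpret a SUB_BITS-wide bit pattern as a two's-complement integer."""
    return x - (1 << SUB_BITS) if x & (1 << (SUB_BITS - 1)) else x


def padd(a, b):
    """Modular parallel add: c_i = (a_i + b_i) mod 2**SUB_BITS."""
    return pack([x + y for x, y in zip(unpack(a), unpack(b))])


def padd_sat(a, b):
    """Parallel add with signed saturation: each c_i is clamped to [L, H]."""
    low = -(1 << (SUB_BITS - 1))
    high = (1 << (SUB_BITS - 1)) - 1
    sums = [to_signed(x) + to_signed(y) for x, y in zip(unpack(a), unpack(b))]
    return pack([max(low, min(high, s)) for s in sums])


def pshift_left_add(a, b, n):
    """c_i = (a_i << n) + b_i, with n restricted to 1, 2, or 3 bits."""
    if n not in (1, 2, 3):
        raise ValueError("shift amount must be 1, 2, or 3")
    return pack([(x << n) + y for x, y in zip(unpack(a), unpack(b))])


def pmul_by_10(a):
    """Multiply every subword by the constant 10 with two shift-add steps."""
    five = pshift_left_add(a, a, 2)     # 5x = (x << 2) + x
    return pshift_left_add(five, 0, 1)  # 10x = (5x << 1) + 0
```

For example, `unpack(pmul_by_10(pack([1, 2, 3, 4])))` yields the four products computed in two shift-add steps, with no multiplier involved. The constant 10 is only one choice; other constants decompose into different chains of shifts by 1, 2, or 3 bits.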


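The shift right pair instruction of Section 2.3 can be sketched in the same spirit. This is a minimal model, assuming a 64-bit register width and assuming that the first source operand supplies the upper half of the concatenated value; neither detail is confirmed by the text above, and the function names are hypothetical.

```python
# Illustrative model of "shift right pair" (not PLX reference code).
# Assumptions: 64-bit registers; operand a forms the upper half and
# operand b the lower half of the concatenated value.

WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def shift_right_pair(a, b, n):
    """Concatenate a and b, shift right by n, return the lower half."""
    if not 0 <= n < WORD_BITS:
        raise ValueError("shift amount must be in [0, WORD_BITS)")
    concat = ((a & WORD_MASK) << WORD_BITS) | (b & WORD_MASK)
    return (concat >> n) & WORD_MASK


def rotate_right(a, n):
    """Rotation falls out when both source operands are the same register."""
    return shift_right_pair(a, a, n)
```

With these definitions, a bit field that straddles two registers can be brought into a single register with one call, for example `shift_right_pair(hi, lo, 60)` to align a field whose low bits sit in the top four bits of `lo`.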