UNCC ECGR 6185 - A Versatile Computation Module for Adaptable Multimedia Processors


A Versatile Computation Module for Adaptable Multimedia Processors

Yunan Xiang, Ryan Pettibone, Martin Margala
Department of Electrical and Computer Engineering, University of Rochester
Rochester, NY 14627-0231, USA
Email: {xiang,pettibon,[email protected]}

Abstract—This paper describes a low cost, low power, versatile computation module that can be used as a coarse-grain building block in multimedia processors. The module, which has a datapath and a controller integrated with its local data memory, performs various arithmetic operations on different data types, i.e., 8-bit integer, 16-bit integer, 32-bit integer and single precision floating point numbers. Running in parallel, the module provides high data throughput at low hardware cost. Multiple modules will be connected in a multimedia processor operated in mixed SIMD and MIMD modes, providing great flexibility for data parallel, computation intensive multimedia applications.

I. INTRODUCTION

With the technology advances in integrated circuit (IC) fabrication and the increasing communication channel bit rates, multimedia processors are playing a more significant role in main computer systems as well as in personal mobile devices [1]. Multimedia applications usually involve large amounts of data and require high data rates and real-time processing. Although the amount of data and the computation throughput vary over a wide range depending on the required quality of the applications, multimedia data processing has the following characteristics: the word lengths of the processed data are 8 bits, 16 bits or 24 bits, which requires frequent use of small integer operations; arithmetic operations are highly computation-intensive and repetitive with data parallelism; and intensive memory access to a large memory space requires a high bandwidth memory interface [2].

In this paper, a low power, low hardware cost computation module is proposed to provide the desirable features for constructing modular array based multimedia processors.
The module has a datapath, a controller and a local 16 KB SRAM data cache. The datapath comprises four 8-bit processing elements (PEs). Therefore, the module can operate on four 8-bit or two 16-bit integer numbers in a Single Instruction Multiple Data (SIMD) mode or perform one 32-bit integer arithmetic operation. It also supports addition, subtraction and multiplication of IEEE 754 standard floating point numbers.

Figure 1. An adaptive multimedia processor using the module (blocks shown: modules M1 to M12 and the control module)

Multimedia signal processing can take advantage of the flexible data types and multiple arithmetic operations available from the module. Local data memory provides the required memory access bandwidth. As shown in Fig. 1, an adaptive multimedia processor can be constructed using the proposed module. Depending on the applications, an array of modules is divided into several clusters, as indicated by the dash-line frames. Each cluster may contain a different number of modules operating in SIMD fashion on one application, while other clusters perform different applications at the same time in a multiple instruction multiple data (MIMD) mode. For example, one cluster may perform a two-dimensional discrete cosine transform (2-D DCT) while another cluster is performing a discrete wavelet transform (DWT). Instructions for the different modules are provided by the upper level controller using encoded addresses for the modules.

The paper is organized as follows: Section II describes the architecture of the proposed module. Section III discusses the supported arithmetic operation modes. Implementation and simulation results are presented in Section IV. Section V draws a conclusion.

II. MODULE ARCHITECTURE

The architecture of the proposed module is shown in Fig. 2. The module has a datapath, a controller and a 16 KB SRAM local data memory.
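
As a rough illustration of the integer modes listed in the introduction (four 8-bit, two 16-bit or one 32-bit operation per 32-bit word), the following Python sketch models a lane-partitioned addition. The function name `partitioned_add`, the lane ordering and the wrap-around behaviour on overflow are assumptions made for illustration; the text above does not specify how the hardware handles overflow.

```python
MASK32 = 0xFFFFFFFF


def partitioned_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 32-bit words as independent lanes of lane_bits (8, 16 or 32).

    Carries never cross a lane boundary. Each lane wraps around on
    overflow, which is an assumption of this sketch rather than a
    documented property of the module.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16 or 32")
    a &= MASK32
    b &= MASK32
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        lane_a = (a >> shift) & lane_mask
        lane_b = (b >> shift) & lane_mask
        # Keep only the low lane_bits of the sum, discarding the lane carry-out.
        lane_sum = (lane_a + lane_b) & lane_mask
        result |= lane_sum << shift
    return result


if __name__ == "__main__":
    a, b = 0x01FF10F0, 0x0101F020
    for bits in (8, 16, 32):
        print(bits, hex(partitioned_add(a, b, bits)))
```

The same pair of input words gives a different result in each mode because the only thing that changes is where the carry chain is cut, which is the idea behind a partitionable adder.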
When an instruction is executed in the module, the addresses of the input data are provided to the local memory and the 64-bit data are retrieved from the SRAM to the input register. The controller takes the 5-bit operation code (opcode) from the instruction and decodes it into various control signals, issued to the datapath at different clock cycles, so that the datapath can perform the specified operations. The outputs of the operations are available in the output registers and can be either written back to the local data memory, sent to other modules through the processor level data bus network, or directed to the output of the multimedia processor.

Figure 2. The architecture of the module (blocks shown: module controller, local SRAM data cache, 64-bit input register, preliminary logic, PE1 to PE4, 64-bit pipeline register, connection network, 64-bit partitionable adder, sign, normalizer, output register)

Figure 3. The Processing Element block diagram (blocks shown: Add 1 to Add 4, Mult 1, Mult 2; inputs A[7:4], B[7:4], A[3:0], B[3:0])

A. The Datapath

The datapath of the module consists of two stages separated by the pipeline register, as shown in Fig. 2. The first stage includes a 64-bit input register, the preliminary logic and four processing elements. The 64-bit input register is directly connected to the 64-bit-wide data bus of the local SRAM. For every operation instruction, two 32-bit input data are read into the input register in the same clock cycle. Each 32-bit input can hold four 8-bit integer operands, two 16-bit integer operands, one 32-bit integer operand, or one IEEE 754 single precision floating point number. Depending on the data types and operations specified in the instruction, the preliminary logic decomposes a 32-bit number into four sets of 8-bit numbers and feeds them into different PEs.

The 8-bit processing elements are the workhorse of the entire datapath. They are designed to be efficiently shared by all of the operation modes with minimum hardware, to lower the area cost and power consumption.
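
The operand routing described above can be sketched in Python for the simplest, lane-aligned case. This is only an illustrative model: the placement of operand A in the upper half of the 64-bit word, the assignment of the least significant byte to PE1, and the helper names `split_input_register`, `to_bytes_lsb_first` and `feed_pes` are all assumptions, and the routing for other operations (for example multiplication) may differ.

```python
from typing import List, Tuple


def split_input_register(word64: int) -> Tuple[int, int]:
    """Split the 64-bit input register into two 32-bit operands.

    Assumption: operand A sits in the upper 32 bits, operand B in the lower.
    """
    a = (word64 >> 32) & 0xFFFFFFFF
    b = word64 & 0xFFFFFFFF
    return a, b


def to_bytes_lsb_first(operand32: int) -> List[int]:
    """Decompose a 32-bit operand into four 8-bit values, lowest byte first."""
    return [(operand32 >> (8 * i)) & 0xFF for i in range(4)]


def feed_pes(word64: int) -> List[Tuple[int, int]]:
    """Return the (A byte, B byte) pair seen by PE1..PE4.

    Assumption: PE1 receives the least significant byte of each operand.
    """
    a, b = split_input_register(word64)
    return list(zip(to_bytes_lsb_first(a), to_bytes_lsb_first(b)))


if __name__ == "__main__":
    for pe, (a_byte, b_byte) in enumerate(feed_pes(0x1122334455667788), start=1):
        print(f"PE{pe}: A={a_byte:#04x} B={b_byte:#04x}")
```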
Fig. 3 shows the block diagram of the PE. The main computation units in it are two 4x4 multipliers and four 4-bit ripple carry adders which can be carry linked. Each PE accepts two 8-bit operands, and outputs a 16-bit vector. It performs

