UW-Madison ECE 734 - Implementation of MPEG2 Codec with MMX-SSE-SSE2 Technology

I. INTRODUCTION
II. MMX/SSE/SSE2
   A. MMX
   B. SSE
   C. SSE2
III. MPEG 2 Video Compression
IV. Utilize MMX/SSE/SSE2 to Speed Up Program
   A. Generate Profiling Information and Identify the Kernels
   B. Utilize MMX/SSE/SSE2 to Rewrite the Kernels
   C. Experimental Results
   D. Platform Compatibility
V. Conclusion

ECE 734 Final Report

I. INTRODUCTION

The MMX/SSE/SSE2 technology has been designed to accelerate multimedia and communications applications. In this project, we studied MMX/SSE/SSE2 comprehensively to understand how they can facilitate those applications. We also implemented an MPEG 2 codec using MMX/SSE/SSE2 technology. During this implementation, we found that changing C to MMX/SSE/SSE2 is not a simple sentence-by-sentence translation. To achieve a significant speed-up, finding a suitable MMX/SSE/SSE2 instruction is not enough: the algorithm often has to be redesigned to facilitate parallel operations, and memory alignment, cache performance, and pipeline interlocks must also be considered.

The project is divided into three stages, which are shown in Fig. 1. Since implementing a whole MPEG 2 encoder and decoder suite would be a huge task, we first found an MPEG 2 codec written in C. We then identified the kernels of this original code by generating profiling information with the gprof command in Linux. After carefully studying the profiling information, we found that several functions dominate the total execution time. This makes it possible to accelerate the code noticeably by modifying only those functions. Performance comparison shows that a verbatim translation from C to MMX/SSE/SSE2 provides only a 4-5X speed-up. However, when combined with loop unrolling and code scheduling, the speed-up can be as large as 25X.
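To illustrate why the rewrite is more than a sentence-by-sentence translation, the following is a minimal sketch and not code from the project. The kernel is our assumption: a sum of absolute differences over one 16-pixel row, a typical block-matching operation in video encoders. The function names sad16_scalar and sad16_simd are hypothetical. The SSE2 version replaces the whole loop with a single PSADBW instruction (the _mm_sad_epu8 intrinsic), and falls back to the scalar loop when SSE2 is not available at compile time.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar reference: sum of absolute differences over 16 pixels. */
static unsigned sad16_scalar(const uint8_t *a, const uint8_t *b)
{
    unsigned sum = 0;
    for (int i = 0; i < 16; i++)
        sum += (unsigned)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* SSE2 sketch: one PSADBW handles all 16 bytes at once. */
static unsigned sad16_simd(const uint8_t *a, const uint8_t *b)
{
#if defined(__SSE2__)
    /* Unaligned loads, so the pointers need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* Result holds two partial sums, one in each 64-bit half. */
    __m128i s = _mm_sad_epu8(va, vb);
    /* Each partial sum is at most 8 * 255, so 16 bits are enough. */
    return (unsigned)_mm_cvtsi128_si32(s)
         + (unsigned)_mm_extract_epi16(s, 4);
#else
    /* Assumption: without SSE2 we simply reuse the scalar loop. */
    return sad16_scalar(a, b);
#endif
}
```

The sketch uses unaligned loads for simplicity. When the data is known to be 16-byte aligned, _mm_load_si128 can be used instead, which is one instance of the memory alignment consideration mentioned above.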
Rong Jiang and Jin Xu, Implementation of MPEG2 Codec with MMX/SSE/SSE2 Technology

[Fig. 1. Project Outline. Three stages (Stage I to Stage III) covering, in order: find a MPEG 2 encoder and decoder C code; generate profiling information; identify the kernels; rewrite kernels using SSE; performance comparison.]

The rest of this report is organized as follows. In Section II, we compare the differences among MMX, SSE, and SSE2. Section III gives a brief introduction to MPEG 2 technology. In Section IV, our project is reported step by step. The report is concluded in Section V.

II. MMX/SSE/SSE2

Any computer, whether sequential or parallel, operates by executing instructions on data. A stream of instructions (the algorithm) tells the computer what to do at each step. A stream of data (the input to the algorithm) is affected by these instructions. A widely used classification of parallel systems, due to Michael J. Flynn, is based on the number of simultaneous instruction and data streams seen by the processor during program execution. Depending on whether there is one or several of these streams, computers can be divided into four classes:

a) Single Instruction stream, Single Data stream (SISD)
b) Multiple Instruction stream, Single Data stream (MISD)
c) Single Instruction stream, Multiple Data stream (SIMD)
d) Multiple Instruction stream, Multiple Data stream (MIMD)

A SISD computer consists of a single processing unit receiving a single instruction stream that operates on a single stream of data. At each step, the control unit emits one instruction that operates on a datum obtained from the memory unit. Almost all computers in use today adhere to this model, invented by John von Neumann in the late 1940s. An algorithm that runs on a SISD computer is said to be sequential (or serial), as it does not contain any parallelism.

In a MISD computer, N processors, each with its own control unit, share a common memory unit.
At each step, one data element received from memory is processed by all processors simultaneously, each according to the instructions received from its control unit. Parallelism is achieved by letting the processors do different things on the same data. This class of computers lends itself naturally to computations that require an input to be subjected to several operations, each receiving the input in its original form.

A SIMD computer consists of N identical processors, each with its own local memory where it can store data. All processors work under the control of a single instruction stream issued by a central control unit. There are N data streams, one per processor. The processors operate synchronously: at each step, all processors execute the same instruction on a different data element. SIMD computers are much more versatile than MISD computers. Numerous problems covering a wide variety of applications can be solved by parallel algorithms on SIMD computers. Another attractive feature is that algorithms for these computers are relatively easy to design, analyze, and implement. On the downside, SIMD computers can only tackle problems that can be subdivided into a set of identical sub-problems, all of which are then solved simultaneously by the same set of instructions. Many computations do not fit this pattern: such problems are typically subdivided into sub-problems that are not necessarily identical, and are solved using MIMD computers.

MIMD is the most general and most powerful class in Flynn's classification. Here there are N processors, N streams of instructions, and N streams of data. Each processor has its own control unit and local memory, which makes it more powerful than the processors used in SIMD computers. Each processor operates under the control of an instruction stream issued by its control unit. Therefore the processors are potentially all executing different programs on different data while solving different sub-problems of a single problem.
This means that the processors usually operate asynchronously. Computers in this class are used to solve in parallel those problems that lack the regular structure required by the SIMD model. On the downside, asynchronous algorithms are difficult to design, analyze, and implement.

A. MMX

The MMX technology is designed to accelerate multimedia and communications applications by including new instructions and data types that allow applications to

