Berkeley COMPSCI 252 - Multimedia Instruction Sets: SIMD and Vector

This preview shows page 1-2-3-4-26-27-28-53-54-55-56 out of 56 pages.


Lecture 15 Multimedia Instruction Sets: SIMD and Vector
What is Multimedia Processing?
The Need for Multimedia ISAs
Example: MPEG Decoding
Example: 3D Graphics
Characteristics of Multimedia Apps (1)
Characteristics of Multimedia Apps (2)
Examples of Media Functions
Approaches to Mediaprocessing
SIMD Extensions for GPP
Overview of SIMD Extensions
Example of SIMD Operation (1)
Example of SIMD Operation (2)
Summary of SIMD Operations (1)
Summary of SIMD Operations (2)
Programming with SIMD Extensions
SIMD Performance
A Closer Look at MMX/SSE
CS 252 Administrivia
Vector Processors
Properties of Vector Processors
Styles of Vector Architectures
Components of a Vector Processor
Basic Vector Instructions
Vector Memory Operations
Vector Code Example
Setting the Vector Length
Strip Mining
Choosing the Data Type Width
Other Features for Multimedia
Optimization 1: Chaining
Optimization 2: Multi-lane Implementation
Chaining & Multi-lane Example
Optimization 3: Conditional Execution
Vector Architecture State
Two Ways to Vectorization
Outer-loop Example (1)
Outer-loop Example (2)
Designing a Vector Processor
Changes to Scalar Processor
How to Pick Max. Vector Length?
How to Pick Max Vector Length?
How to Pick # of Vector Registers?
Context Switch Overhead?
Exception Handling: Arithmetic
Exception Handling: Page Faults
Exception Handling: Interrupts
Vector Power Consumption
Why Vectors for Multimedia?
Comparison with SIMD
A Vector Media-Processor: VIRAM
Performance Comparison
FFT (1)
FFT (2)
SIMD Summary
Vector Summary

Lecture 15
Multimedia Instruction Sets: SIMD and Vector
Christoforos E. Kozyrakis ([email protected])
CS252 Graduate Computer Architecture
University of California at Berkeley
March 14th, 2001

CS252, Lecture 15: Multimedia Instruction Sets: SIMD and Vector, C.E. Kozyrakis, 3/14/01

What is Multimedia Processing?
• Desktop:
  – 3D graphics (games)
  – Speech recognition (voice input)
  – Video/audio decoding (mpeg-mp3 playback)
• Servers:
  – Video/audio encoding (video servers, IP telephony)
  – Digital libraries and media mining (video servers)
  – Computer animation, 3D modeling & rendering (movies)
• Embedded:
  – 3D graphics (game consoles)
  – Video/audio decoding & encoding (set top boxes)
  – Image processing (digital cameras)
  – Signal processing (cellular phones)

The Need for Multimedia ISAs
• Why aren’t general-purpose processors and ISAs sufficient for multimedia (despite Moore’s law)?
• Performance
  – A 1.2GHz Athlon can do MPEG-4 encoding at 6.4fps
  – One 384Kbps W-CDMA channel requires 6.9 GOPS
• Power consumption
  – A 1.2GHz Athlon consumes ~60W
  – Power consumption increases with clock frequency and complexity
• Cost
  – A 1.2GHz Athlon costs ~$62 to manufacture and has a list price of ~$600 (module)
  – Cost increases with complexity, area, transistor count, power, etc.

Example: MPEG Decoding
[Figure: decoding pipeline. Labels in extraction order: Parsing, Dequantization, IDCT, Block Reconstruction, RGB->YUV, Input Stream, Output to Screen. Load Breakdown labels: 10%, 20%, 25%, 30%, 15%.]

Example: 3D Graphics
[Figure: graphics pipeline. Labels in extraction order: Transform, Lighting, Display Lists, Output to Screen, Geometry Pipe, Setup, Rasterization, Anti-aliasing, Shading, fogging, Texture mapping, Alpha blending, Z-buffer, Clipping, Frame-buffer ops, Rendering Pipe. Load Breakdown labels: 10%, 10%, 35%, 55%.]

Characteristics of Multimedia Apps (1)
• Requirement for real-time response
  – “Incorrect” result often preferred to slow result
  – Unpredictability can be bad (e.g. dynamic execution)
• Narrow data-types
  – Typical width of data in memory: 8 to 16 bits
  – Typical width of data during computation: 16 to 32 bits
  – 64-bit data types rarely needed
  – Fixed-point arithmetic often replaces floating-point
• Fine-grain (data) parallelism
  – Identical operation applied on streams of input data
  – Branches have high predictability
  – High instruction locality in small loops or kernels

Characteristics of Multimedia Apps (2)
• Coarse-grain parallelism
  – Most apps organized as a pipeline of functions
  – Multiple threads of execution can be used
• Memory requirements
  – High bandwidth requirements but can tolerate high latency
  – High spatial locality (predictable pattern) but low temporal locality
  – Cache bypassing and prefetching can be crucial

Examples of Media Functions
• Matrix transpose/multiply (3D graphics)
• DCT/FFT (Video, audio, communications)
• Motion estimation (Video)
• Gamma correction (3D graphics)
• Haar transform (Media mining)
• Median filter (Image processing)
• Separable convolution (Image processing)
• Viterbi decode (Communications, speech)
• Bit packing (Communications, cryptography)
• Galois-fields arithmetic (Communications, cryptography)
• …

Approaches to Mediaprocessing
Multimedia Processing:
• General-purpose processors with SIMD extensions
• Vector Processors
• VLIW with SIMD extensions (aka mediaprocessors)
• DSPs
• ASICs/FPGAs

SIMD Extensions for GPP
• Motivation
  – Low media-processing performance of GPPs
  – Cost and lack of flexibility of specialized ASICs for graphics/video
  – Underutilized datapaths and registers
• Basic idea: sub-word parallelism
  – Treat a 64-bit register as a vector of 2 32-bit or 4 16-bit or 8 8-bit values (short vectors)
  – Partition 64-bit datapaths to handle multiple narrow operations in parallel
• Initial constraints
  – No additional architecture state (registers)
  – No additional exceptions
  – Minimum area overhead

Overview of SIMD Extensions
Vendor     Extension     Year    # Instr        Registers
HP         MAX-1 and 2   94,95   9,8 (int)      Int 32x64b
Sun        VIS           95      121 (int)      FP 32x64b
Intel      MMX           97      57 (int)       FP 8x64b
AMD        3DNow!        98      21 (fp)        FP 8x64b
Motorola   Altivec       98      162 (int,fp)   32x128b (new)
Intel      SSE           98      70 (fp)        8x128b (new)
MIPS       MIPS-3D       ?       23 (fp)        FP 32x64b
AMD        E 3DNow!      99      24 (fp)        8x128 (new)
Intel      SSE-2         01      144 (int,fp)   8x128 (new)

Example of SIMD Operation (1)
[Figure: operators shown are four multiplies (* * * *) and two adds (+ +). Caption: Sum of Partial Products.]
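The sub-word parallelism idea from the "SIMD Extensions for GPP" slide (treat a 64-bit register as 8 8-bit or 4 16-bit values) and the "Sum of Partial Products" operation can be emulated in portable C. What follows is a minimal sketch under stated assumptions, not the instruction definition of any extension in the table: the function names add_8x8, lane16 and madd_4x16 are hypothetical, lane 0 is assumed to be the least significant sub-word, and the 16-bit lanes are assumed to be signed.

```c
#include <stdint.h>

/* Hypothetical helper names; a plain C emulation, not a vendor intrinsic. */

/* Partitioned add: eight unsigned 8-bit lanes packed in one 64-bit word.
 * Each lane wraps modulo 256 and no carry crosses a lane boundary. */
uint64_t add_8x8(uint64_t a, uint64_t b)
{
    const uint64_t high = 0x8080808080808080ULL; /* top bit of every lane */
    /* Add the low 7 bits of each lane; the sum cannot spill into the next lane. */
    uint64_t low_sum = (a & ~high) + (b & ~high);
    /* Fold the top bits back in with XOR, which discards the lane's carry-out. */
    return low_sum ^ ((a ^ b) & high);
}

/* Extract 16-bit lane `lane` (0 = least significant) as a signed value. */
static int32_t lane16(uint64_t v, int lane)
{
    int32_t x = (int32_t)((v >> (16 * lane)) & 0xFFFF);
    return x >= 0x8000 ? x - 0x10000 : x;
}

/* Sum of partial products: four 16x16-bit multiplies, then two adds of
 * adjacent products, giving two 32-bit results packed in a 64-bit word. */
uint64_t madd_4x16(uint64_t a, uint64_t b)
{
    uint64_t out = 0;
    for (int pair = 0; pair < 2; ++pair) {
        uint32_t sum = 0; /* unsigned accumulator: wraps instead of overflowing */
        for (int k = 0; k < 2; ++k) {
            int lane = 2 * pair + k;
            sum += (uint32_t)(lane16(a, lane) * lane16(b, lane));
        }
        out |= (uint64_t)sum << (32 * pair);
    }
    return out;
}
```

For example, with lanes (1, 2, 3, 4) and (5, 6, 7, 8), madd_4x16 yields 1*5 + 2*6 = 17 in the low 32 bits and 3*7 + 4*8 = 53 in the high 32 bits. A hardware SIMD unit would perform the four multiplies and two adds in parallel on a partitioned datapath; the loops here only model the result.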

