GT ECE 4893 - Cell Programming Tips & Techniques
School: Georgia Tech
Pages: 43


Slide titles:
– Cell Programming Tips & Techniques
– Class Objectives – Things you will learn
– Class Agenda
– Review Cell Architecture
– Cell Processor
– Course Agenda
– Key SPE Features
– SPE – Single-Ported Local Memory
– SPU Programming Tips
– Programming Levels on Cell BE
– Overlap DMA with computation
– Start DMAs from SPU
– Instruction Scheduling
– Instruction Starvation Situation
– Instruction Starvation Prevention
– Design for Limited Local Store
– Branch Optimizations
– Branches
– Hinting Branches & Instruction Starvation Prevention
– Loop Unrolling
– Loop Unrolling – Examples
– SPU
– SPU – Software Pipeline
– Integer Multiplies
– Avoid Scalar Code
– Choose an SIMD strategy appropriate for your algorithm
– Choose SIMD strategy appropriate for algorithm
– SIMD Example
– Load / Store by Quadword
– SIMD Programming Tips
– Use Offset Pointer
– Shuffle byte instructions for table look-ups

Systems and Technology Group

Cell Programming Tips & Techniques
Cell Programming Workshop
Cell Ecosystem Solutions Enablement

Class Objectives – Things you will learn
Key programming techniques to exploit the Cell hardware organization and language features for
– SPU
– SIMD

Class Agenda
Review relevant SPE features
SPU Programming Tips
– Level of programming (assembler, intrinsics, auto-vectorization)
– Overlap DMA with computation (double, multiple buffering)
– Dual-issue rate (instruction scheduling)
– Design for limited local store
– Branch hints or elimination
– Loop unrolling and pipelining
– Integer multiplies (avoid 32-bit integer multiplies)
– Shuffle byte instructions for table look-ups
– Avoid scalar code
– Choose the right SIMD strategy
– Load / store only by quadword
SIMD Programming Tips

Review Cell Architecture

Cell Processor

Course Agenda
Cell Blade Products
Cell Blade Family of Servers
Cell Blade Architecture
Cell Blade Overview
– Critical signals, link speed and bandwidth
– Power consumption
– Hardware components
Blade and blade center assembly
Example of a Cell blade with maximum interconnection capability
Options – InfiniBand

Trademarks: Cell Broadband Engine™ is a trademark of Sony Computer Entertainment, Inc.
References: Dan Brokenshire, BE Programming Tips

Key SPE Features

SPE – Single-Ported Local Memory

SPU Programming Tips

Programming Levels on Cell BE
Expert level
– Assembler: high performance, high effort
More ease of programming
– C compiler, vector data types, intrinsics; the compiler schedules instructions and allocates registers
Auto-SIMDization
– For scalar loops; the user should assist with alignment directives, and the compiler provides feedback about SIMDization
Highest degree of ease of use
– User-guided parallelization is necessary; the Cell BE looks like a single processor
Trade-off: performance vs. effort. Requirements on the compiler increase with each level.

Overlap DMA with computation
Double- or multi-buffer code or (typically) data
Example for double buffering n+1 data blocks:
– Use multiple buffers in local store
– Use a unique DMA tag ID for each buffer
– Use fence commands to order DMAs within a tag group
– Use barrier commands to order DMAs within a queue

Start DMAs from SPU
Use SPE-initiated DMA transfers rather than PPE-initiated DMA transfers, because
– there are more SPEs than the one PPE
– the PPE can enqueue only eight DMA requests, whereas each SPE can enqueue 16

Instruction Scheduling

Instruction Starvation Situation
[Figure: two instruction buffers filled with FP/MEM instruction pairs feed the dual-issue instruction logic; a refill is initiated after a buffer is half empty.]
There are 2 instruction buffers
– up to 64 ops along the fall-through path
First buffer is half empty
– can initiate refill
When the MEM port is continuously used
– starvation occurs (no ops left in the buffers), because the local store is single-ported and instruction refills compete with loads and stores for it

Instruction Starvation Prevention
[Figure: an instruction buffer of FP/MEM pairs feeds the dual-issue instruction logic; the IFETCH must be issued within a window (marked in red) before it is too late to hide the refill latency.]
SPE has an explicit IFETCH op
– which initiates an instruction fetch
Scheduler monitors starvation situation
– when the MEM port is continuously used
– insert an IFETCH op within the (red) window
Compiler design
– the scheduler must keep track of code layout

Design for Limited Local Store
The Local Store holds up to 256 KB for
– the program, stack, local data structures, and DMA buffers
Most performance optimizations put pressure on the local store (e.g., multiple DMA buffers).
Use plug-ins (program kernels downloaded at run time) to build complex function servers in the LS.

Branch Optimizations
SPE
– Heavily pipelined, hence a high penalty for branch misses (18
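The Branch Optimizations slide notes that the heavily pipelined SPE pays a high penalty for branch misses, and the agenda lists branch elimination as one remedy. A common way to remove a data-dependent branch has three steps: compute both candidates, build a per-lane comparison mask, and choose bits with a select.

The sketch below is a portable C model, not SPU code. `v4i32`, `cmpgt`, `sel`, and `vmax_branchless` are hypothetical helpers. They mimic a 128-bit register with four 32-bit lanes and the documented behavior of the `spu_cmpgt` and `spu_sel` intrinsics.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for a 128-bit SPU register
 * holding four 32-bit lanes (not the SPU's built-in vector type). */
typedef struct { int32_t v[4]; } v4i32;

/* Models spu_cmpgt: a lane becomes all ones (-1) if a > b, else all zeros.
 * The comparison yields 0 or 1, and negation turns that into a bit mask. */
static v4i32 cmpgt(v4i32 a, v4i32 b)
{
    v4i32 m;
    for (int i = 0; i < 4; i++)
        m.v[i] = -(int32_t)(a.v[i] > b.v[i]);
    return m;
}

/* Models spu_sel(a, b, m): take bits of b where m is 1, bits of a where m is 0. */
static v4i32 sel(v4i32 a, v4i32 b, v4i32 m)
{
    v4i32 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = (a.v[i] & ~m.v[i]) | (b.v[i] & m.v[i]);
    return r;
}

/* Lane-wise maximum with no data-dependent branch:
 * where a > b the mask is all ones and sel picks a, otherwise it picks b. */
v4i32 vmax_branchless(v4i32 a, v4i32 b)
{
    return sel(b, a, cmpgt(a, b));
}
```

On an SPU the same lane-wise maximum would be written as `spu_sel(b, a, spu_cmpgt(a, b))`. That is a compare followed by a select, with no branch to mispredict.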

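The Overlap DMA with computation slide lists the ingredients of double buffering: multiple local-store buffers and a unique DMA tag per buffer. The portable C sketch below models only the buffer and tag rotation. All of its names are hypothetical: `mock_dma_get`, `mock_dma_wait`, `process_block`, `sum_double_buffered`, and `BLOCK_FLOATS`.

Because the mock transfer is a synchronous `memcpy`, nothing actually overlaps here. On an SPU, the get would be an `mfc_get` tagged with the buffer index. The wait would set the tag mask and read the tag status (`mfc_write_tag_mask`, `mfc_read_tag_status_all` from `spu_mfcio.h`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block size: 1024 floats = 4 KB (a single SPU DMA moves at most 16 KB). */
#define BLOCK_FLOATS 1024

/* Hypothetical stand-in for an asynchronous DMA get into local store.
 * On an SPU this would be an mfc_get issued with `tag`; here it is a
 * synchronous memcpy, so the sketch shows buffer and tag rotation only. */
static void mock_dma_get(float *ls_buf, const float *src, size_t n, unsigned tag)
{
    (void)tag;
    memcpy(ls_buf, src, n * sizeof(float));
}

/* Hypothetical stand-in for waiting on one tag group (on an SPU:
 * mfc_write_tag_mask(1 << tag); mfc_read_tag_status_all();).
 * A no-op here because mock_dma_get has already completed. */
static void mock_dma_wait(unsigned tag)
{
    (void)tag;
}

/* The "computation" applied to each block: a plain sum. */
static float process_block(const float *buf, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}

/* Sums nblocks consecutive blocks of BLOCK_FLOATS floats, fetching block
 * b+1 into one buffer while block b in the other buffer is processed. */
float sum_double_buffered(const float *src, size_t nblocks)
{
    static float ls[2][BLOCK_FLOATS]; /* two local-store buffers; buffer i uses tag i */
    float total = 0.0f;
    unsigned cur = 0;

    if (nblocks == 0)
        return 0.0f;

    mock_dma_get(ls[cur], src, BLOCK_FLOATS, cur); /* prime the first buffer */

    for (size_t b = 0; b < nblocks; b++) {
        unsigned next = cur ^ 1u;
        if (b + 1 < nblocks) /* start the next transfer before computing */
            mock_dma_get(ls[next], src + (b + 1) * BLOCK_FLOATS, BLOCK_FLOATS, next);
        mock_dma_wait(cur); /* wait only for the buffer about to be used */
        total += process_block(ls[cur], BLOCK_FLOATS);
        cur = next;
    }
    return total;
}
```

In a real double-buffered loop, computation on `ls[cur]` would proceed while the transfer into `ls[next]` is still in flight. The fence and barrier ordering that the slide also mentions is not modeled here.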

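The agenda advises avoiding 32-bit integer multiplies. The SPU's integer multiplier works on 16-bit operands, so a full 32-bit multiply is typically synthesized from three 16-bit multiplies plus two adds.

The portable C model below reproduces that decomposition. `mul16u`, `mul16h`, and `mul32_from_16` are hypothetical names that stand in for the SPU `mpyu` and `mpyh` instructions.

```c
#include <stdint.h>

/* Models mpyu: unsigned product of the low 16 bits of a and b (fits in 32 bits). */
static uint32_t mul16u(uint32_t a, uint32_t b)
{
    return (a & 0xFFFFu) * (b & 0xFFFFu);
}

/* Models mpyh: high 16 bits of a times low 16 bits of b, shifted left by 16.
 * Only the low 16 bits of that product survive the shift modulo 2^32. */
static uint32_t mul16h(uint32_t a, uint32_t b)
{
    return ((a >> 16) * (b & 0xFFFFu)) << 16;
}

/* Low 32 bits of a * b from three 16-bit multiplies and two adds:
 *   a*b mod 2^32 = al*bl + ((ah*bl + al*bh) << 16)
 * because the ah*bh term is a multiple of 2^32 and drops out. */
uint32_t mul32_from_16(uint32_t a, uint32_t b)
{
    return mul16h(a, b) + mul16h(b, a) + mul16u(a, b);
}
```

When the operands are known to fit in 16 bits, keeping them in 16-bit types (for example `unsigned short`) generally lets the compiler emit a single multiply instead of this sequence.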