GT ECE 4893 - Cell Programming Tips & Techniques - D364959

School: Georgia Tech
Course: ECE 4893 - Special Topics
Pages: 43

Slide outline:
– Cell Programming Tips & Techniques
– Class Objectives – Things you will learn
– Class Agenda
– Review Cell Architecture
– Cell Processor
– Course Agenda
– Key SPE Features
– SPE – Single-Ported Local Memory
– SPU Programming Tips
– Programming Levels on Cell BE
– Overlap DMA with computation
– Start DMAs from SPU
– Instruction Scheduling
– Instruction Starvation Situation
– Instruction Starvation Prevention
– Design for Limited Local Store
– Branch Optimizations
– Branches
– Hinting Branches & Instruction Starvation Prevention
– Loop Unrolling
– Loop Unrolling - Examples
– SPU
– SPU – Software Pipeline
– Integer Multiplies
– Avoid Scalar Code
– Choose an SIMD strategy appropriate for your algorithm
– Choose SIMD strategy appropriate for algorithm
– SIMD Example
– Load / Store by Quadword
– SIMD Programming Tips
– Use Offset Pointer
– Shuffle byte instructions for table look-ups

Systems and Technology Group

Cell Programming Tips & Techniques
Cell Programming Workshop
Cell Ecosystem Solutions Enablement

Class Objectives – Things you will learn
Key programming techniques to exploit Cell hardware organization and language features for
– SPU
– SIMD

Class Agenda
Review relevant SPE Features
SPU Programming Tips
– Level of Programming (Assembler, Intrinsics, Auto-Vectorization)
– Overlap DMA with computation (double, multiple buffering)
– Dual-issue rate (Instruction Scheduling)
– Design for limited local store
– Branch hints or elimination
– Loop unrolling and pipelining
– Integer multiplies (avoid 32-bit integer multiplies)
– Shuffle byte instructions for table look-ups
– Avoid scalar code
– Choose the right SIMD strategy
– Load / Store only by quadword
SIMD Programming Tips

Review Cell Architecture

Cell Processor

Course Agenda
Cell Blade Products
Cell Blade Family of Servers
Cell Blade Architecture
Cell Blade Overview
– Critical signals, link speed and bandwidth
– Power consumption
– Hardware components
Blade and blade center assembly
Example of a Cell blade with maximum interconnection capability
Options – InfiniBand
Trademarks: Cell Broadband Engine™ is a trademark of Sony Computer Entertainment, Inc.
References: Dan Brokenshire, BE Programming Tips

Key SPE Features

SPE – Single-Ported Local Memory

SPU Programming Tips

Programming Levels on Cell BE
Expert level
– Assembler: high performance, high effort
More ease of programming
– C compiler, vector data types, intrinsics; the compiler schedules instructions and allocates registers
Auto-SIMDization
– For scalar loops; the user should support it with alignment directives, and the compiler provides feedback about SIMDization
Highest degree of ease of use
– User-guided parallelization is necessary; the Cell BE looks like a single processor
Trade-off: performance vs. effort; the requirements on the compiler increase with each level

Overlap DMA with computation
Double- or multi-buffer code or (typically) data
Example of double buffering n+1 data blocks:
– Use multiple buffers in local store
– Use a unique DMA tag ID for each buffer
– Use fence commands to order DMAs within a tag group
– Use barrier commands to order DMAs within a queue

Start DMAs from SPU
Use SPE-initiated DMA transfers rather than PPE-initiated DMA transfers, because
– there are more SPEs than the one PPE
– the PPE can enqueue only eight DMA requests, whereas each SPE can enqueue 16

Instruction Scheduling

Instruction Starvation Situation
[Figure: two instruction buffers of FP/MEM instruction pairs feeding the dual-issue instruction logic; a refill is initiated once a buffer is half empty]
There are 2 instruction buffers
– up to 64 ops along the fall-through path
First buffer is half empty
– can initiate refill
When the MEM port is continuously used
– starvation occurs (no ops left in the buffers)

Instruction Starvation Prevention
[Figure: instruction buffer of FP/MEM instruction pairs feeding the dual-issue instruction logic; refill initiated after half empty; the IFETCH must start early enough to hide the refill latency, before it is too late]
SPE has an explicit IFETCH op
– which initiates an instruction fetch
Scheduler monitors the starvation situation
– when the MEM port is continuously used
– insert an IFETCH op within the (red) window
Compiler design
– scheduler must keep track of code layout

Design for Limited Local Store
The local store holds up to 256 KB for
– the program, stack, local data structures, and DMA buffers
Most performance optimizations put pressure on the local store (e.g. multiple DMA buffers)
Use plug-ins (program kernels downloaded at run time) to build complex function servers in the LS

Branch Optimizations
SPE
– Heavily pipelined → high penalty for branch misses (18
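
The "Programming Levels on Cell BE" slide places C with vector data types and intrinsics between hand-written assembler and auto-SIMDization. A minimal portable sketch of that middle level follows. It is not SPU code: the GCC/Clang vector extension type `v4sf` stands in for the SPU `vector float` type, and `add_scalar` and `add_quadword` are hypothetical names. The vector loop handles one 16-byte quadword (four floats) per operation and assumes the element count is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the SPU "vector float": a GCC/Clang vector
   type holding four floats in one 16-byte quadword. */
typedef float v4sf __attribute__((vector_size(16)));

/* Scalar loop: one float per iteration. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Vector loop: one quadword (four floats) per iteration.
   Assumes n is a multiple of 4. memcpy avoids alignment and aliasing
   assumptions; compilers typically lower it to plain vector moves. */
static void add_quadword(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4sf va, vb, vr;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                      /* four additions in one operation */
        memcpy(dst + i, &vr, sizeof vr);
    }
}
```

The contrast between one element and one quadword per operation is what the agenda's "Avoid scalar code" and "Load / Store only by quadword" tips build on.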
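
The double-buffering steps on the "Overlap DMA with computation" slide can be sketched as a portable simulation, not SPU code. `dma_get_async` and `dma_wait` are hypothetical stand-ins for starting a tagged transfer (on an SPE, `mfc_get`) and waiting on a tag group (on an SPE, `mfc_write_tag_mask` followed by `mfc_read_tag_status_all`); here they use a synchronous `memcpy`, so no real overlap happens. What the sketch does show is the loop structure: two local-store buffers, one tag ID per buffer, and the fetch of block i+1 started before block i is processed. The block size of 256 floats is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 256   /* floats per block: arbitrary, hypothetical size */

/* Hypothetical stand-in for starting a tagged DMA get into local store.
   On an SPE this would be mfc_get(); memcpy makes it synchronous here. */
static void dma_get_async(float *ls_buf, const float *main_mem,
                          size_t n, unsigned tag)
{
    assert(n <= BLOCK);   /* transfer must fit the local-store buffer */
    (void)tag;            /* a real transfer would join tag group `tag` */
    memcpy(ls_buf, main_mem, n * sizeof *ls_buf);
}

/* Hypothetical stand-in for waiting until all transfers in `tag` finish.
   Nothing to wait for in this synchronous simulation. */
static void dma_wait(unsigned tag)
{
    (void)tag;
}

/* Sum nblocks * BLOCK floats: while block i is processed from one buffer,
   block i + 1 is being fetched into the other. */
static float sum_double_buffered(const float *main_mem, size_t nblocks)
{
    static float buf[2][BLOCK];   /* the two local-store buffers */
    float total = 0.0f;

    if (nblocks == 0)
        return total;

    dma_get_async(buf[0], main_mem, BLOCK, 0);   /* prime: fetch block 0 */
    for (size_t i = 0; i < nblocks; i++) {
        unsigned cur = (unsigned)(i & 1u);       /* buffer and tag of block i */
        unsigned next = cur ^ 1u;

        if (i + 1 < nblocks)                     /* start fetching block i + 1 */
            dma_get_async(buf[next], main_mem + (i + 1) * BLOCK, BLOCK, next);

        dma_wait(cur);                           /* block i must have arrived */
        for (size_t j = 0; j < BLOCK; j++)       /* compute on block i */
            total += buf[cur][j];
    }
    return total;
}
```

This sketch only reads data, so it needs no ordering between transfers. If results were also written back from the same buffer, its refill would have to wait for the earlier put in the same tag group, which is what the fence commands on the slide provide (for example a fenced get, `mfc_getf`).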
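
The queue-depth reason on the "Start DMAs from SPU" slide can be made concrete with a little arithmetic. This sketch assumes that one DMA command moves at most 16 KB (the MFC limit for a single transfer) and, as a deliberate simplification, that a full queue drains before the next batch is issued; `dma_commands` and `queue_batches` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DMA_BYTES (16u * 1024u)   /* largest single MFC DMA command */

/* Hypothetical helper: DMA commands needed to move `bytes` (rounded up). */
static uint32_t dma_commands(uint32_t bytes)
{
    return (bytes + MAX_DMA_BYTES - 1) / MAX_DMA_BYTES;
}

/* Hypothetical helper: how often a queue with `depth` entries must be
   filled to issue all commands, assuming each batch drains first. */
static uint32_t queue_batches(uint32_t bytes, uint32_t depth)
{
    assert(depth > 0);
    return (dma_commands(bytes) + depth - 1) / depth;
}
```

For 1 MB, 64 commands are needed: an SPE's 16-entry queue takes them in 4 batches, while the PPE's 8 entries need 8. Splitting the work across several SPEs, each issuing its own share, lowers the count per queue further.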
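
The budget on the "Design for Limited Local Store" slide can be checked with simple arithmetic before buffer sizes are chosen. The sketch below assumes the code, stack, and static data sizes are already known (for example from the linker map) and rounds each DMA buffer down to a multiple of 128 bytes, the alignment generally recommended for efficient DMA; `max_buffer_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define LS_BYTES  (256u * 1024u)   /* SPE local store size */
#define DMA_ALIGN 128u             /* assumed buffer granularity */

/* Hypothetical helper: largest per-buffer size that still fits `nbuf` DMA
   buffers after code, stack and data are placed; 0 if nothing is left. */
static uint32_t max_buffer_bytes(uint32_t code, uint32_t stack,
                                 uint32_t data, uint32_t nbuf)
{
    assert(nbuf > 0);
    uint64_t used = (uint64_t)code + stack + data;
    if (used >= LS_BYTES)
        return 0;
    uint32_t per_buf = (uint32_t)((LS_BYTES - used) / nbuf);
    return per_buf - per_buf % DMA_ALIGN;   /* round down to 128 bytes */
}
```

With 64 KB of code, 16 KB of stack, and 32 KB of data, 144 KB remain: two buffers can be 72 KB each, four buffers 36 KB each. A buffer larger than 16 KB still needs several DMA commands (or a DMA list) to fill.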
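
The "Branch Optimizations" slide breaks off after the misprediction penalty, but the agenda item "Branch hints or elimination" names the remedy. One common elimination technique computes both outcomes and picks one with a bit mask, which is what the SPU select instruction (intrinsic `spu_sel`) does bit by bit. The sketch below shows the idea in portable scalar C; `select_u32` and `clamp_branchless` are hypothetical names, and whether a particular compiler emits branch-free machine code for it is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select: bits of b where mask is 1, bits of a where mask is 0.
   This mirrors the per-bit behaviour of spu_sel(a, b, mask). */
static uint32_t select_u32(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Hypothetical example: clamp x to [lo, hi] without if/else. Each comparison
   yields 0 or 1; negating it in unsigned arithmetic gives an all-zeros or
   all-ones mask. */
static int32_t clamp_branchless(int32_t x, int32_t lo, int32_t hi)
{
    assert(lo <= hi);                         /* debug-only precondition */
    uint32_t below = -(uint32_t)(x < lo);     /* all ones when x < lo */
    uint32_t above = -(uint32_t)(x > hi);     /* all ones when x > hi */
    uint32_t r = select_u32((uint32_t)x, (uint32_t)lo, below);
    r = select_u32(r, (uint32_t)hi, above);
    return (int32_t)r;                        /* r holds one of x, lo, hi */
}
```

On the SPU the same pattern applies to whole vectors: a vector compare such as `spu_cmpgt` produces the mask and `spu_sel` picks the result, so four 32-bit lanes are handled without a branch.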
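
The agenda advises avoiding 32-bit integer multiplies. The reason is that the SPU's integer multiply instructions take 16-bit operands (16 x 16 to 32 bits), so a full 32-bit product has to be assembled from partial products. The sketch below shows that decomposition in portable C; `mul32_from_16` is a hypothetical name. On the SPU the three partial products correspond roughly to one `mpyu` and two `mpyh` instructions, followed by two adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low 32 bits of a * b, built only from products of
   16-bit halves, each of which fits a 16 x 16 -> 32-bit multiply. */
static uint32_t mul32_from_16(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t lo_lo = a_lo * b_lo;                /* full 32-bit partial product */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;  /* only its low 16 bits matter */
    /* a_hi * b_hi would be shifted left by 32 and vanishes modulo 2^32. */
    return lo_lo + (cross << 16);
}
```

Three multiplies plus shifts and adds replace one; when the operands are known to fit in 16 bits, a single 16-bit multiply is enough.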
