The IBM Cell Architecture
Sam Sandbote
CSE 8383 Advanced Computer Architecture
April 18, 2006

Topics
1. Overview
2. Software Cells
3. Machine Architecture
4. Product Prototype
5. Programmer’s Interface
6. References and Glossary

Motivation
IBM’s formal name for Cell is “Cell Broadband Engine Architecture” (CBEA)
Sony wanted:
    • Quantum leap in performance over PlayStation 2’s “Emotion Engine” chip (made by Toshiba)
Toshiba wanted:
    • Remain a part of volume manufacturing for Sony PlayStation
IBM wanted:
    • A piece of the PlayStation 3 pie
    • A second try at network processor architecture
    • Something reusable, applicable far beyond PlayStation

Goals
Application domains
    • Graphics Rendering ($$)
    • DSP & Multimedia Processing ($$)
    • Cryptography
    • Physics simulations
    • Matrix math and other scientific processing
Heavy use of SIMD – why?
    • Cray and similar machines of 1970s achieved performance through vectorization rather than MIMD parallelization
    • The above applications are areas in which SIMD is still the best architecture

Software Cells: The Concept
Definition
    • Bundle of application code and working data
Features
    • Necessarily object-oriented
    • Cells can migrate to any processor – local or remote
    • Distributed processing is native, and actually assumed
        • Execution of cell code actually looks like a remote procedure call
    • A cell contains everything it needs to execute autonomously without references to other memory, programs or resources
    • Highly secure model!

Software Cells: Formatting
[Figure: software cell format. Source: U.S. Patent #6,809,734]

Comparison with Dataflow Architecture
Granularity
    • Dataflow execution granularity is one instruction
    • Cell execution granularity is a procedure, or several hundred instructions
Dataflow instruction template:
    | opcode | operand A address | operand B address | destination address |

Machine Architecture
Each Cell SoC contains:
    • Conventional processor (PPE), for control and a lightweight OS
        • 2-way SMT, 2-way superscalar in-order Power core
    • Multiple Synergistic Processing Elements (SPEs)
        • These are execution engines for RPC of a software-cell
    • DMA interface to memory and I/O
    • Element Interconnect Bus (EIB), actually a ring bus
Each SPE contains:
    • 128 registers, 128 bits wide in unified regfile (2 Kbytes of registers!)
    • 256 Kbytes local memory
    • 4 SIMD integer pipelines/ALUs
    • 4 SIMD floating point pipelines/FPUs

SoC Architecture
[Block diagram: eight SPEs, each with ALUs (4), FPUs (4), a 128x128 regfile and 256 KB local memory; the PPE (64-bit SMT Power core, 2x in-order superscalar, I$, D$, 512K L2); the EIB; DMA and I/O controllers]

(Envisioned) SPU Architecture
Resources for execution of multiple software cells are reserved in advance by the PPE:
    • Some portion of local memory
    • One or more dedicated integer/FP pipelines
    • Not SMT – pipelines are allocated in a dedicated way for the duration of the execution of the cell
Execution is supposed to be entirely self-contained
    • Software cell is small enough to execute on only one APU
    • No use of DRAM – the only addressable memory is local
        • Local memory is not cache – no
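As a rough illustration of the Software Cells, Comparison with Dataflow Architecture and Machine Architecture slides, the Python sketch below models a dataflow instruction and a software cell as plain records and checks the register-file arithmetic. The names DataflowInstruction, SoftwareCell and register_file_bytes are hypothetical and do not come from IBM's tools or from the patent; only the instruction template fields and the 128 x 128-bit and 256 Kbyte figures are taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, List

# Figures quoted on the "Machine Architecture" slide.
REGISTERS_PER_SPE = 128
REGISTER_WIDTH_BITS = 128
LOCAL_MEMORY_BYTES = 256 * 1024


def register_file_bytes() -> int:
    """Size of one SPE register file: 128 registers x 128 bits, in bytes."""
    return REGISTERS_PER_SPE * REGISTER_WIDTH_BITS // 8


@dataclass
class DataflowInstruction:
    """One unit of dataflow execution, following the slide's template."""
    opcode: str
    operand_a_address: int
    operand_b_address: int
    destination_address: int


@dataclass
class SoftwareCell:
    """Hypothetical model of a software cell: code bundled with its data."""
    code: Callable[[List[float]], List[float]]  # a whole procedure, not one opcode
    data: List[float]                           # working data carried in the cell

    def run(self) -> List[float]:
        # Self-contained execution: the procedure sees only the cell's own data,
        # which is why running it resembles a remote procedure call.
        return self.code(self.data)


if __name__ == "__main__":
    print(register_file_bytes())  # register file size in bytes
    cell = SoftwareCell(code=lambda xs: [2.0 * x for x in xs], data=[1.0, 2.0])
    print(cell.run())
```

The contrast in granularity shows up in the two records: a DataflowInstruction names a single opcode and three addresses, while a SoftwareCell carries an entire procedure together with the data it operates on.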