15-745 © Seth Copen Goldstein 2005-7

15-745
Topic: Exotic Architectures
• Doug Burger et al., "Scaling to the End of Silicon with EDGE Architectures", IEEE Computer, July 2004.
• Jan Hoogerbrugge et al., "Software pipelining for transport-triggered architectures", MICRO 24 (1991).
• Steve Swanson et al., "WaveScalar", MICRO-36, December 2003.

It's not about computing after all!
• What is the fundamental operation in a computer?
– It is not the add, the multiply, the xor, etc.
– It is the move
• Typical (read: x86, etc.) architectures don't ALLOW this to be expressed!
• All three papers share a common goal: represent the data movement involved in computation explicitly
BTW: Really bad slide, why?

What is Exotic?
• ISA
– An abstraction provided by the computer designer
E.g., no change in programs when
• transistors shrink by a factor of 2 or even 10!
• we start using aluminum to transmit info (and then copper!)
• voltage changes by a factor of 5!
• we change from a micro-coded engine to a RISC core!
• 10x registers are introduced (internal ones)
– Limits what can be expressed
• no "move"s
• what else?

One view on compiler/Arch Divide
[Figure: ILP architectures. The tasks Frontend, Determine Dependencies, Determine Independencies, Bind Function Units, and Bind Datapaths & Execute are divided between compilation time (software) and run time (hardware) for each class of architecture: sequential (superscalar), dependence (dataflow), independence (EPIC), independence (VLIW).]
Slides Adapted/From: J.Takala/TUT

VLIW
[Figure: VLIW CPU datapath: instruction memory, instruction fetch, instruction decode, FU-1 to FU-5, bypassing network, register file, data memory.]
• Scaling Drawbacks?
– Bypass complexity
– Register file complexity
– Register file design restricts FU flexibility
– Operation encoding format restricts FU flexibility

Transport-Triggered Arch
• Only 1 instruction: MOVE
• Don't specify operations, specify register movement
[Figure: VLIW and TTA datapaths side by side. Both contain instruction fetch, instruction decode, FU-1 to FU-5, a bypassing network, and a register file; the VLIW panel also shows instruction memory and data memory.]

J.Takala/TUT Berkeley – Finland Day, Oct.18, 2002

TTA Datapath
[Figure: TTA datapath: two integer ALUs, float ALU, boolean RF, float RF, integer RF, two load/store units, immediate unit, and instruction unit, each attached through sockets, plus instruction memory and data memory.]

Function Units
• Operands written to operand registers (O)
• Operation performed when last operand written to trigger register (T)
• Pipeline synchronized with control bits (C)
• Standard interface
– FU_ready
– Result_ready
– Global_lock
[Figure: function unit pipeline: operand register O (with optional shadow register), trigger register T, logic stages, result register R, control bits C.]

ILP Architectures
[Figure: the same compile-time/run-time division as above, with a fifth class added: independence (TTA). The tasks are Frontend, Determine Dependencies, Determine Independencies, Bind Function Units, Bind Datapaths, and Execute.]

TTA Characteristics: HW
• Modular
– Can be constructed with standard building blocks
• Very flexible and scalable
– FU functionality can be arbitrary
– Supports user-defined Special Function Units (SFU)
• Lower complexity
– Reduction in number of register ports
– Reduced bypass complexity
– Reduction in bypass connectivity
– Reduced register pressure
– Trivial decoding (implies long instructions)

TTA Characteristics: SW
• Traditional operation-triggered instruction:
    mul r1,r2,r3;
• Transport-triggered instruction:
    r1→mul.o; r2→mul.t; mul.r→r3;
  or
    r1→mul.o, r2→mul.t; mul.r→r3;
• Reminiscent of dataflow and time-stationary coding

TTA Specific Optimizations
• TTA allows extra scheduling optimizations
– E.g., software bypassing
– Bypassing can eliminate the need for RF access
– However, more difficult to schedule!
• Example:
    r1 → add.o, r2 → add.t;
    add.r → r3;
    r3 → sub.o, r4 → sub.t;
    sub.r → r5;
• Translates to:
    r1 → add.o, r2 → add.t;
    add.r → sub.o, r4 → sub.t;
    sub.r → r5;

Registers aren't everything
• TRIPS
– operand-based dataflow architecture
• Wavescalar
– (operand-based?) dataflow architecture
– Makes memory dependencies explicit
• Pegasus
– dataflow, operand/wires explicit
– Memory dependencies explicit
• All Three
– basic unit is a hyperblock

TRIPS

TRIPS: Program Representation

TRIPS: Compiling

Wavescalar

Wavescalar: Memory Dependencies

SP on TTA
• Extends Lam's modulo scheduling to TTA
• Recall:
– d(u,v):
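
The software-bypassing example from the "TTA Specific Optimizations" slide can be tried out with a toy simulator. The sketch below is only an illustration, not how any real TTA or toolchain works: the names `FunctionUnit`, `run`, `PLAIN`, and `BYPASSED` are hypothetical, every function unit is assumed to compute `op(o, t)` with the result visible in the next instruction, and the register values are made up.

```python
import operator


class FunctionUnit:
    """Toy function unit with operand (o), trigger (t), and result (r) ports."""

    def __init__(self, op):
        self.op = op
        self.o = None  # operand register
        self.r = None  # result register

    def write(self, port, value):
        if port == "o":
            self.o = value
        elif port == "t":
            # Writing the trigger port fires the operation (assumed: op(o, t)).
            self.r = self.op(self.o, value)
        else:
            raise ValueError(f"cannot write port {port!r}")


def run(program, regs):
    """Run a list of instructions; each instruction is a list of (src, dst) moves.

    Returns the final register file plus counts of register-file reads and writes.
    """
    units = {"add": FunctionUnit(operator.add), "sub": FunctionUnit(operator.sub)}
    regs = dict(regs)
    rf_reads = rf_writes = 0
    for instruction in program:
        # Phase 1: read every source before any destination is written.
        transports = []
        for src, dst in instruction:
            if "." in src:
                unit, _port = src.split(".")
                value = units[unit].r  # read a result port, no RF access
            else:
                value = regs[src]
                rf_reads += 1
            transports.append((dst, value))
        # Phase 2: write destinations, trigger ports last so operands are in place.
        transports.sort(key=lambda t: t[0].endswith(".t"))
        for dst, value in transports:
            if "." in dst:
                unit, port = dst.split(".")
                units[unit].write(port, value)
            else:
                regs[dst] = value
                rf_writes += 1
    return regs, rf_reads, rf_writes


# The slide's example, before and after software bypassing.
PLAIN = [
    [("r1", "add.o"), ("r2", "add.t")],
    [("add.r", "r3")],
    [("r3", "sub.o"), ("r4", "sub.t")],
    [("sub.r", "r5")],
]
BYPASSED = [
    [("r1", "add.o"), ("r2", "add.t")],
    [("add.r", "sub.o"), ("r4", "sub.t")],
    [("sub.r", "r5")],
]
```

With the made-up inputs `{"r1": 5, "r2": 3, "r4": 2}`, both programs leave 6 in `r5`. The plain version takes four instructions with 4 register-file reads and 2 writes; the bypassed version takes three instructions with 3 reads and 1 write, and never touches `r3`.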