CprE / ComS 583
Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
Lecture #24 – Reconfigurable Coprocessors
CprE 583 – Reconfigurable Computing, November 14, 2006

Quick Points
• Unresolved course issues
  • Gigantic red bug
  • Ghost inside Microsoft PowerPoint
• This Thursday, project status updates
  • 10 minute presentations per group + questions
  • Combination of Adobe Breeze and calling in to teleconference
  • More details later today

Recap – DP-FPGA
• Break FPGA into datapath and control sections
• Save storage for LUTs and connection transistors
• Key issue is grain size
• Cherepacha/Lewis – U. Toronto

Recap – RaPiD
• Segmented linear architecture
• All RAMs and ALUs are pipelined
• Bus connectors also contain registers

Recap – Matrix
• Two inputs from adjacent blocks
• Local memory for instructions, data

Recap – RAW Tile
• Full functionality in each tile
• Static router located for near-neighbor communication

Outline
• Recap
• Reconfigurable Coprocessors
  • Motivation
  • Compute Models
  • Architecture
  • Examples

Overview
• Processors efficient at sequential codes, regular arithmetic operations
• FPGA efficient at fine-grained parallelism, unusual bit-level operations
• Tight coupling important: allows sharing of data/control
• Efficiency is an issue:
  • Context switches
  • Memory coherency
  • Synchronization

Compute Models
• I/O pre/post processing
• Application specific operation
• Reconfigurable Co-processors
  • Coarse-grained
  • Mostly independent
• Reconfigurable Functional Unit
  • Tightly integrated with processor pipeline
  • Register file sharing becomes an issue

Instruction Augmentation
[Figure: bit reversal; input bits a31 a30 … a0 map to output bits b31 … b0 by swapping bit positions]
• Processor can only describe a small number of basic computations in a cycle
  • I bits -> 2^I operations
• Many operations could be performed on 2 W-bit words
• ALU implementations restrict execution of some simple operations
  • e.g. bit reversal

Instruction Augmentation (cont.)
• Provide a way to augment the processor instruction set for an application
• Avoid mismatch between hardware/software
• Fit augmented instructions into data and control stream
• Create a functional unit for augmented instructions
• Compiler techniques to identify/use new functional unit

“First” Instruction Augmentation
• PRISM
  • Processor Reconfiguration through Instruction Set Metamorphosis
• PRISM-I
  • 68010 (10MHz) + XC3090
  • can reconfigure FPGA in one second!
  • 50-75 clocks for operations

PRISM-I Results

PRISM Architecture
• FPGA on bus
• Access as memory mapped peripheral
• Explicit context management
• Some software discipline for use
• …not much of an “architecture” presented to user

PRISC
• Architecture:
  • couple into register file as “superscalar” functional unit
  • flow-through array (no state)

PRISC (cont.)
• All compiled
• Working from MIPS binary
• <200 4LUTs ?
  • 64x3
• 200MHz MIPS base
• See [RazSmi94A] for more details

Chimaera
• Start from PRISC idea
  • Integrate as a functional unit
  • No state
  • RFU Ops (like expfu)
  • Stall processor on instruction miss
• Add
  • Multiple instructions at a time
  • More than 2 inputs possible
• [HauFry97A]

Chimaera Architecture
• Live copy of register file values feed into array
• Each row of array may compute from registers or intermediates
• Tag on array to indicate RFUOP

Chimaera Architecture (cont.)
• Array can operate on values as soon as placed in register file
• When RFUOP matches
  • Stall until result ready
  • Drive result from matching row

Chimaera Timing
• If R1 presented late then stall
  • Might be helped by instruction reordering
• Physical implementation an issue
• Relies on considerable processor interaction for support

Chimaera Speedup
• Three Spec92 benchmarks
  • Compress 1.11 speedup
  • Eqntott 1.8
  • Life 2.06
• Small arrays with limited state
• Small speedup
• Perhaps focus on global router rather than local optimization

Garp
• Integrate as coprocessor
  • Similar bandwidth to processor as functional unit
  • Own access to memory
• Support multi-cycle operation
  • Allow state
  • Cycle counter to track operation
• Configuration cache, path to memory

Garp (cont.)
• ISA – coprocessor operations
  • Issue gaconfig to make particular configuration present
  • Explicitly move data to/from array
  • Processor suspension during coproc operation
  • Use cycle counter to track progress
• Array may directly access memory
  • Processor and array share memory
  • Exploits streaming data operations
  • Cache/MMU maintains data consistency

Garp Instructions
• Interlock indicates if processor waits for array to count to zero
• Last three instructions useful for context swap
• Processor decode hardware augmented to recognize new instructions

Garp Array
• Row-oriented logic
• Dedicated path for processor/memory
• Processor does not have to be involved in array-memory path
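
The Garp coprocessor flow from the slides above (make a configuration present, move data to the array explicitly, let a cycle counter track the operation, and interlock the processor until the count reaches zero) can be sketched as a toy Python simulation. This is only an illustration under stated assumptions, not the real Garp ISA or hardware: the class `ToyArray`, the methods `move_to_array`, `start`, `tick`, and `move_from_array`, the register names `"in"` and `"out"`, and the 4-cycle latency are all hypothetical. Only the name `gaconfig` comes from the slides. The array function is the bit reversal used earlier as the motivating example for instruction augmentation.

```python
def bit_reverse32(x):
    """Reverse the bit order of a 32-bit word, so output bit i is input bit 31 - i."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


class ToyArray:
    """Hypothetical reconfigurable array with a configuration cache and a cycle counter."""

    def __init__(self, configs):
        # configs maps a name to (function, latency in cycles); latency must be >= 1.
        self.configs = configs
        self.cache = set()        # configurations already resident near the array
        self.active = None        # name of the configuration currently present
        self.regs = {}            # array-side registers
        self.counter = 0          # cycles remaining in the current operation
        self.config_loads = 0     # number of configuration fetches from memory

    def gaconfig(self, name):
        """Make a configuration present; only a cache miss fetches it from memory."""
        if name not in self.cache:
            self.config_loads += 1
            self.cache.add(name)
        self.active = name

    def move_to_array(self, reg, value):
        """Explicitly copy a 32-bit processor value into an array register."""
        self.regs[reg] = value & 0xFFFFFFFF

    def start(self):
        """Load the cycle counter with the latency of the active configuration."""
        _, latency = self.configs[self.active]
        self.counter = latency

    def tick(self):
        """Advance one array clock; the result appears when the counter reaches zero."""
        if self.counter > 0:
            self.counter -= 1
            if self.counter == 0:
                func, _ = self.configs[self.active]
                self.regs["out"] = func(self.regs["in"])

    def move_from_array(self, reg):
        """Interlocked read: the processor stalls until the counter is zero."""
        stalls = 0
        while self.counter > 0:
            self.tick()
            stalls += 1
        return self.regs[reg], stalls


if __name__ == "__main__":
    array = ToyArray({"bitrev": (bit_reverse32, 4)})
    array.gaconfig("bitrev")              # configuration made present (cache miss)
    array.move_to_array("in", 0x00000001)
    array.start()
    value, stalls = array.move_from_array("out")
    print(hex(value), stalls)             # 0x80000000 4
```

In this sketch the processor does nothing useful while it stalls. A non-interlocked read would instead let it continue with independent work and check the counter later, which is the choice the interlock bullet on the Garp Instructions slide describes.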