CprE / ComS 583 Reconfigurable Computing
Quick Points
Allowable Schedules
Sequentialization
Multicontext Scheduling
Signal Retiming
Slide 7
Full ASCII Hex Circuit
Multicontext Version
ASCIIHex Example
ASCIIHex Example (cont.)
General Throughput Mapping
Benchmark Set
Area v. Throughput
Area v. Throughput (cont.)
Reconfiguration for Fault Tolerance
Column Based Reconfiguration
Slide 18
Slide 19
Summary
Outline
Coarse-grained Architectures
DP-FPGA
Configuration Sharing
Two-dimensional Layout
DP-FPGA Technology Mapping
RaPiD
RaPiD Datapath
RaPiD Control Path
FIR Filter Example
FIR Filter Example (cont.)
MATRIX
Basic Functional Unit
MATRIX Interconnect
Functional Unit Inputs
Slide 36
Chess
Chess Interconnect
Chess Basic Block
Reconfigurable Architecture Workstation
RAW Tile
RAW Datapath
Raw Compiler
Slide 44

CprE / ComS 583
Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
Lecture #23 – Function Unit Architectures

CprE 583 – Reconfigurable Computing, November 9, 2006, Lect-23.2

Quick Points
• HW #3, #4 graded and returned
• Next week Thursday, project status updates
  • 10 minute presentations per group + questions
  • Upload to WebCT by the previous evening
  • Expected that you’ve made some progress!

Allowable Schedules
• Active LUTs (NA) = 3

Sequentialization
• Adding time slots
  • More sequential (more latency)
  • Adds slack
  • Allows better balance
• L=4, NA=2 (4 or 3 contexts)

Multicontext Scheduling
• “Retiming” for multicontext
  • Goal: minimize peak resource requirements
  • NP-complete
  • List schedule, anneal
• How do we accommodate intermediate data?
  • Effects?

Signal Retiming
• Non-pipelined
  • Hold value on LUT output (wire) from production through consumption
  • Wastes wire and switches by occupying them
    • For the entire critical path delay L
    • Not just for the 1/L’th of the cycle it takes to cross the wire segment
• How will it show up in multicontext?

Signal Retiming
• Multicontext equivalent
  • Need a LUT to hold the value for each intermediate context

Full ASCII Hex Circuit
• Logically three levels of dependence
• Single context: 21 LUTs @ 880Kλ² = 18.5Mλ²

Multicontext Version
• Three contexts: 12 LUTs @ 1040Kλ² = 12.5Mλ²
• Pipelining needed for dependent paths

ASCIIHex Example
• All retiming on wires (active outputs)
  • Saturation based on inputs to largest stage
• With enough contexts only one LUT needed
  • Increased LUT area due to additional stored configuration information
  • Eventually additional interconnect savings taken up by LUT configuration overhead
• Ideal: perfect scheduling spread + no retime overhead

ASCIIHex Example (cont.)
• @ depth=4, c=6: 5.5Mλ² (compare 18.5Mλ²)

General Throughput Mapping
• If only want to achieve limited throughput
• Target: produce a new result every t cycles
• Spatially pipeline every t stages
  • cycle = t
• Retime to minimize register requirements
• Multicontext evaluation w/in a spatial stage
  • Retime (list schedule) to minimize resource usage
• Map for depth (i) and contexts (c)

Benchmark Set
• 23 MCNC circuits
• Area mapped with SIS and Chortle

Area v. Throughput

Area v. Throughput (cont.)

Reconfiguration for Fault Tolerance
• Embedded systems require high reliability in the presence of transient or permanent faults
• FPGAs contain substantial redundancy
  • Possible to dynamically “configure around” problem areas
• Numerous on-line and off-line solutions

Column Based Reconfiguration
• Huang and McCluskey
• Assume that each FPGA column is equivalent in terms of logic and routing
  • Preserve empty columns for future use
  • Somewhat wasteful
• Precompile and compress differences in bitstreams

Column Based Reconfiguration
• Create multiple copies of the same design with different unused columns
• Only requires different inter-block connections
• Can lead to unreasonable configuration count

Column Based Reconfiguration
• Determining differences and compressing the results leads to “reasonable” overhead
• Scalability and fault diagnosis are issues

Summary
• In many cases cannot profitably reuse logic at device cycle rate
  • Cycles, no data parallelism
  • Low throughput, unstructured
  • Dissimilar data dependent computations
• These cases benefit from having more than one instruction/operation per active element
• Economical retiming becomes important here to achieve active LUT reduction
• For c=[4,8], I=[4,6], automatically mapped designs are 1/2 to 1/3 of single context size

Outline
• Continuation
• Function Unit Architectures
  • Motivation
  • Various architectures
  • Device trends

Coarse-grained Architectures
• DP-FPGA
  • LUT-based
  • LUTs share configuration bits
• RaPiD
  • Specialized ALUs, multipliers
  • 1D pipeline
• MATRIX
  • 2-D array of ALUs
• Chess
  • Augmented, pipelined matrix
• Raw
  • Full RISC core as basic block
  • Static scheduling used for communication

DP-FPGA
• Break FPGA into datapath and control sections
• Save storage for LUTs and connection transistors
• Key issue is grain size
• Cherepacha/Lewis – U. Toronto

Configuration Sharing
• MC = LUT SRAM bits
• CE = connection block pass transistors
• Set MC = 2-3 CE
[Figure: configuration bits 0 1 1 1 0 0 1 0 with inputs A0, B0, C0, A1, B1, C1 and outputs Y0, Y1; the expression for A(N) in terms of MC, CE and N is not recoverable]
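The DP-FPGA idea that LUTs in a datapath share configuration bits can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen for illustration only: the helper functions, the 3-input XOR truth table, and the two-slice input values are assumptions, not the configuration shown in the lecture figure. The bit count covers LUT SRAM bits only and ignores connection block transistors.

```python
from typing import List, Sequence


def lut_eval(truth_table: Sequence[int], inputs: Sequence[int]) -> int:
    """Evaluate a k-input LUT: the input bits form an index into the table."""
    index = 0
    for bit in inputs:  # first input is the most significant index bit
        index = (index << 1) | (bit & 1)
    return truth_table[index]


def shared_lut_slices(truth_table: Sequence[int],
                      slice_inputs: Sequence[Sequence[int]]) -> List[int]:
    """Apply ONE shared truth table to every bit slice of a datapath."""
    return [lut_eval(truth_table, ins) for ins in slice_inputs]


def lut_config_bits(k: int, n_slices: int, shared: bool) -> int:
    """LUT SRAM bits for n_slices k-input LUTs (interconnect not counted)."""
    per_lut = 2 ** k
    return per_lut if shared else per_lut * n_slices


# Hypothetical example: a 3-input XOR stored once and used by two bit slices.
XOR3 = [0, 1, 1, 0, 1, 0, 0, 1]
slices = [(1, 0, 0), (1, 1, 0)]  # (A0, B0, C0) and (A1, B1, C1)
print(shared_lut_slices(XOR3, slices))        # [1, 0]
print(lut_config_bits(3, 4, shared=False))    # 32
print(lut_config_bits(3, 4, shared=True))     # 8
```

In this simplified count, a 4-bit-wide datapath of 3-input LUTs needs 8 SRAM bits when the table is shared instead of 32, which is the kind of storage saving the DP-FPGA slide refers to.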
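The Multicontext Scheduling slide names list scheduling as one heuristic for minimizing peak resource requirements. The following is a minimal Python sketch of that idea under stated assumptions: the netlist, the function names, and the "longest path to an output first" priority rule are all hypothetical choices for illustration, the input graph is assumed to be acyclic with every LUT present as a key, and the extra LUTs needed to hold values across intermediate contexts (the retiming cost from the Signal Retiming slides) are ignored. Because the problem is NP-complete, a greedy pass like this is not guaranteed to find the true minimum.

```python
from typing import Dict, List, Tuple


def list_schedule(deps: Dict[str, List[str]], capacity: int) -> Dict[str, int]:
    """Greedy list schedule with at most `capacity` active LUTs per context.

    deps maps each LUT to the LUTs whose outputs it consumes.
    Returns the context (time slot) assigned to each LUT.
    """
    consumers: Dict[str, List[str]] = {n: [] for n in deps}
    for node, preds in deps.items():
        for p in preds:
            consumers[p].append(node)

    height: Dict[str, int] = {}

    def longest_path(n: str) -> int:
        # Priority: number of LUT levels from n to the farthest output.
        if n not in height:
            height[n] = 1 + max((longest_path(c) for c in consumers[n]), default=0)
        return height[n]

    slot: Dict[str, int] = {}
    t = 0
    while len(slot) < len(deps):
        # A LUT is ready once all its producers ran in an earlier context.
        ready = [n for n in deps if n not in slot
                 and all(p in slot and slot[p] < t for p in deps[n])]
        ready.sort(key=lambda n: (-longest_path(n), n))
        for n in ready[:capacity]:
            slot[n] = t
        t += 1
    return slot


def min_active_luts(deps: Dict[str, List[str]],
                    contexts: int) -> Tuple[int, Dict[str, int]]:
    """Smallest capacity for which the greedy schedule fits in `contexts`."""
    for capacity in range(1, len(deps) + 1):
        schedule = list_schedule(deps, capacity)
        if max(schedule.values()) + 1 <= contexts:
            return capacity, schedule
    raise ValueError("critical path is longer than the number of contexts")


# Hypothetical 6-LUT netlist with three levels of dependence.
netlist = {"a": [], "b": [], "c": [],
           "d": ["a", "b"], "e": ["b", "c"], "f": ["d", "e"]}
print(min_active_luts(netlist, 3)[0])  # 3
print(min_active_luts(netlist, 4)[0])  # 2
```

For this made-up netlist, allowing a fourth time slot lowers the peak from three active LUTs to two, which mirrors the trade-off on the Sequentialization slide: more latency buys slack and a better balanced schedule.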