CprE / ComS 583 – Reconfigurable Computing
Lecture #22 – Multi-Context FPGAs
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
November 7, 2006

Recap – HW/SW Partitioning

[Figure: HW/SW co-design flow – a specification written as communicating processes (read/write on ports) is captured as a model, partitioned between the processor and the FPGA, and synthesized along with the interface between them.]

• A good partitioning mechanism:
  1) Minimizes communication across the bus
  2) Allows parallelism → both the hardware (FPGA) and the processor operate concurrently
  3) Keeps processor utilization near its peak at all times (performing useful work)

Recap – Communication and Control

• Need to signal between the CPU and the accelerator
  • Data ready
  • Complete
• Implementations:
  • Shared memory
  • FIFO
  • Handshake
• If computation time is very predictable, a simpler communication scheme may be possible

Recap – System-Level Methodology

[Figure: system-level design flow – informal specification and constraints → system model → architecture design → HW/SW implementation → prototype → test → implementation; failures feed back through component profiling and performance evaluation.]

Outline

• Recap
• Multicontext
  • Motivation
  • Cost analysis
  • Hardware support
  • Examples

Single Context

• When we have:
  • Cycles and no data parallelism
  • Low-throughput, unstructured tasks
  • Dissimilar, data-dependent tasks
• Active resources sit idle most of the time
  • Waste of resources
• Why?

Single Context: Why?

• Resources cannot be reused to perform different functions, only the same functions

Multiple-Context LUT

• The configuration selects the operation of the computation unit
• The context identifier changes over time to allow a change in functionality
• DPGA – Dynamically Programmable Gate Array

Computations that Benefit

[Figure: non-pipelined example – inputs A and B feeding functions F0, F1, F2.]

• Low-throughput tasks
• Data-dependent operations
• Effective if not all resources are active simultaneously
• Possible to time-multiplex both logic and routing resources

Computations that Benefit (cont.)

[Figures: further examples.]

Resource Reuse

• Resources must be directed to do different things at different times through instructions
• Different local configurations can be thought of as instructions
• Minimizing the number and size of instructions is key to achieving an efficient design
• What are the implications for the hardware?

Example: ASCII → Binary Conversion

• Input: ASCII hex character
• Output: binary value

  signal input  : std_logic_vector(7 downto 0);
  signal output : std_logic_vector(3 downto 0);

  process (input)
  begin
  end process;
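The slide leaves the process body empty; the conversion circuit itself is shown on the following slides. As a rough sketch (not from the slides; it assumes the ieee.numeric_std package and that only the characters '0'–'9', 'A'–'F', and 'a'–'f' are presented), the body could be filled in as:

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  -- entity/architecture wrapper omitted; input and output are the
  -- signals declared on the slide
  process (input)
  begin
    if input(6) = '1' then
      -- letters 'A'-'F' (x"41"-x"46") and 'a'-'f' (x"61"-x"66"):
      -- the low nibble holds 1-6, so adding 9 gives 10-15
      output <= std_logic_vector(unsigned(input(3 downto 0)) + 9);
    else
      -- digits '0'-'9' (x"30"-x"39"): the low nibble is already the value
      output <= input(3 downto 0);
    end if;
  end process;

Testing bit 6 of the ASCII code is one way to separate letters from digits while keeping the process purely combinational, matching its sensitivity list.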
ASCII → Binary Conversion Circuit

[Figure: two implementations of the conversion circuit – Implementation #1 with NA = 3 active LUTs, Implementation #2 with NA = 4.]

Implementation Choices

• Both require the same amount of execution time
• Implementation #1 is more resource efficient

Logic Reuse – Previous Study [Deh96B]

[Figure: multi-context configuration memory driving the logic and interconnect multiplexers.]

• Actxt ≈ 80Kλ² (dense encoding)
• Abase ≈ 800Kλ²
• Each context is not overly costly compared to the base cost of wire, switches, and I/O circuitry
• Question: how does this effect scale?

DPGA Prototype

[Figure: DPGA prototype.]

Multicontext Tradeoff Curves

• Assume ideal packing: Nactive = Ntotal / L
• Reminder: c · Actxt = Abase
• Difficult to exactly balance resources and demands
• Needs for contexts may vary across applications
• Robust point: where the critical path length equals the number of contexts

In Practice

• Scheduling limitations
• Retiming limitations

Scheduling Limitations

• NA (active): the size of the largest stage
• Precedence:
  • A LUT can be evaluated only after its predecessors have been evaluated
  • Cannot always completely equalize stage requirements

Scheduling

• Precedence limits packing freedom
• Freedom shows up as slack in the network

Scheduling

• Computing slack:
  • ASAP (As Soon As Possible) schedule
    • Propagate depth forward from the primary inputs
    • depth = 1 + max input depth
  • ALAP (As Late As Possible) schedule
    • Propagate level backward from the primary outputs
    • level = 1 + max output consumption level
  • Slack
    • slack = L + 1 − (depth + level)   [PI depth = 0, PO level = 0]

Allowable Schedules

[Figure: allowable context assignments for the example network.]

• Active LUTs (NA) = 3

Sequentialization

• Adding time slots:
  • More sequential (more latency)
  • Adds slack
  • Allows better balance
• L = 4 → NA = 2 (with 4 or 3 contexts)
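As a worked illustration of the slack formula (a made-up network, not one from the slides): take a network with critical path L = 4 and a LUT x that is two levels from the primary inputs (depth 2) and two levels from the primary outputs (level 2). Then slack(x) = L + 1 − (depth + level) = 5 − 4 = 1, so x may be evaluated in context 2 or context 3 without lengthening the schedule. A LUT on the critical path has depth + level = L + 1 and therefore slack 0, so its context assignment is fixed. Adding a fifth time slot, as above, leaves each node's depth and level unchanged but raises L to 5, giving every node one more unit of slack to trade for a smaller NA.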


Multicontext Scheduling

• "Retiming" for multicontext
  • Goal: minimize the peak resource requirements
• NP-complete
• List schedule, anneal
• How do we accommodate intermediate data? What are the effects?

Signal Retiming

• Non-pipelined case: hold the value on the LUT output (wire) from production through consumption
• Wastes wire and switches by occupying them
  • For the entire critical path delay L
  • Not just for the 1/L'th of the cycle it takes to cross the wire segment
• How will it show up in multicontext?

Signal Retiming

• Multicontext equivalent: a LUT is needed to hold the value for each intermediate context

Full ASCII → Hex Circuit

[Figure: complete conversion network.]

• Logically three levels of dependence
• Single context: 21 LUTs @ 880Kλ² = 18.5Mλ²
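A back-of-the-envelope check (an estimate, not from the slides, using the figures quoted earlier: Abase ≈ 800Kλ², Actxt ≈ 80Kλ², ideal packing, and ignoring the scheduling and retiming limitations above): a 3-context version of the same circuit would need about ⌈21/3⌉ = 7 physical LUTs, each costing roughly Abase + 3·Actxt ≈ 800K + 240K = 1,040Kλ², for about 7.3Mλ² in total — roughly 2.5× less area than the 18.5Mλ² single-context implementation. Precedence constraints and the extra retiming LUTs raise the active LUT count per context, so the real saving is smaller.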
