ISU CPRE 583 - Lecture 25


CprE / ComS 583
Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
Lecture #25 – High-Level Compilation

CprE 583 – Reconfigurable Computing    November 28, 2006    Lect-25.2

Quick Points
[Calendar, November / December 2006. Entries: Lect-25, Lect-26 (??), Dead Week, Project Seminars (EDE), Project Seminars (Others), Finals Week, Project Write-ups Deadline, Electronic Grades Due]

Project Deliverables
• Final presentation [15-25 min]
  • Aim for 80-100% project completeness
  • Outline it as an extension of your report:
    • Motivation and related work
    • Analysis and approach taken
    • Experimental results and summary of findings
    • Conclusions / next steps
  • Consider details that will be interesting / relevant for the expected audience
• Final report [8-12 pages]
  • More thorough analysis of related work
  • Minimal focus on project goals and organization
  • Implementation details and results
  • See proceedings of FCCM/FPGA/FPL for inspiration

Recap – Reconfigurable Coprocessing
• Processors efficient at sequential codes, regular arithmetic operations
• FPGA efficient at fine-grained parallelism, unusual bit-level operations
• Tight-coupling important: allows sharing of data/control
• Efficiency is an issue:
  • Context-switches
  • Memory coherency
  • Synchronization

Instruction Augmentation
[Figure: bits a31 a30 ... a0 mapped to b31 ... b0; "Swap bit positions"]
• Processor can only describe a small number of basic computations in a cycle
  • I bits -> 2^I operations
• Many operations could be performed on 2 W-bit words
• ALU implementations restrict execution of some simple operations
  • e.g. bit reversal

Recap – PRISC [RazSmi94A]
• Architecture:
  • couple into register file as “superscalar” functional unit
  • flow-through array (no state)

Recap – Chimaera Architecture
• Live copy of register file values feed into array
• Each row of array may compute from registers or intermediates
• Tag on array to indicate RFUOP

PipeRench Architecture
• Many applications are primarily linear
  • Audio processing
  • Modified video processing
  • Filtering
• Consider a “striped” architecture which can be very heavily pipelined
  • Each stripe contains LUTs and flip flops
  • Datapath is bit-sliced
  • Similar to Garp/Chimaera but standalone
• Compiler initially converts dataflow application into a series of stripes
• Run-time dynamic reconfiguration of stripes if application is too big to fit in available hardware

PipeRench Internals
• Only multi-bit functional units used
• Very limited resources for interconnect to neighboring programming elements
• Place and route greatly simplified

PipeRench Place-and-Route
[Figure: nodes labeled F1 to F6 and D1 to D4]
• Since no loops and linear data flow used, first step is to perform topological sort
• Attempt to minimize critical paths by limiting NO-OP steps
• If too many trips needed, temporally as well as spatially pipeline

PipeRench Prototypes
[Figure: chip layout. CUSTOM: PipeRench Fabric (STRIPE, PE). STANDARD CELLS: Virtualization & Interface Logic, Configuration Cache, Data Store Memory]
• 3.6M transistors
• Implemented in a commercial 0.18μ, 6 metal layer technology
• 125 MHz core speed (limited by control logic)
• 66 MHz I/O Speed
• 1.5V core, 3.3V I/O

Parallel Computation
• What would it take to let the processor and FPGA run in parallel?
Modern Processors
Deal with:
• Variable data delays
• Dependencies with data
• Multiple heterogeneous functional units
Via:
• Register scoreboarding
• Runtime data flow (Tomasulo)

OneChip
• Want array to have direct memory→memory operations
• Want to fit into programming model/ISA
  • Without forcing exclusive processor/FPGA operation
  • Allowing decoupled processor/array execution
• Key Idea:
  • FPGA operates on memory→memory regions
  • Make regions explicit to processor issue
  • Scoreboard memory blocks

OneChip Pipeline
[Figure: OneChip pipeline]

OneChip Instructions
• Basic Operation is:
  • FPGA MEM[Rsource]→MEM[Rdst]
  • block sizes powers of 2
• Supports 14 “loaded” functions
  • DPGA/contexts so 4 can be cached
• Fits well into soft-core processor model

OneChip (cont.)
• Basic op is: FPGA MEM→MEM
• No state between these ops
• Coherence is that ops appear sequential
• Could have multiple/parallel FPGA compute units
  • Scoreboard with processor and each other
• Single source operations?
• Can’t chain FPGA operations?
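
The "scoreboard memory blocks" idea from the OneChip slides can be illustrated with a small software model. This is only a sketch under stated assumptions, not the actual OneChip hardware mechanism: the class name BlockScoreboard, its methods, and the exact stall rules (stall on any read-after-write, write-after-write, or write-after-read overlap) are hypothetical and chosen so that operations appear sequential, as the slides require. The only details taken from the slides are that an FPGA op maps a source memory block to a destination block and that block sizes are powers of 2.

```python
class BlockScoreboard:
    """Hypothetical scoreboard of memory blocks used by in-flight FPGA ops."""

    def __init__(self):
        # op_id -> (source block, destination block), each a range of addresses
        self.in_flight = {}

    @staticmethod
    def _block(base, size):
        # The slides state that block sizes are powers of 2.
        if size <= 0 or size & (size - 1):
            raise ValueError("block size must be a power of 2")
        return range(base, base + size)

    @staticmethod
    def _overlap(a, b):
        # Two half-open address ranges overlap if each starts before the other ends.
        return a.start < b.stop and b.start < a.stop

    def try_issue(self, op_id, src, dst, size):
        """Issue FPGA MEM[src] -> MEM[dst]; return False if the op must stall."""
        s, d = self._block(src, size), self._block(dst, size)
        for old_s, old_d in self.in_flight.values():
            # Assumed rule: stall on RAW, WAW, or WAR against an earlier FPGA op.
            if (self._overlap(s, old_d) or self._overlap(d, old_d)
                    or self._overlap(d, old_s)):
                return False
        self.in_flight[op_id] = (s, d)
        return True

    def retire(self, op_id):
        """Remove a completed FPGA op, releasing its blocks."""
        self.in_flight.pop(op_id)

    def cpu_load_ok(self, addr):
        # A processor load must wait for any pending FPGA write to that address.
        return all(addr not in d for _, d in self.in_flight.values())

    def cpu_store_ok(self, addr):
        # A processor store must not touch a block an FPGA op still reads or writes.
        return all(addr not in s and addr not in d
                   for s, d in self.in_flight.values())


if __name__ == "__main__":
    sb = BlockScoreboard()
    sb.try_issue("op0", src=0x1000, dst=0x2000, size=0x100)
    print(sb.cpu_load_ok(0x1010))   # True: reading a source block is safe
    print(sb.cpu_load_ok(0x2010))   # False: destination not yet written
    sb.retire("op0")
    print(sb.cpu_load_ok(0x2010))   # True: op0 has completed
```

In this model the processor keeps running while an FPGA op is in flight and only stalls when it touches a block the scoreboard marks as busy, which is the decoupled execution the slides ask for. Two FPGA ops that merely share a source block do not conflict, so multiple compute units could proceed in parallel.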

OneChip Extensions
[Figure: memory map with addresses 0x0, 0x1000, 0x10000 and regions labeled FPGA and Proc. Annotation: "Indicates usage of data pages like virtual memory system!"]
• FPGA operates on certain memory regions only
• Makes regions explicit to processor issue
• Scoreboard memory blocks

Shadow Registers
• Reconfigurable functional units require tight integration with register file
• Many reconfigurable operations require more than two operands at a time

Multi-Operand Operations
• What’s the best speedup that could be achieved?
  • Provides upper bound
• Assumes all operands available when needed

Additional Register File Access
• Dedicated link – move data as needed
  • Requires latency
• Extra register port – consumes resources
  • May not be used often
• Replicate whole (or most) of register file
  • Can be wasteful

Shadow Register Approach
• Small number of registers needed (3 or 4)
• Use extra bits

