Berkeley COMPSCI 150 - HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array

HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array

William Tsu, Kip Macy, Atul Joshi, Randy Huang, Norman Walker, Tony Tung, Omid Rowhani, Varghese George, John Wawrzynek, and André DeHon

Berkeley Reconfigurable, Architectures, Software, and Systems
Computer Science Division
University of California at Berkeley
Berkeley, CA 94720-1776
contact: <[email protected]>

There is no inherent characteristic forcing Field Programmable Gate Array (FPGA) or Reconfigurable Computing (RC) Array cycle times to be greater than those of processors in the same process. Modern FPGAs seldom achieve application clock rates close to those of their processor cousins because (1) resources in the FPGAs are not balanced appropriately for high-speed operation, (2) FPGA CAD does not automatically provide the requisite transforms to support this operation, and (3) interconnect delays can be large and vary almost continuously, complicating high-frequency mapping. We introduce a novel reconfigurable computing array, the High-Speed, Hierarchical Synchronous Reconfigurable Array (HSRA), and its supporting tools. This package demonstrates that computing arrays can achieve efficient, high-speed operation. We have designed and implemented a prototype component in a 0.4 µm logic design on a DRAM process that will support 250 MHz operation for CAD-mapped designs.

A common myth about FPGAs is that they are inherently 10× slower than processors. We see no physical limitations that would make this true, but there are some good reasons why this myth persists.

Looking at raw cycle times, we see that the potential operating frequencies for FPGAs are comparable to those of processors in the same process (see Table 1). The cycle time on a processor represents the minimum interval at which a new operation on new data can be initiated or completed. That is, it defines how fast we can clock the computational and memory units and reuse them to perform subsequent operations.
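Because cycle time and clock frequency are reciprocals (f = 1/T), the cycle times listed in Table 1 can be read directly as clock rates. The Python sketch below performs that conversion; the dictionary simply transcribes Table 1's rows, and the names `TABLE1_CYCLE_NS` and `cycle_ns_to_mhz` are illustrative assumptions, not anything defined in the paper.

```python
# Cycle times (ns) transcribed from Table 1. The keys are row labels only;
# the two Alpha rows are distinguished by their listed cycle times.
TABLE1_CYCLE_NS = {
    "XC4000XL-09": 4.6,
    "A10K100A-1": 5.0,
    "Strong Arm": 5.0,
    "Alpha (2.3 ns)": 2.3,
    "SPARC": 3.0,
    "Pentium": 3.3,
    "Alpha (1.7 ns)": 1.7,
    "HSRA": 4.0,
}


def cycle_ns_to_mhz(cycle_ns: float) -> float:
    """Convert a cycle time to a clock frequency: f = 1 / T.

    With T in nanoseconds, 1 / (T * 1e-9) Hz equals 1000 / T MHz.
    """
    if cycle_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_ns


if __name__ == "__main__":
    for name, t in TABLE1_CYCLE_NS.items():
        print(f"{name:16s} {t:4.1f} ns -> {cycle_ns_to_mhz(t):6.1f} MHz")
```

For instance, HSRA's 4.0 ns cycle corresponds to the 250 MHz quoted above, and the XC4000XL-09's 4.6 ns cycle to roughly 217 MHz.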
Since traditional FPGAs are not synchronous, it is not as obvious what the native cycle time is for an FPGA. However, if we also take the FPGA cycle time as the minimum interval at which we can launch a new datum for computation, then we can identify a cycle time. For example, the XC4000XL-09 family has a logic-evaluation-to-clock setup time of 1.6 ns and a clock-to-Q time of 1.5 ns. If we take the minimum clock low and high times of 2.3 ns each, we can define a cycle of 4.6 ns, which leaves (4.6 - 1.5 - 1.6) = 1.5 ns for interconnect on each cycle. Similarly, Von Herzen defined a 4 ns cycle on XC3100-09 and designed his signal processing applications to this cycle time [11]. In Table 1, we see that these cycle times are within a factor of two of those of processors in the same process.

(To appear in the Seventh International Symposium on Field-Programmable Gate Arrays, February 21–23, Monterey, CA.)

In practice, however, the applications we see running at 200 MHz+ on these FPGAs are few and far between. While the basic cycle time for an FPGA is small, most contemporary FPGA designs run much slower, more typically in the 25–70 MHz range. Why do designs run this much slower than the conceivable peak? We conjecture that several factors contribute to the low frequency of most FPGA designs:

1. no reason to run faster – Often the limited speed is all the user wants or needs, and there is no application reason to run at a higher cycle rate. For example, if the application is sample-rate limited at a modest sample rate, there is no requirement to process data at a higher rate. Furthermore, when data rates are limited by system components outside of the FPGA or by standards, the application may have no cause to run at a faster rate. However, when such external or application limits appear, it is often possible to reduce the hardware required by running a more serialized design in less space (fewer gates, smaller FPGA component) at the higher cycle rate achievable by the FPGA.
Table 1: Cycle Rate comparison at 0.35 µm

Design          Feature   Cycle     Reference
XC4000XL-09     0.35 µm   4.6 ns    [20]
A10K100A-1      0.35 µm   5.0 ns*   [1]
Strong Arm      0.35 µm   5.0 ns    [15]
Alpha           0.35 µm   2.3 ns    [10]
SPARC           0.35 µm   3.0 ns    [9]
Pentium         0.35 µm   3.3 ns    [4]
Alpha           0.35 µm   1.7 ns    [7]
HSRA            0.40 µm   4.0 ns

* 5.0 ns cycle based on minimum clock high and low times (2 × 2.5 ns)

Table 2: Benchmark-Wide Distribution of Registers Required between LUTs

Number of Registers   1    2    3    4    5    6     7    8     9     10
Percentage (%)        72   16   4.5  2.2  1.3  0.96  1.2  0.46  0.12  0.11

2. cyclic data dependencies limit pipelineability – Cycles in the flow graph define a minimum clock cycle time. We cannot pipeline down to the LUT level within such cycles. We can, however, run the design C-slow [14] at the LUT-cycle rate, allowing us to solve C independent problems simultaneously in the hardware space. If we do not have a number of independent problems to solve, we can reuse gates and interconnect at the LUT-cycle rate to solve the problem in less area when the device has multiple contexts (e.g., DPGA [6]).

3. inadequate tool support – Reorganizing a design to run at this tight cycle rate can be a tedious task. While the basic technology is known in the design automation world, typical FPGA tools and design flows do not provide support for aggressive retiming. In part, this results from the traditional glue-logic replacement philosophy, which lets the user define the base cycle and what has to happen within a cycle, rather than taking a computational view, which says that the user defines a task and the tools are free to transform the problem as necessary to map the user's task onto the computing platform.

4. interconnect delays dominate – Interconnect delays depend on the distance between source and sink and can easily dominate all other delays. We were only able to define the tight cycle times above by assuming very local communications. If we allowed even one cross-chip delay in the cycle, the cycle time would increase significantly.
This leads us to believe that either we have to accept a much larger cycle time, or we must limit all communications to local connections, as in [11]. As long as we must traverse an entire long interconnect line in a single cycle, we can only achieve the tight cycle for very stylized problems or through heroic personal effort to design and lay out the computation entirely using local connections.

5. pipelining becomes expensive – In order to pipeline the device heavily enough to run at this cycle rate, the design needs a larger number of flip-flops for proper retiming. While flip-flops are "relatively" cheap in many FPGAs, the typical balance is roughly one flip-flop per 4-LUT. However, for a fully pipelined
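The C-slow technique mentioned in item 2 can be illustrated with a small behavioral model. The Python sketch below assumes a feedback loop whose next state depends on its current state and one input; `example_step` is a hypothetical stand-in for the loop's combinational logic, and all function names are illustrative. Replacing the loop's single register with C registers lets C independent input streams be interleaved round-robin: each stream's state comes back around the loop every C cycles, so the per-stream results match C separate copies of the original circuit. The model checks only this functional equivalence; it does not model the retiming step that redistributes the extra registers through the logic to shorten the critical path.

```python
from collections import deque
from typing import Callable, List, Sequence

Step = Callable[[int, int], int]


def run_original(step: Step, init: int, inputs: Sequence[int]) -> List[int]:
    """Original loop: one register holds the state; one input per cycle."""
    state = init
    outputs = []
    for x in inputs:
        state = step(state, x)
        outputs.append(state)
    return outputs


def run_c_slow(step: Step, init: int,
               streams: Sequence[Sequence[int]], c: int) -> List[List[int]]:
    """C-slowed loop: the single register becomes c registers (a FIFO),
    and c independent input streams are interleaved round-robin."""
    if c < 1 or len(streams) != c:
        raise ValueError("need exactly c >= 1 input streams")
    n = len(streams[0])
    if any(len(s) != n for s in streams):
        raise ValueError("all streams must have the same length")
    loop_regs = deque([init] * c)  # the c registers around the feedback loop
    outputs: List[List[int]] = [[] for _ in range(c)]
    for cycle in range(n * c):
        k = cycle % c                # stream whose state reaches the logic now
        x = streams[k][cycle // c]   # that stream's next input sample
        state = loop_regs.popleft()  # value leaving the last register
        new_state = step(state, x)   # combinational logic for this cycle
        loop_regs.append(new_state)  # value entering the first register
        outputs[k].append(new_state)
    return outputs


def example_step(state: int, x: int) -> int:
    # Hypothetical loop body: a small recurrence with feedback, 8-bit wrap-around.
    return (3 * state + x) % 256


if __name__ == "__main__":
    streams = [[1, 2, 3, 4], [10, 20, 30, 40], [5, 5, 5, 5]]
    interleaved = run_c_slow(example_step, 0, streams, c=3)
    separate = [run_original(example_step, 0, s) for s in streams]
    print(interleaved == separate)
```

Running it prints `True`: the 3-slow loop, fed three interleaved streams, produces the same per-stream results as three independent copies of the original loop.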

