ISU CPRE 583 - Teramac-Configurable Custom Computing

Teramac-Configurable Custom Computing

Rick Amerson, Richard J. Carter, W. Bruce Culbertson, Phil Kuekes, Greg Snider
Hewlett-Packard Laboratories
1501 Page Mill Road, Palo Alto CA 94304

0-8186-7086-X/95 $04.00 © 1995 IEEE

Abstract: The Teramac configurable hardware system can execute synchronous logic designs of up to one million gates at rates up to 1 megahertz. A fully configured Teramac includes half a gigabyte of RAM and hardware support for large multiported register files. The system has been built from custom FPGAs packaged in large multichip modules (MCMs). A large custom circuit (~1,000,000 gates) may be compiled onto the hardware in approximately 2 hours, without user intervention. The system is being used to explore the potential of custom computing machinery (CCM).

1 Teramac System Overview

Research on special purpose parallel architectures and custom computing is very much an experimental science dependent on the existence of prototypes. We have built an FPGA-based configurable custom computing engine to enable experiments on an interesting scale. Teramac is a configurable hardware system comprising 1728 custom FPGAs and 0.5 gigabytes of RAM. It features:

- 1,000,000 gate capacity for synchronous logic circuits.
- Up to 1 MHz clock rate.
- 0.5 Gbytes of memory organized into 64 independent, 32-bit-wide banks, each with independent read and write ports. Banks may be combined horizontally and vertically to form large memories.
- Fully automatic compilation.
- Checkpoint restart capability.
- Scalability. A minimum Teramac system (a single board) supports designs of up to 64K gates. Additional boards may be added to expand the capacity incrementally, up to a maximum of 16 boards.

We are currently conducting experiments with an 8 board Teramac system.

2 Hardware

The Teramac system logically consists of four major components, as shown in figure 1: (1) programmable hardware, which is configured to functionally reproduce a user's circuit; (2) RAM, which may be incorporated into user designs requiring memory; (3) a controller, which is responsible for controlling the programmable hardware as well as exchanging configuration and state data with an external host; and (4) a host workstation, which provides the center of control: user interface, compiler, and debug environment. The host connects to Teramac with a set of SCSI buses, making it easy to upgrade the host without modifying the Teramac hardware. The Teramac system is shown in figure 2.

FIGURE 1. System Block Diagram

Figure 2. Teramac Hardware

Teramac's programmable hardware implementation is uniform: sixteen identical PC boards are interconnected with cables; each board carries four identical multichip modules (MCMs); each MCM carries 27 identical FPGAs. Thus, a fully configured Teramac contains 1,728 FPGAs.

2.1 PLASMA: custom FPGA

We investigated using standard FPGAs for Teramac, but we ultimately designed our own: the PLASMA¹ chip. PLASMA is a routing-rich FPGA that consists of 6-input, 2-output lookup tables (with configurable latches and registers on their outputs), interconnected by partially populated crossbar switches. We chose a custom approach for several reasons:

¹ PLASMA: Programmable Logic And Switch MAtrix

Figure 3. PLASMA FPGA

Compilation Time: Placement and routing time for standard FPGAs is still much longer than is acceptable for a custom computing machine (CCM). Although the place and route times for a single chip [...]

[...] efficiently implemented by capitalizing on the structure of the lookup tables in our logic cells: the lookup table decoders could be reconfigured to implement read and write ports; with the addition of some register bits, we are able to configure some of the logic cells to behave as a multiported register file slice. The decode logic for a register file would have been expensive to implement in standard FPGAs.
Proprietary Configuration Formats: The vendors we considered for supplying the FPGAs had proprietary formats for internal data they were unwilling to disclose to a research project with limited volume. Thus we would have been required to use their software tools to develop a design; our users would have needed access to the same tools.

2.2 MCM Design

The Teramac MCM, a very large (6.13 by 7.4 inches) MCM-C, has 27 chips, each with 408 pads [1]. The problem we faced was one of wiring complexity in a system which would contain hundreds of 408-pad ICs. The total number of wires was such that we had to very carefully balance the costs of the MCM level of connection, the PCB level, and board-to-board connections. By using MCMs for the vast majority of the wires we were able to relieve the pressure on the PCBs and the board-to-board connections.

The advantage to a very large MCM comes from Rent's Rule [2],

$$\text{I/O wires} = \text{constant} \times \text{average pinout} \times \text{chips}^{0.5}$$

By putting more chips on the MCM, a large fraction of the total wires are removed from the PCBs, which intrinsically have less capacity, being limited to under 20 layers. By using an MCM-C we reduced the number of pins to be connected on the PCB from 11,016 to 3,264, a factor of 3.375 improvement compared to single chip modules (SCMs). Had we attempted to use SCMs, the area required just to mount the 27 chips on the board would have been over 5 times the area of the 27 chip MCM, assuming the board could have been routed. The 27 chip MCM is shown in figure 4.

Figure 4. MCM with 27 PLASMA chips

Even though much effort went into minimizing the number of layers and vias (a major cost component), 39 layers were required (12 layers for power planes and signal spreading, 27 layers for signal routing) with 260,000 blind vias buried in the MCM. The wire length in the vias alone is over 2000 inches.

2.3 Logic Boards

Each board contains four MCMs which implement a simple network of 108 identical chips.
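
The MCM and board figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are assumptions introduced here, not identifiers from the paper, and the code only reproduces numbers the text already states.

```python
# Figures quoted in the text. All names are hypothetical, chosen for this sketch.
CHIPS_PER_MCM = 27
PADS_PER_CHIP = 408
MCM_EXTERNAL_PINS = 3264   # pins left to connect at the PCB level when using the MCM
MCMS_PER_BOARD = 4


def scm_pin_count(chips: int, pads_per_chip: int) -> int:
    """Pins the PCB would have to connect if every chip sat in its own single chip module."""
    return chips * pads_per_chip


scm_pins = scm_pin_count(CHIPS_PER_MCM, PADS_PER_CHIP)
reduction = scm_pins / MCM_EXTERNAL_PINS          # improvement factor from using the MCM
chips_per_board = CHIPS_PER_MCM * MCMS_PER_BOARD  # chips in one board's network

print(scm_pins)         # 11016
print(reduction)        # 3.375
print(chips_per_board)  # 108
```

The three printed values match the 11,016 pins, the factor of 3.375, and the 108 chips per board given in the text.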
Up to 16 boards may be interconnected via cables. Each board has 4032 signal I/O pins that connect through cables to other boards.

2.4 Controller Boards

Each logic board connects to a daughter controller board containing four banks of 32-bit wide, 2M deep, 2-port static RAM, and control circuitry for interfacing the board to the host computer. The RAM banks are connected into the PLASMA network to support user designs containing embedded memory. The controller circuitry: (1) relays configuration data from the host, (2) transfers state data between the memories, PLASMAs, and the host, and (3) controls clocking and breakpointing of
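
The system-level totals quoted in the overview (1,728 FPGAs, 64 memory banks, 0.5 gigabytes of RAM, roughly one million gates) follow from the per-board figures. The sketch below is a consistency check, not part of the Teramac software. All names are hypothetical, and it assumes that "2M deep" means 2 x 2^20 words and that "64K gates" means 64 x 2^10 gates.

```python
# Per-board figures quoted in the text. All names are hypothetical.
MAX_BOARDS = 16
MCMS_PER_BOARD = 4
FPGAS_PER_MCM = 27
BANKS_PER_BOARD = 4             # RAM banks on each controller board
BANK_WIDTH_BITS = 32
BANK_DEPTH_WORDS = 2 * 2**20    # assumption: "2M deep" = 2 * 2^20 words
GATES_PER_BOARD = 64 * 2**10    # assumption: "64K gates" = 64 * 2^10 gates


def system_totals(boards: int) -> dict:
    """Scale the per-board resources to a system with the given number of boards."""
    banks = boards * BANKS_PER_BOARD
    return {
        "fpgas": boards * MCMS_PER_BOARD * FPGAS_PER_MCM,
        "banks": banks,
        "ram_bytes": banks * BANK_DEPTH_WORDS * BANK_WIDTH_BITS // 8,
        "gates": boards * GATES_PER_BOARD,
    }


full = system_totals(MAX_BOARDS)
print(full["fpgas"])              # 1728
print(full["banks"])              # 64
print(full["ram_bytes"] / 2**30)  # 0.5
print(full["gates"])              # 1048576
```

Calling `system_totals(8)` gives the corresponding totals for the 8 board system mentioned in the overview, under the same assumptions.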

