CMU CS 15740 - A High-Performance Microarchitecture with Hardware-Programmable Functional Units

Published in Proc. of MICRO-27, November 1994

A High-Performance Microarchitecture with Hardware-Programmable Functional Units

Rahul Razdan*+ and Michael D. Smith*
*Harvard University, Cambridge, MA 02138
+Digital Equipment Corporation, Hudson, MA 01742

Abstract

This paper explores a novel way to incorporate hardware-programmable resources into a processor microarchitecture to improve the performance of general-purpose applications. Through a coupling of compile-time analysis routines and hardware synthesis tools, we automatically configure a given set of the hardware-programmable functional units (PFUs) and thus augment the base instruction set architecture so that it better meets the instruction set needs of each application. We refer to this new class of general-purpose computers as PRogrammable Instruction Set Computers (PRISC). Although similar in concept, the PRISC approach differs from dynamically programmable microcode because in PRISC we define entirely-new primitive datapath operations. In this paper, we concentrate on the microarchitectural design of the simplest form of PRISC: a RISC microprocessor with a single PFU that only evaluates combinational functions. We briefly discuss the operating system and the programming language compilation techniques that are needed to successfully build PRISC, and we present performance results from a proof-of-concept study. With the inclusion of a single 32-bit-wide PFU whose hardware cost is less than that of a 1 kilobyte SRAM, our study shows a 22% improvement in processor performance on the SPECint92 benchmarks.

Keywords: programmable logic, general-purpose microarchitectures, automatic instruction set design, compile-time optimization, logic synthesis

1 Introduction

A number of studies have shown that the use of hardware-programmable logic, such as FPGAs, can improve application performance by tailoring hardware paths to match the particular characteristics of the individual application [4,5,6,17].
Overall, the architectures in these studies only work well for special-purpose domains such as logic simulation and large-number multiplication. To effectively use hardware-programmable resources in a general-purpose environment, we must develop a new approach that is cost-effective, automatic, and applicable to the vast majority of applications.

Our architectural approach to achieve these goals is called PRogrammable Instruction Set Computers (PRISC). To be cost-effective, we implement PRISC on top of an existing high-performance processor microarchitecture. For this paper, we use a RISC architecture as our base, though our PRISC techniques are equally applicable to a CISC architecture. PRISC augments the conventional set of RISC instructions with application-specific instructions that are implemented in hardware-programmable functional units (PFUs). These PFUs are carefully added to the microarchitecture so we maintain the benefits of high-performance RISC techniques (e.g., fixed instruction formats) and we minimally impact the processor's cycle time.

To generate these application-specific PFU instructions in an automated fashion, we have developed compilation routines that analyze the hardware complexity of individual instructions. Using this information, the compiler interacts with sophisticated logic synthesis programs to select sequences of instructions that will execute faster if implemented in PFU hardware. Since the PFU instruction generation process is driven by the specific computations found in each application, our PRISC approach avoids the semantic gap problems of CISC architectures [14]. Furthermore, the complexity of our approach is completely hidden from the user/programmer.

The most general computational model for a PFU is a multi-cycle sequential state machine. Iterative hardware solutions for square-root or transcendental function evaluation are good examples of this class of PFU.
The general model, however, introduces synchronization complexities between the PFU and the other RISC functional units. For this paper, we discuss a simpler model that implements a combinational function of two inputs and one output. The synthesis routines constrain the complexity of this combinational function so that its delay is equal to the delay of the ALU already in the processor datapath. With these two restrictions, a PFU can use the same synchronization mechanisms as the other RISC functional units. We refer to this first implementation of the PRISC architecture as PRISC-1.

PRISC-1 was originally meant as a proof-of-concept vehicle that would allow us to develop the basic PRISC compilation and synthesis environment. To our surprise, the PRISC-1 microarchitecture exhibited noticeable performance benefits not only on the computer-aided design (CAD) applications in the SPECint92 benchmark suite, but on the other applications as well. Even though a PFU is significantly slower than a highly-customized RISC functional unit, we can automatically find opportunities to use a PFU where a typical custom functional unit is not adequate. Our PRISC environment makes the MIPS 1% rule work on a per-application basis [18].

The next section summarizes some work related to the use of programmable logic in processor design and the automatic generation of instruction sets. Section 3 describes the microarchitecture of PRISC-1, while Section 4 overviews our PRISC compilation environment and hardware extraction techniques. Section 5 discusses our performance modeling environment and the results obtained from our proof-of-concept experiment. Finally, Section 6 presents conclusions and describes our future work.

2 Related Work

High-level synthesis [10] and automated instruction set generation [13,15,16] are active areas of research in the CAD community, and although the recent work in these areas is relevant to our work, each group is trying to solve slightly different problems. Unlike the work in high-level synthesis, which typically attempts to build an application-specific processor automatically, our work adds programmable logic to a general-purpose processor, and it relies on the compiler and run-time system to dynamically reconfigure the programmable logic for each application. Unlike the work in automated instruction set design, which systematically analyzes a set of benchmark programs to define an entirely-new instruction set for a given microarchitecture, our work simply extends an existing instruction set, and it explores microarchitectures that can
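
As a rough illustration of the PRISC-1 PFU model described in the Introduction (a combinational function of two inputs and one output, with delay constrained to match the ALU), the Python sketch below compares a chain of base RISC instructions against a single PFU operation that computes the same function. Everything specific here is an assumption made for illustration and is not taken from the paper: the five-instruction chain, the names `base_sequence`, `pfu_op`, and `cycles_saved`, the one-cycle-per-instruction cost model, and the premise that this particular function would fit within one ALU delay after synthesis.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit values


def base_sequence(a: int, b: int) -> Tuple[int, int]:
    """Run a hypothetical five-instruction RISC chain; return (result, cycles)."""
    t0 = (a << 2) & MASK32   # sll  t0, a, 2
    t1 = t0 ^ b              # xor  t1, t0, b
    t2 = b >> 16             # srl  t2, b, 16
    t3 = t1 & 0xFFFF         # andi t3, t1, 0xFFFF
    t4 = t3 | t2             # or   t4, t3, t2
    return t4, 5             # assumption: one cycle per instruction, no overlap


def pfu_op(a: int, b: int) -> Tuple[int, int]:
    """Evaluate the same two-input, one-output function as one PFU operation."""
    result = ((((a << 2) & MASK32) ^ b) & 0xFFFF) | (b >> 16)
    return result, 1         # assumption: the logic fits within one ALU delay


def cycles_saved(a: int, b: int) -> int:
    """Check that both paths agree, then report the cycle difference."""
    base_result, base_cycles = base_sequence(a, b)
    pfu_result, pfu_cycles = pfu_op(a, b)
    assert base_result == pfu_result, "PFU must match the sequence it replaces"
    return base_cycles - pfu_cycles
```

The point of the sketch is only the shape of the transformation: the compiler would look for instruction chains whose combined logic is shallow enough to evaluate in one PFU operation, so the benefit depends on how often such chains occur in a given application.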

