
Published in Proc. of MICRO-27, November 1994

A High-Performance Microarchitecture with Hardware-Programmable Functional Units

Rahul Razdan and Michael D. Smith
Harvard University, Cambridge, MA 02138
Digital Equipment Corporation, Hudson, MA 01742

Abstract

This paper explores a novel way to incorporate hardware-programmable resources into a processor microarchitecture to improve the performance of general-purpose applications. Through a coupling of compile-time analysis routines and hardware synthesis tools, we automatically configure a given set of the hardware-programmable functional units (PFUs) and thus augment the base instruction set architecture so that it better meets the instruction set needs of each application. We refer to this new class of general-purpose computers as PRogrammable Instruction Set Computers (PRISC). Although similar in concept, the PRISC approach differs from dynamically programmable microcode because in PRISC we define entirely new primitive datapath operations. In this paper, we concentrate on the microarchitectural design of the simplest form of PRISC: a RISC microprocessor with a single PFU that only evaluates combinational functions. We briefly discuss the operating system and the programming language compilation techniques that are needed to successfully build PRISC, and we present performance results from a proof-of-concept study. With the inclusion of a single 32-bit-wide PFU whose hardware cost is less than that of a 1-kilobyte SRAM, our study shows a 22% improvement in processor performance on the SPECint92 benchmarks.

Keywords: programmable logic, general-purpose microarchitectures, automatic instruction set design, compile-time optimization, logic synthesis

1 Introduction

A number of studies have shown that the use of hardware-programmable logic, such as FPGAs, can improve application performance by tailoring hardware paths to match the particular characteristics of the individual application [4, 5, 6, 17]. Overall, the architectures in these studies only work well for special-purpose domains such as logic simulation and large-number multiplication. To effectively use hardware-programmable resources in a general-purpose environment, we must develop a new approach that is cost-effective, automatic, and applicable to the vast majority of applications.

Our architectural approach to achieve these goals is called PRogrammable Instruction Set Computers (PRISC). To be cost-effective, we implement PRISC on top of an existing, high-performance processor microarchitecture. For this paper, we use a RISC architecture as our base, though our PRISC techniques are equally applicable to a CISC architecture. PRISC augments the conventional set of RISC instructions with application-specific instructions that are implemented in hardware-programmable functional units (PFUs). These PFUs are carefully added to the microarchitecture so we maintain the benefits of high-performance RISC techniques (e.g., fixed instruction formats) and we minimally impact the processor's cycle time.

To generate these application-specific PFU instructions in an automated fashion, we have developed compilation routines that analyze the hardware complexity of individual instructions. Using this information, the compiler interacts with sophisticated logic synthesis programs to select sequences of instructions that will execute faster if implemented in PFU hardware. Since the PFU instruction generation process is driven by the specific computations found in each application, our PRISC approach avoids the semantic-gap problems of CISC architectures [14]. Furthermore, the complexity of our approach is completely hidden from the user/programmer.

The most general computational model for a PFU is a multi-cycle sequential state machine. Iterative hardware solutions for square-root or transcendental function evaluation are good examples of this class of PFU. The general model, however, introduces synchronization complexities between the PFU and the other RISC functional units. For this paper, we discuss a simpler model that implements a combinational function of two inputs and one output. The synthesis routines constrain the complexity of this combinational function so that its delay is equal to the delay of the ALU already in the processor datapath. With these two restrictions, a PFU can use the same synchronization mechanisms as the other RISC functional units. We refer to this first implementation of the PRISC architecture as PRISC-1.

PRISC-1 was originally meant as a proof-of-concept vehicle that would allow us to develop the basic PRISC compilation and synthesis environment. To our surprise, the PRISC-1 microarchitecture exhibited noticeable performance benefits not only on the computer-aided design (CAD) applications in the SPECint92 benchmark suite, but on the other applications as well. Even though a PFU is significantly slower than a highly customized RISC functional unit, we can automatically find opportunities to use a PFU where a typical custom functional unit is not adequate. Our PRISC environment makes the "MIPS 1% rule" work on a per-application basis [18].

The next section summarizes some work related to the use of programmable logic in processor design and the automatic generation of instruction sets. Section 3 describes the microarchitecture of PRISC-1, while Section 4 overviews our PRISC compilation environment and hardware extraction techniques. Section 5 discusses our performance modeling environment and the results obtained from our proof-of-concept experiment. Finally, Section 6 presents conclusions and describes our future work.

2 Related Work

... when run on their PRISM-1 prototype. Overall, there are a number of shortcomings in their initial work that our work attempts to overcome. In particular, their prototype compiler requires some user interaction while our prototype compiler is fully automated; they report performance results only for hardware-optimized routines while we report results for entire applications; and they add programmable logic to a relatively slow microprocessor (10 MHz M68010) while we experiment with fast cycle times (200 MHz).

High-level synthesis [10] and automated instruction set generation [13, 15, 16] are active areas of research in the CAD community, and although the recent work in these areas is relevant to our work, each group is trying to solve slightly different problems. Unlike the work in high-level synthesis, which typically attempts to build an application-specific processor automatically, our work adds
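
As an illustration of the PRISC-1 model described in the Introduction (a PFU that evaluates a combinational function of two inputs and one output), the following is a minimal sketch, not the authors' implementation. The three-instruction sequence, the `PFU` class, and the one-cycle-per-instruction cost are all assumptions made for illustration; only the 32-bit width, the two-input/one-output restriction, and the ALU-equivalent delay come from the text.

```python
MASK32 = 0xFFFFFFFF  # the paper's PFU is 32 bits wide


def risc_sequence(a, b):
    """Hypothetical three-instruction RISC sequence on two source registers.

    Every step is pure logic with no memory access or state, so the whole
    chain is a combinational function of (a, b).
    """
    t1 = (a << 2) & MASK32   # shift left by 2
    t2 = t1 ^ b              # xor with the second operand
    return t2 & 0x0000FFFF   # keep the low 16 bits


class PFU:
    """Toy model of a programmable functional unit: two inputs, one output, no state."""

    def __init__(self):
        self.function = None

    def program(self, function):
        # Stand-in for loading a synthesized configuration into the PFU.
        self.function = function

    def execute(self, a, b):
        if self.function is None:
            raise RuntimeError("PFU has not been programmed")
        return self.function(a & MASK32, b & MASK32) & MASK32


pfu = PFU()
pfu.program(risc_sequence)

a, b = 0x12345678, 0x0F0F0F0F
result = pfu.execute(a, b)

# Assumed cost model: one cycle per RISC instruction, and one cycle for the
# PFU instruction because its delay is constrained to match the ALU delay.
cycles_risc = 3
cycles_pfu = 1
cycles_saved = cycles_risc - cycles_pfu
```

Under these assumptions the PFU instruction produces the same value as the original sequence while occupying a single issue slot; in the paper, the actual selection of such sequences is done by the compiler together with logic synthesis tools.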

