Princeton ELE 572 - Multiple-Banked Register File Architectures


Multiple-Banked Register File Architectures

José-Lorenzo Cruz, Antonio González and Mateo Valero
Departament d’Arquitectura de Computadors
Universitat Politècnica de Catalunya
Jordi Girona, 1-3 Mòdul D6
08034 Barcelona, Spain
{cruz,antonio,mateo}@ac.upc.es

Nigel P. Topham
Siroyan Ltd
Wyvols Court
Swallowfield
Berkshire RG7 1WY, U.K.
[email protected]

Outline
Abstract
1. Introduction
2. Impact of the Register File Architecture
3. A Multiple-Banked Register File
4. Performance Evaluation
4.1. Experimental Framework
4.2. Performance results
5. Related work
6. Conclusions
7. Acknowledgments
8. References

Figures and tables
Figure 1: IPC for a varying number of physical registers. The harmonic mean for SpecInt95 and Spe...
Figure 2: IPC for a 1-cycle register file, a 2-cycle register file and a 2-cycle register file wi...
Figure 3: Cumulative distribution of number of registers.
Figure 4: Multiple-banked register file architectures.
Figure 5: IPC for different register file cache architectures.
Figure 6: Register file cache versus a single bank with a single level of bypass.
Figure 7: Register file cache versus a single bank with full bypass.
Figure 8: Performance for a varying area cost.
Figure 9: Performance of different register file architectures when the access time is factored i...
Table 1: Processor microarchitectural parameters
Table 2: Number of read (R) write (W) ports of each configuration. For the register file cache, n...

References
[1] B.K. Bray and M.J. Flynn, “A Two-Level Windowed Register File”, Technical Report CSL-TR-91-49...
[2] K.I. Farkas, N.P. Jouppi and P. Chow, “Register File Considerations in Dynamically Scheduled ...
[3] R.E. Kessler, “The Alpha 21264 Microprocessor”, IEEE Micro, 19(2):24-36, March 1999.
[4] J. Llosa and K. Arazabal, “Area and Access Time Models for Multi-Port Register Files and Queu...
[5] J. Llosa, M. Valero and E. Ayguade, “Non-Consistent Dual Register Files to Reduce Register P...
[6] J. Llosa, M. Valero, J.A.B. Fortes and E. Ayguade, “Using Sacks to Organize Registers in VLIW...
[7] D. Matzke, “Will Physical Scalability Sabotage Performance Gains?”, IEEE Computer, 30(9):37-3...
[8] A.S. Palacharla, N.P. Jouppi and J.E. Smith, “Complexity-Effective Superscalar Processors”, i...
[9] S. Rixner et al., “Register Organization for Media Processing”, in Proc. of Int. Symp. on Hig...
[10] R. M. Russell, “The Cray-1 Computer System”, in Reading in Computer Architecture, Morgan Kau...
[11] J.A. Swensen and Y.N. Patt, “Hierarchical Registers for Scientific Computers”, in Proc. of I...
[12] D.M. Tullsen et al., “Exploiting Choice: Instruction Fetch and Issue on an Implementable Sim...
[13] D.M. Tullsen, S.J. Eggers and H.M. Levy, “Simultaneous Multithreading: Maximizing On-Chip P...
[14] D.W. Wall, “Limits of Instruction-Level Parallelism” Technical Report WRL 93/6 Digital Weste...
[15] S. Wallace and N. Bagherzadeh, “A Scalable Register File Architecture for Dynamically Schedu...
[16] S.J.E. Wilton and N.P. Jouppi, “An Enhanced Cache Access and Cycle Time Model”, IEEE Journal...
[17] R. Yung and N.C. Wilhelm, “Caching Processor General Registers”, in Proc. Int. Conf. on Circ...

Abstract

The register file access time is one of the critical delays in current superscalar processors. Its impact on processor performance is likely to increase in future processor generations, as they are expected to increase the issue width (which implies more register ports) and the size of the instruction window (which implies more registers), and to use some kind of multithreading. Under this scenario, the register file access time could be a dominant delay and a pipelined implementation would be desirable to allow for high clock rates. However, a multi-stage register file has severe implications for processor performance (e.g. higher branch misprediction penalty) and complexity (more levels of bypass logic). To tackle these two problems, in this paper we propose a register file architecture composed of multiple banks.
In particular, we focus on a multi-level organization of the register file, which provides low latency and simple bypass logic. We propose several caching policies and prefetching strategies and demonstrate the potential of this multiple-banked organization. For instance, we show that a two-level organization degrades IPC by 10% and 2% with respect to a non-pipelined single-banked register file, for SpecInt95 and SpecFP95 respectively, but it increases performance by 87% and 92% when the register file access time is factored in.

Keywords: Register file architecture, dynamically-scheduled processor, bypass logic, register file cache.

1. Introduction

Most current dynamically scheduled microprocessors have a RISC-like instruction set architecture, and therefore, the majority of instruction operands reside in the register file. The access time of the register file basically depends on both the number of registers and the number of ports [8]. To achieve high performance, microprocessor designers strive to increase the issue width. However, wider issue machines require more ports in the register file, which may significantly increase its access time [2]. Moreover, a wide issue machine is only effective if it is accompanied by a large instruction window [14] or some type of multithreading [13]. Large instruction windows and multithreading imply a large number of instructions in-flight, which directly determines the number of required registers [2]. However, increasing the number of registers also increases the register file access time. On the other hand, technology evolution produces successive reductions in minimum feature sizes, which results in higher circuit densities but also exacerbates the impact of wire delays [7].
Since a significant part of the register file access time is due to wire delays, future processor generations are expected to be even more affected by the access time problem.

Current trends in microprocessor design and technology lead to projections that the access time of a monolithic register file will be significantly higher than that of other common operations, such as integer additions. Under this scenario, a pipelined register file is critical to high performance; otherwise, the processor cycle time would be determined by the register file access time. However, pipelining a register file is not trivial. Moreover, a multi-cycle pipelined register file still causes a performance degradation in comparison with a single-cycle register file, since a multi-cycle register file increases the branch
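
The two sets of figures quoted in the abstract (lower IPC, yet higher performance once the access time is factored in) can be related with a short calculation. The sketch below assumes, purely as an illustration and not as the paper's stated method, that performance is instruction throughput, i.e. IPC multiplied by clock frequency. Under that assumption, the quoted ratios imply how much faster the two-level organization would have to be clocked. The function name `implied_clock_speedup` and the dictionary layout are hypothetical.

```python
def implied_clock_speedup(ipc_ratio, performance_ratio):
    """Clock-frequency ratio implied by the ASSUMED model
    performance = IPC * clock frequency (not taken from the paper)."""
    return performance_ratio / ipc_ratio


# Ratios quoted in the abstract: two-level organization relative to a
# non-pipelined single-banked register file.
#   IPC degrades by 10% and 2%      -> ratios 0.90 and 0.98
#   performance rises by 87% and 92% -> ratios 1.87 and 1.92
abstract_figures = {
    "SpecInt95": {"ipc_ratio": 0.90, "performance_ratio": 1.87},
    "SpecFP95": {"ipc_ratio": 0.98, "performance_ratio": 1.92},
}

implied = {
    suite: implied_clock_speedup(**figures)
    for suite, figures in abstract_figures.items()
}

for suite, ratio in implied.items():
    print(f"{suite}: implied clock speedup of about {ratio:.2f}x")
```

Under this assumed model the quoted figures correspond to a clock roughly 2.08 times faster for SpecInt95 and 1.96 times faster for SpecFP95, which is consistent with the abstract's point that a small IPC loss can be outweighed by a shorter register file access time.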

