Princeton ELE 572 - Multiple-Banked Register File Architectures - D346397

School name Princeton University

Course ELE 572 - Processor Architectures for New Paradigms

Pages 10

Unformatted text preview:

Multiple-Banked Register File Architectures

José-Lorenzo Cruz, Antonio González and Mateo Valero
Departament d’Arquitectura de Computadors
Universitat Politècnica de Catalunya
Jordi Girona, 1-3 Mòdul D6
08034 Barcelona, Spain
{cruz,antonio,mateo}@ac.upc.es

Nigel P. Topham
Siroyan Ltd
Wyvols Court
Swallowfield
Berkshire RG7 1WY, U.K.
[email protected]

Document outline:
Abstract
1. Introduction
2. Impact of the Register File Architecture
3. A Multiple-Banked Register File
4. Performance Evaluation
4.1. Experimental Framework
4.2. Performance results
5. Related work
6. Conclusions
7. Acknowledgments
8. References

Figures and tables:
Figure 1: IPC for a varying number of physical registers. The harmonic mean for SpecInt95 and Spe...
Figure 2: IPC for a 1-cycle register file, a 2-cycle register file and a 2-cycle register file wi...
Figure 3: Cumulative distribution of number of registers.
Figure 4: Multiple-banked register file architectures.
Figure 5: IPC for different register file cache architectures.
Figure 6: Register file cache versus a single bank with a single level of bypass.
Figure 7: Register file cache versus a single bank with full bypass.
Figure 8: Performance for a varying area cost.
Figure 9: Performance of different register file architectures when the access time is factored i...
Table 1: Processor microarchitectural parameters
Table 2: Number of read (R) write (W) ports of each configuration. For the register file cache, n...

8. References
[1] B.K. Bray and M.J. Flynn, “A Two-Level Windowed Register File”, Technical Report CSL-TR-91-49...
[2] K.I. Farkas, N.P. Jouppi and P. Chow, “Register File Considerations in Dynamically Scheduled ...
[3] R.E. Kessler, “The Alpha 21264 Microprocessor”, IEEE Micro, 19(2):24-36, March 1999.
[4] J. Llosa and K. Arazabal, “Area and Access Time Models for Multi-Port Register Files and Queu...
[5] J. Llosa, M. Valero and E. Ayguade, “Non-Consistent Dual Register Files to Reduce Register P...
[6] J. Llosa, M. Valero, J.A.B. Fortes and E. Ayguade, “Using Sacks to Organize Registers in VLIW...
[7] D. Matzke, “Will Physical Scalability Sabotage Performance Gains?”, IEEE Computer, 30(9):37-3...
[8] A.S. Palacharla, N.P. Jouppi and J.E. Smith, “Complexity-Effective Superscalar Processors”, i...
[9] S. Rixner et al., “Register Organization for Media Processing”, in Proc. of Int. Symp. on Hig...
[10] R. M. Russell, “The Cray-1 Computer System”, in Reading in Computer Architecture, Morgan Kau...
[11] J.A. Swensen and Y.N. Patt, “Hierarchical Registers for Scientific Computers”, in Proc. of I...
[12] D.M. Tullsen et al., “Exploiting Choice: Instruction Fetch and Issue on an Implementable Sim...
[13] D.M. Tullsen, S.J. Eggers and H.M. Levy, “Simultaneous Multithreading: Maximizing On-Chip P...
[14] D.W. Wall, “Limits of Instruction-Level Parallelism”, Technical Report WRL 93/6 Digital Weste...
[15] S. Wallace and N. Bagherzadeh, “A Scalable Register File Architecture for Dynamically Schedu...
[16] S.J.E. Wilton and N.P. Jouppi, “An Enhanced Cache Access and Cycle Time Model”, IEEE Journal...
[17] R. Yung and N.C. Wilhelm, “Caching Processor General Registers”, in Proc. Int. Conf. on Circ...

Abstract

The register file access time is one of the critical delays in current superscalar processors. Its impact on processor performance is likely to increase in future processor generations, as they are expected to increase the issue width (which implies more register ports) and the size of the instruction window (which implies more registers), and to use some kind of multithreading. Under this scenario, the register file access time could be a dominant delay and a pipelined implementation would be desirable to allow for high clock rates.

However, a multi-stage register file has severe implications for processor performance (e.g. higher branch misprediction penalty) and complexity (more levels of bypass logic). To tackle these two problems, in this paper we propose a register file architecture composed of multiple banks.
In particular we focus on a multi-level organization of the register file, which provides low latency and simple bypass logic. We propose several caching policies and prefetching strategies and demonstrate the potential of this multiple-banked organization. For instance, we show that a two-level organization degrades IPC by 10% and 2% with respect to a non-pipelined single-banked register file, for SpecInt95 and SpecFP95 respectively, but it increases performance by 87% and 92% when the register file access time is factored in.

Keywords: Register file architecture, dynamically-scheduled processor, bypass logic, register file cache.

1. Introduction

Most current dynamically scheduled microprocessors have a RISC-like instruction set architecture, and therefore the majority of instruction operands reside in the register file. The access time of the register file basically depends on both the number of registers and the number of ports [8]. To achieve high performance, microprocessor designers strive to increase the issue width. However, wider issue machines require more ports in the register file, which may significantly increase its access time [2]. Moreover, a wide issue machine is only effective if it is accompanied by a large instruction window [14] or some type of multithreading [13]. Large instruction windows and multithreading imply a large number of instructions in-flight, which directly determines the number of required registers [2]. However, increasing the number of registers also increases the register file access time. On the other hand, technology evolution produces successive reductions in minimum feature sizes, which results in higher circuit densities but also exacerbates the impact of wire delays [7].
Since a significant part of the register file access time is due to wire delays, future processor generations are expected to be even more affected by the access time problem.

Current trends in microprocessor design and technology lead to projections that the access time of a monolithic register file will be significantly higher than that of other common operations, such as integer additions. Under this scenario, a pipelined register file is critical to high performance; otherwise, the processor cycle time would be determined by the register file access time. However, pipelining a register file is not trivial. Moreover, a multi-cycle pipelined register file still causes a performance degradation in comparison with a single-cycle register file, since a multi-cycle register file increases the branch
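
The trade-off quoted in the abstract (lower IPC, higher overall performance) can be read with a little arithmetic. The sketch below is an illustration only: it assumes that "performance" means instructions per second, that is IPC multiplied by clock frequency, which the preview text does not state explicitly. Under that assumption, the quoted IPC and performance changes imply how much faster the clock of the two-level organization would have to be. The function name implied_clock_ratio is hypothetical and does not come from the paper.

```python
def implied_clock_ratio(ipc_change: float, perf_change: float) -> float:
    """Return f_new / f_old implied by relative IPC and performance changes.

    Assumes performance = IPC * clock frequency (an assumption of this
    sketch, not a definition taken from the paper).
    """
    ipc_ratio = 1.0 + ipc_change      # e.g. -0.10 means IPC drops to 0.90x
    perf_ratio = 1.0 + perf_change    # e.g. +0.87 means performance is 1.87x
    return perf_ratio / ipc_ratio


# Figures quoted in the abstract for the two-level organization.
specint95 = implied_clock_ratio(ipc_change=-0.10, perf_change=0.87)
specfp95 = implied_clock_ratio(ipc_change=-0.02, perf_change=0.92)

print(f"SpecInt95 implied clock ratio: {specint95:.2f}x")  # 2.08x
print(f"SpecFP95 implied clock ratio: {specfp95:.2f}x")    # 1.96x
```

Read this way, both benchmark suites point to a clock roughly twice as fast as that of the non-pipelined single-banked baseline, which is how a small IPC loss can coexist with a large net gain.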
