Gated-Vdd: A Circuit Technique to Reduce Leakage in Deep-Submicron Cache Memories

Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar
School of Electrical and Computer Engineering, Purdue University
1285 EE Building, West Lafayette, IN 47907
icalp@ecn.purdue.edu, http://www.ece.purdue.edu/~icalp

Abstract

Deep-submicron CMOS designs have resulted in large leakage energy dissipation in microprocessors. While SRAM cells in on-chip cache memories always contribute to this leakage, there is a large variability in active cell usage both within and across applications. This paper explores an integrated architectural and circuit-level approach to reducing leakage energy dissipation in instruction caches. We propose gated-Vdd, a circuit-level technique to gate the supply voltage and reduce leakage in unused SRAM cells. Our results indicate that gated-Vdd, together with a novel resizable cache architecture, reduces energy-delay by 62% with minimal impact on performance.

1 INTRODUCTION

The ever-increasing levels of on-chip integration in the recent decade have enabled phenomenal increases in computer system performance. Unfortunately, the performance improvement has also been accompanied by an increase in a chip's power and energy dissipation. Higher power and energy dissipation require more expensive packaging and cooling technology, increase cost, decrease product reliability in all segments of the computing market, and significantly reduce battery life in portable systems.

Historically, chip designers have relied on scaling down the transistor supply voltage in subsequent generations to reduce the dynamic energy dissipation due to a much larger number of on-chip transistors. Maintaining high transistor switching speeds, however, requires a commensurate down-scaling of the transistor threshold voltage, giving rise to a significant amount of leakage energy dissipation even when the transistor is not switching. Borkar [3] estimates a 7.5-fold increase in leakage current and a five-fold increase in total leakage energy dissipation in every chip generation.

State-of-the-art microprocessor designs devote a large fraction of the chip area to memory structures, e.g., multiple levels of instruction caches (i-caches) and data caches (d-caches), TLBs, and prediction tables. For instance, 30% of the Alpha 21264's chip area and 60% of the StrongARM's are devoted to cache and memory structures [8]. Unlike dynamic energy, which depends on the number of actively switching transistors, leakage energy is a function of the number of on-chip transistors, independent of their switching activity. As such, caches account for a large (if not dominant) component of leakage energy dissipation in recent designs and will continue to do so in the future. Unfortunately, current proposals for energy-efficient cache architectures [7, 2, 5, 1] only target reducing dynamic energy and do not impact leakage energy.

There are numerous circuit techniques to reduce leakage energy dissipation in transistors and circuits, e.g., multi-threshold or multi-supply-voltage design, dynamic-threshold or dynamic-supply-voltage design, transistor stacking, and cooling. These techniques, however, either impact circuit performance, and are therefore only applicable to circuit sections that are not performance-critical, or may require a sophisticated fabrication process and increase cost.

Modern cache hierarchies are designed to satisfy the demands of the most memory-intensive application phases. The actual cache utilization, however, varies widely both within and across applications. We have recently proposed the Dynamically ResIzable instruction cache (DRI i-cache) [11], a novel cache architecture that exploits this variability in utilization. Our cache design presents the first fully integrated architectural and circuit-level approach to reducing energy dissipation in deep-submicron cache memories. A DRI i-cache identifies an application's i-cache requirements dynamically and uses a circuit-level mechanism, gated-Vdd, to gate the supply voltage to the SRAM cells of the cache's unused sections and reduce leakage.

While voltage gating effectively eliminates the leakage in SRAM cells, it may adversely impact cell performance and prohibitively increase cell area. This paper evaluates in detail the design space for gated-Vdd with respect to performance, energy, and area tradeoffs. Our results indicate that: (i) a PMOS gated-Vdd transistor has a negligible impact on cell performance and area but reduces leakage by only an order of magnitude; (ii) an NMOS dual-Vt gated-Vdd transistor virtually eliminates leakage with minimal impact on cell area but increases cell read time by 35%; (iii) a wide NMOS dual-Vt gated-Vdd transistor with a charge pump offers the best configuration, virtually eliminating leakage with minimal impact on cell speed and area; and (iv) using gated-Vdd, a DRI i-cache reduces the overall energy-delay in applications by 62%.

The rest of the paper is organized as follows. In Section 2, we present an overview of a DRI i-cache. In Section 3, we describe the circuit-level gated-Vdd mechanism to reduce leakage in SRAM cells. In Section 4, we present experimental results. Finally, we conclude the paper in Section 5.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISLPED '00, Rapallo, Italy.
Copyright 2000 ACM 1-58113-190-9/00/0007...$5.00.

2 DRI I-CACHE OVERVIEW

[FIGURE 1: A DRI i-cache's anatomy. Recoverable labels: address fields (tag, index, offset); a size mask over the resizing range, yielding the masked index; minimum size = size-bound; a miss counter; at the end of each interval the miss count is compared with the miss-bound, triggering an upsize (mask shift left) or a downsize (mask shift right).]

[...] Thus, the miss-bound provides fine-grain resizing control between any two intervals, independent of the cache size. Applications typically require a specific minimum cache capacity below which they incur a large number of capacity misses and thrash. The size-bound provides coarse-grain resizing control by preventing the cache from thrashing due to too small a size.

The other two parameters, the sense-interval length and divisibility, are less critical to a DRI i-cache's performance. Intuitively, the sense-interval [...]
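The resizing control described above can be sketched in a few lines of Python. This is a hypothetical software model, not the authors' hardware: the class, field, and method names are invented for illustration, the cache size is tracked as the number of active index bits, and we assume a divisibility of 2, so each resize doubles or halves the number of active sets. Reading the figure's labels, we assume the cache upsizes (the size mask shifts left by one bit) when the miss count exceeds the miss-bound, downsizes (the mask shifts right) when the miss count falls below it, and never shrinks below the size-bound. The tag computation is a further assumption, suggested by the figure's "resizing range tag" label.

```python
from dataclasses import dataclass


@dataclass
class DriICacheSizer:
    """Hypothetical model of DRI i-cache resizing control (names invented).

    Assumptions: the cache size is a power of two tracked as the number of
    active index bits, and divisibility is 2, so each resize doubles or
    halves the number of active sets.
    """

    offset_bits: int       # log2(block size in bytes)
    max_index_bits: int    # log2(number of sets at full cache size)
    min_index_bits: int    # log2(number of sets at the size-bound)
    miss_bound: int        # misses tolerated per sense interval
    index_bits: int = -1   # active index bits; -1 means "start at full size"
    miss_count: int = 0    # misses seen in the current sense interval

    def __post_init__(self) -> None:
        if not 0 <= self.min_index_bits <= self.max_index_bits:
            raise ValueError("size-bound must not exceed the full cache size")
        if self.index_bits < 0:
            self.index_bits = self.max_index_bits
        if not self.min_index_bits <= self.index_bits <= self.max_index_bits:
            raise ValueError("initial size must lie within the resizing range")

    @property
    def size_mask(self) -> int:
        # Low-order ones select the index bits currently in use.
        return (1 << self.index_bits) - 1

    @property
    def active_sets(self) -> int:
        return 1 << self.index_bits

    def masked_index(self, address: int) -> int:
        # Drop the block offset, then keep only the active index bits.
        return (address >> self.offset_bits) & self.size_mask

    def tag(self, address: int) -> int:
        # Assumption: the stored tag keeps every bit above the size-bound
        # index, so resizing-range bits remain in the tag at any size.
        return address >> (self.offset_bits + self.min_index_bits)

    def record_access(self, hit: bool) -> None:
        if not hit:
            self.miss_count += 1

    def end_of_interval(self) -> str:
        """Compare the miss count with the miss-bound, resize, reset counter."""
        if self.miss_count > self.miss_bound and self.index_bits < self.max_index_bits:
            self.index_bits += 1  # upsize: mask becomes (mask << 1) | 1
            decision = "upsize"
        elif self.miss_count < self.miss_bound and self.index_bits > self.min_index_bits:
            self.index_bits -= 1  # downsize: mask becomes mask >> 1
            decision = "downsize"
        else:
            decision = "hold"  # at a bound, or miss count equals the miss-bound
        self.miss_count = 0
        return decision


sizer = DriICacheSizer(offset_bits=5, max_index_bits=10, min_index_bits=6, miss_bound=100)
for _ in range(10):
    sizer.record_access(hit=False)
print(sizer.end_of_interval(), sizer.active_sets)  # downsize 512
```

Resetting the miss counter at every interval boundary makes the miss-bound a per-interval threshold, which matches its description as fine-grain control between any two intervals, while the size-bound check stops downsizing at the minimum capacity an application needs.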