Gated-Vdd: A Circuit Technique to Reduce Leakage in Deep-Submicron Cache Memories

Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar
School of Electrical and Computer Engineering
Purdue University, 1285 EE Building
West Lafayette, IN
[email protected], http://www.ece.purdue.edu/~icalp

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISLPED'00, Rapallo, Italy.
Copyright 2000 ACM 1-58113-190-9/00/0007...$5.00.

Abstract

Deep-submicron CMOS designs have resulted in large leakage energy dissipation in microprocessors. While SRAM cells in on-chip cache memories always contribute to this leakage, there is a large variability in active cell usage both within and across applications. This paper explores an integrated architectural and circuit-level approach to reducing leakage energy dissipation in instruction caches. We propose gated-Vdd, a circuit-level technique to gate the supply voltage and reduce leakage in unused SRAM cells. Our results indicate that gated-Vdd together with a novel resizable cache architecture reduces energy-delay by 62% with minimal impact on performance.

1 INTRODUCTION

The ever-increasing levels of on-chip integration in the recent decade have enabled phenomenal increases in computer system performance. Unfortunately, the performance improvement has also been accompanied by an increase in a chip's power and energy dissipation. Higher power and energy dissipation require more expensive packaging and cooling technology, increase cost, decrease product reliability in all segments of the computing market, and significantly reduce battery life in portable systems.
Historically, chip designers have relied on scaling down the transistor supply voltage in subsequent generations to reduce the dynamic energy dissipation due to a much larger number of on-chip transistors. Maintaining high transistor switching speeds, however, requires a commensurate down-scaling of the transistor threshold voltage, giving rise to a significant amount of leakage energy dissipation even when the transistor is not switching. Borkar [3] estimates a factor of 7.5 increase in leakage current and a five-fold increase in total leakage energy dissipation in every chip generation.

State-of-the-art microprocessor designs devote a large fraction of the chip area to memory structures (e.g., multiple levels of instruction (i-cache) and data (d-cache) caches, TLBs, and prediction tables). For instance, 30% of the Alpha 21264 and 60% of the StrongARM are devoted to cache and memory structures [8]. Unlike dynamic energy, which depends on the number of actively switching transistors, leakage energy is a function of the number of on-chip transistors, independent of their switching activity. As such, caches account for a large (if not dominant) component of leakage energy dissipation in recent designs, and will continue to do so in the future. Unfortunately, current proposals for energy-efficient cache architectures [7,2,5,1] only target reducing dynamic energy and do not impact leakage energy.

There are a myriad of circuit techniques to reduce leakage energy dissipation in transistors/circuits (e.g., multi-threshold or multi-supply voltage design, dynamic threshold or dynamic supply voltage design, transistor stacking, and cooling). These techniques, however, either impact circuit performance and are only applicable to circuit sections that are not performance-critical, or may require a sophisticated fabrication process and increase cost.

Modern cache hierarchies are designed to satisfy the demands of the most memory-intensive application phases.
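To make Borkar's scaling estimate concrete, the following sketch compounds the per-generation growth factors over several generations. The numbers are the paper's cited estimates; the function and baseline are illustrative assumptions, not measurements from any specific process.

```python
# Back-of-the-envelope model of Borkar's [3] scaling estimate: leakage
# current grows ~7.5x and total leakage energy ~5x per process
# generation. The base_energy of 1.0 is an arbitrary normalized unit.

LEAKAGE_CURRENT_GROWTH = 7.5   # per generation (Borkar [3])
LEAKAGE_ENERGY_GROWTH = 5.0    # per generation (Borkar [3])

def leakage_energy_after(generations: int, base_energy: float = 1.0) -> float:
    """Total leakage energy relative to the base generation."""
    return base_energy * LEAKAGE_ENERGY_GROWTH ** generations

if __name__ == "__main__":
    for g in range(4):
        print(f"generation +{g}: leakage energy x{leakage_energy_after(g):.0f}")
```

Even over just three generations this compounds to a 125x increase in leakage energy, which is why leakage moves from a nuisance to a first-order design concern.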
The actual cache utilization, however, varies widely both within and across applications. We have recently proposed the Dynamically ResIzable instruction cache (DRI i-cache) [11], a novel cache architecture that exploits this variability in utilization.

Our cache design presents the first fully-integrated architectural and circuit-level approach to reducing energy dissipation in deep-submicron cache memories. A DRI i-cache identifies an application's i-cache requirements dynamically, and uses a circuit-level mechanism, gated-Vdd, to gate the supply voltage to the SRAM cells of the cache's unused sections and reduce leakage.

While voltage gating effectively eliminates the leakage in SRAM cells, it may adversely impact cell performance and prohibitively increase cell area. This paper evaluates in detail the design space for gated-Vdd with respect to performance, energy, and area trade-offs. Our results indicate that: (i) a PMOS gated-Vdd transistor incurs negligible impact on cell performance and area but only reduces leakage by an order of magnitude; (ii) an NMOS dual-Vt gated-Vdd transistor virtually eliminates leakage with minimal impact on the cell area but increases cell read time by 35%; (iii) a wide NMOS dual-Vt gated-Vdd transistor with a charge pump offers the best configuration and virtually eliminates leakage with minimal impact on cell speed and area; and (iv) using gated-Vdd, a DRI i-cache reduces the overall energy-delay in applications by 62%.

The rest of the paper is organized as follows. In Section 2, we present an overview of a DRI i-cache. In Section 3, we describe the circuit-level gated-Vdd mechanism to reduce leakage in SRAM cells. In Section 4, we present experimental results.
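The leakage trade-off among the gated-Vdd variants above can be captured by a simple first-order model: sections left active leak at the full rate, while gated sections leak at a residual rate determined by how completely the gated-Vdd transistor cuts the leakage path. The function and its parameter values are illustrative assumptions, not circuit measurements.

```python
# Hypothetical first-order model of cache leakage power under gated-Vdd.
# residual_fraction encodes how completely the gated-Vdd transistor cuts
# leakage in the disabled sections: ~0.1 for a PMOS gate (an order of
# magnitude reduction, per the results above) and ~0.0 for an NMOS
# dual-Vt gate (virtually eliminated). Baseline power is normalized.

def cache_leakage(full_leakage: float, active_fraction: float,
                  residual_fraction: float) -> float:
    """Leakage power with (1 - active_fraction) of the cache gated off."""
    active = active_fraction * full_leakage
    gated = (1.0 - active_fraction) * full_leakage * residual_fraction
    return active + gated

# Half the cache disabled: a PMOS gate still leaves 55% of baseline
# leakage, while an NMOS dual-Vt gate leaves only the 50% active half.
pmos = cache_leakage(1.0, 0.5, 0.1)
nmos_dual_vt = cache_leakage(1.0, 0.5, 0.0)
```

The model makes clear why the residual leakage of the gating transistor matters as much as how much of the cache can be turned off.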
Finally, we conclude the paper in Section 5.

2 DRI I-CACHE OVERVIEW

The key observation behind a DRI i-cache design is that there is a large variability in i-cache utilization both within and across programs, leading to large energy inefficiency in conventional caches; while the memory cells in the cache's unused sections are not actively referenced, they leak current and dissipate energy. Our approach to resizing the cache increases or decreases the number of sets used in the cache. In this section, we present an overview of a DRI i-cache's anatomy. For a more detailed description of a DRI i-cache, please refer to [11].

2.1 DRI i-cache design

Much like conventional adaptive computing frameworks, our cache uses a set of parameters to monitor, react, and adapt to changes in application behavior and system requirements dynamically. Figure 1 depicts
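The parameter-driven, set-based resizing described above can be sketched as an interval-based controller: count misses over a monitoring interval, then halve or double the number of active sets against a miss threshold. This is a simplified illustration in the spirit of the DRI i-cache; the class and parameter names are assumptions, and the actual policy and parameters are detailed in [11].

```python
# Illustrative interval-based resizing controller. After each sense
# interval, the miss count is compared against a miss bound: few misses
# suggest the working set fits in a smaller cache (downsize), many
# misses suggest the cache is too small (upsize). Sets change by powers
# of two between a hardware minimum and the full cache size.

class ResizeController:
    def __init__(self, total_sets: int, min_sets: int, miss_bound: int):
        self.total_sets = total_sets    # sets in the full cache
        self.min_sets = min_sets        # smallest allowed configuration
        self.miss_bound = miss_bound    # miss-count threshold per interval
        self.active_sets = total_sets   # start with the full cache

    def end_of_interval(self, misses: int) -> int:
        """Adapt the active set count based on this interval's misses."""
        if misses < self.miss_bound and self.active_sets > self.min_sets:
            self.active_sets //= 2      # downsize; unused sets can be gated
        elif misses > self.miss_bound and self.active_sets < self.total_sets:
            self.active_sets *= 2       # upsize to recover performance
        return self.active_sets

ctrl = ResizeController(total_sets=1024, min_sets=64, miss_bound=100)
print(ctrl.end_of_interval(20))    # few misses: downsizes to 512 sets
print(ctrl.end_of_interval(500))   # many misses: upsizes back to 1024
```

In hardware, shrinking the active set count by a power of two amounts to masking index bits, and the sets that fall out of the index range are the ones whose SRAM cells gated-Vdd can switch off.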