1. Introduction
2. Policies
3. Policy evaluation
4. Circuit issues
4.1 Gated-VDD
4.2 ABB-MTCMOS
4.3 Dynamic VDD Scaling (DVS)
5. Energy consumption
6. Conclusions and future work
7. Acknowledgements

Abstract

On-chip caches represent a sizable fraction of the total power consumption of microprocessors. Although large caches can significantly improve performance, they have the potential to increase power consumption. As feature sizes shrink, the dominant component of this power loss will be leakage. However, during a fixed period of time the activity in a cache is only centered on a small subset of the lines. This behavior can be exploited to cut the leakage power of large caches by putting the cold cache lines into a state-preserving, low-power drowsy mode. Moving lines into and out of drowsy state incurs a slight performance loss. In this paper we investigate policies and circuit techniques for implementing drowsy caches. We show that with simple architectural techniques, about 80%-90% of the cache lines can be maintained in a drowsy state without affecting performance by more than 1%. According to our projections, in a 0.07 µm CMOS process, drowsy caches will be able to reduce the total energy (static and dynamic) consumed in the caches by 50%-75%. We also argue that the use of drowsy caches can simplify the design and control of low-leakage caches, and avoid the need to completely turn off selected cache lines and lose their state.

1. Introduction

Historically one of the advantages of CMOS over competing technologies (e.g. ECL) has been its lower power consumption. When not switching, CMOS transistors have, in the past, consumed negligible amounts of power. However, as the speed of these devices has increased along with density, so has their leakage (static) power consumption. We now estimate that it currently accounts for about 15%-20% of the total power on chips implemented in high-speed processes.
Moreover, as processor technology moves below 0.1 micron, static power consumption is set to increase exponentially, putting it on a path to dominate the total power used by the CPU (see Figure 1).

Various circuit techniques have been proposed to deal with the leakage problem. These techniques either completely turn off circuits by creating a high-impedance path to ground (gating) or trade off increased execution time for reduced static power consumption. In some cases, these techniques can be implemented entirely at the circuit level without any changes to the architecture or may involve only simple architectural modifications. The on-chip caches are one of the main candidates for leakage reduction since they contain a significant fraction of the processor’s transistors. Approaches for reducing static power consumption of caches by turning off cache lines using the gated-VDD technique [1] have been described in [2][3]. These approaches reduce leakage power by selectively turning off cache lines that contain data that is not likely to be reused. The drawback of this approach is that the state of the cache line is lost when it is turned off, and reloading it from the level 2 cache has the potential to negate any energy savings and have a significant impact on performance. To avoid these pitfalls, it is necessary to use complex adaptive algorithms and be conservative about which lines are turned off.

Turning off cache lines is not the only way that leakage energy can be reduced. Significant leakage reduction can also be achieved by putting a cache line into a low-power drowsy mode.
When in drowsy mode, the information in the cache line is preserved; however, the line must be reinstated to a high-power mode before its contents can be accessed. One circuit technique for implementing drowsy caches is adaptive body-biasing with multi-threshold CMOS (ABB-MTCMOS) [5], where the threshold voltage of a cache line is increased dynamically to yield reduction in leakage energy. We propose a simpler and more effective circuit technique for implementing drowsy caches, where one can choose between two different supply voltages in each cache line. Such a dynamic voltage scaling or selection (DVS) technique has been used in the past to trade off dynamic power consumption and performance [6][7][8]. In this case, however, we exploit voltage scaling to reduce static power consumption. Due to short-channel effects in deep-submicron processes, leakage current reduces significantly with voltage scaling [9]. The combined effect of reduced leakage current and voltage yields a dramatic reduction in leakage power.

Drowsy Caches: Simple Techniques for Reducing Leakage Power
Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, Trevor [email protected] Ltd
110 Fulbourn Road, Cambridge, UK CB1 9NJ
{kimns, stevenmm, blaauw, tnm}@eecs.umich.edu
Advanced Computer Architecture Lab, The University of Michigan
1301 Beal Ave. Ann Arbor, MI 48109-2122

FIGURE 1. Normalized leakage power through an inverter, plotted against minimum gate length (µm) at 25 ºC, 50 ºC, 75 ºC, and 105 ºC. The circuit simulation parameters including threshold voltage were obtained from the Berkeley Predictive Spice Models [4]. The leakage power numbers were obtained by HSPICE simulations.

On a per-bit basis, drowsy caches do not reduce leakage energy as much as those that rely on gated-VDD.
However, we show that for the total power consumption of the cache, drowsy caches can get close to the theoretical minimum. This is because the fraction of total energy consumed by the drowsy cache in low power mode (after applying our algorithms) tends to be only about 25%. Reducing this fraction further may be possible but the pay-off is not great (Amdahl’s Law). Moreover, since the penalty for waking up a drowsy line is relatively small (it requires little energy and only 1 or 2 cycles, depending on circuit parameters), cache lines can be put into drowsy mode more aggressively, thus saving more power.

Figure 2 shows the changes necessary for implementing a cache line that supports a drowsy mode. There are very few additions required to a standard cache line. The main additions are a drowsy bit, a mechanism for controlling the voltage to the memory cells, and a word line gating circuit. In order to support the drowsy mode, the cache line circuit includes two more transistors than the traditional memory circuit. The operating voltage of an array of memory cells in the cache line is determined by the voltage controller, which switches the array voltage
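The drowsy-line behavior described in the introduction (a per-line drowsy bit, contents preserved in the low-power mode, and a wake-up of 1 or 2 cycles before an access) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit or policy from the paper: the names `DrowsyLine`, `DrowsyCache`, `sleep_all`, and `WAKE_UP_CYCLES` are hypothetical, the wake-up cost is fixed at 1 cycle for illustration, and the "put every line to sleep" hook is an assumed policy, not one taken from the text.

```python
from dataclasses import dataclass, field

# Assumption: the text gives a wake-up cost of 1 or 2 cycles depending on
# circuit parameters; 1 is picked here purely for illustration.
WAKE_UP_CYCLES = 1


@dataclass
class DrowsyLine:
    """Behavioral model of one cache line with a drowsy bit."""
    data: bytes = b""
    drowsy: bool = False  # True: low supply voltage, contents preserved

    def put_to_sleep(self) -> None:
        # Switch to the low supply voltage; the stored data is kept.
        self.drowsy = True

    def read(self) -> tuple[bytes, int]:
        """Return (data, extra_cycles) for one access to this line."""
        extra = 0
        if self.drowsy:
            # A drowsy line must return to the high-power mode first.
            self.drowsy = False
            extra = WAKE_UP_CYCLES
        return self.data, extra


@dataclass
class DrowsyCache:
    lines: list[DrowsyLine] = field(default_factory=list)

    def sleep_all(self) -> None:
        # Hypothetical policy hook: make every line drowsy at once.
        for line in self.lines:
            line.put_to_sleep()

    def drowsy_fraction(self) -> float:
        # Fraction of lines currently in the low-power mode.
        if not self.lines:
            return 0.0
        return sum(line.drowsy for line in self.lines) / len(self.lines)


if __name__ == "__main__":
    cache = DrowsyCache([DrowsyLine(bytes([i])) for i in range(8)])
    cache.sleep_all()
    data, extra = cache.lines[3].read()  # wakes line 3, data is intact
    print(data, extra, cache.drowsy_fraction())
```

Because a drowsy line keeps its data, waking it costs only the small cycle penalty modeled by `extra`. A line that was turned off with gated-VDD would instead have to be reloaded from the level 2 cache.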