Drowsy Caches: Simple Techniques for Reducing Leakage Power

Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, Trevor Mudge

krisztian.flautner@arm.com
ARM Ltd, 110 Fulbourn Road, Cambridge, UK CB1 9NJ

{kimns, stevenmm, blaauw, tnm}@eecs.umich.edu
Advanced Computer Architecture Lab, The University of Michigan, 1301 Beal Ave., Ann Arbor, MI 48109-2122

Abstract

On-chip caches represent a sizable fraction of the total power consumption of microprocessors. Although large caches can significantly improve performance, they have the potential to increase power consumption. As feature sizes shrink, the dominant component of this power loss will be leakage. However, during a fixed period of time the activity in a cache is only centered on a small subset of the lines. This behavior can be exploited to cut the leakage power of large caches by putting the cold cache lines into a state-preserving, low-power drowsy mode. Moving lines into and out of drowsy state incurs a slight performance loss. In this paper we investigate policies and circuit techniques for implementing drowsy caches. We show that with simple architectural techniques, about 80%-90% of the cache lines can be maintained in a drowsy state without affecting performance by more than 1%. According to our projections, in a 0.07 μm CMOS process, drowsy caches will be able to reduce the total energy (static and dynamic) consumed in the caches by 50%-75%. We also argue that the use of drowsy caches can simplify the design and control of low-leakage caches, and avoid the need to completely turn off selected cache lines and lose their state.

1. Introduction

Historically, one of the advantages of CMOS over competing technologies (e.g. ECL) has been its lower power consumption. When not switching, CMOS transistors have, in the past, consumed negligible amounts of power. However, as the speed of these devices has increased along with density, so has their leakage (static) power consumption. We now estimate that it currently accounts for about 15%-20% of the total power on chips implemented in high-speed processes. Moreover, as processor technology moves below 0.1 micron, static power consumption is set to increase exponentially, setting static power consumption on the path to dominating the total power used by the CPU (see Figure 1).

FIGURE 1. Normalized leakage power through an inverter.
[Plot: normalized leakage power (0 to 1200) versus minimum gate length (0.2 to 0.05 μm), with curves for 105 °C, 75 °C, 50 °C, and 25 °C.]
The circuit simulation parameters, including threshold voltage, were obtained from the Berkeley Predictive Spice Models [4]. The leakage power numbers were obtained by HSPICE simulations.

Various circuit techniques have been proposed to deal with the leakage problem. These techniques either completely turn off circuits by creating a high-impedance path to ground (gating) or trade off increased execution time for reduced static power consumption. In some cases, these techniques can be implemented entirely at the circuit level without any changes to the architecture, or may involve only simple architectural modifications. The on-chip caches are one of the main candidates for leakage reduction since they contain a significant fraction of the processor's transistors.

Approaches for reducing static power consumption of caches by turning off cache lines using the gated-VDD technique [1] have been described in [2][3]. These approaches reduce leakage power by selectively turning off cache lines that contain data that is not likely to be reused. The drawback of this approach is that the state of the cache line is lost when it is turned off, and reloading it from the level 2 cache has the potential to negate any energy savings and have a significant impact on performance. To avoid these pitfalls, it is necessary to use complex adaptive algorithms and be conservative about which lines are turned off.

Turning off cache lines is not the only way that leakage energy can be reduced. Significant leakage reduction can also be achieved by putting a cache line into a low-power drowsy mode. When in drowsy mode, the information in the cache line is preserved; however, the line must be reinstated to a high-power mode before its contents can be accessed. One circuit technique for implementing drowsy caches is adaptive body-biasing with multi-threshold CMOS (ABB-MTCMOS) [5], where the threshold voltage of a cache line is increased dynamically to yield reduction in leakage energy. We propose a simpler and more effective circuit technique for implementing drowsy caches, where one can choose between two different supply voltages in each cache line. Such a dynamic voltage scaling or selection (DVS) technique has been used in the past to trade off dynamic power consumption and performance [6][7][8]. In this case, however, we exploit voltage scaling to reduce static power consumption. Due to short-channel effects in deep-submicron processes, leakage current reduces significantly with voltage scaling [9]. The combined effect of reduced leakage current and voltage yields a dramatic reduction in leakage power.

On a per-bit basis, drowsy caches do not reduce leakage energy as much as those that rely on gated-VDD. However, we show that for the total power consumption of the cache, drowsy caches can get close to the theoretical minimum. This is because the fraction of total energy consumed by the drowsy cache in low-power mode (after applying our algorithms) tends to be only about 25%. Reducing this fraction further may be possible, but the pay-off is not great (Amdahl's Law). Moreover, since the penalty for waking up a drowsy line is relatively small (it requires little energy and only 1 or 2 cycles, depending on circuit parameters), cache lines can be put into drowsy mode more aggressively, thus saving more power.

FIGURE 2. Implementation of the drowsy cache line.
[Diagram labels: drowsy bit, voltage controller, drowsy (set), wake-up (reset), word line driver, row decoder, word line, wordline gate, drowsy signal, power line, VDD (1V), VDDLow (0.3V), SRAMs.]
Note that, for simplicity, the word line, bit lines, and two pass transistors in the drowsy bit are not shown in this picture.

Figure 2 shows the changes necessary for implementing a cache line that supports a drowsy mode. There are very few additions required to a standard cache line. The main additions are a drowsy bit, a mechanism for controlling the voltage to the memory cells, and a word line gating circuit. In order to support the drowsy mode, the cache line circuit includes two more transistors than the traditional memory circuit. The operating voltage of an array of memory
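
The per-line behavior described for Figure 2 (a drowsy bit that selects between two supply voltages, and a small wake-up penalty on access) can be illustrated with a minimal behavioral sketch. This is not the authors' implementation: the class names `DrowsyLine` and `DrowsyCache`, the `drowse_all` hook, and the choice of a 1-cycle wake-up (the text says 1 or 2 cycles) are all assumptions made for illustration. Only the 1 V and 0.3 V supply values come from the Figure 2 labels.

```python
from dataclasses import dataclass, field
from typing import List

VDD = 1.0          # normal supply voltage in volts (Figure 2 label)
VDD_LOW = 0.3      # drowsy supply voltage in volts (Figure 2 label)
WAKEUP_CYCLES = 1  # assumption: the text states 1 or 2 cycles


@dataclass
class DrowsyLine:
    """Hypothetical model of one cache line with a drowsy bit."""
    drowsy: bool = False  # set -> low voltage, contents preserved

    @property
    def supply_voltage(self) -> float:
        # The voltage controller picks the supply based on the drowsy bit.
        return VDD_LOW if self.drowsy else VDD

    def access(self) -> int:
        """Access the line and return the extra cycles spent waking it."""
        if self.drowsy:
            # The line must return to full voltage before it can be read.
            self.drowsy = False
            return WAKEUP_CYCLES
        return 0


@dataclass
class DrowsyCache:
    """Hypothetical cache made of drowsy-capable lines."""
    num_lines: int
    lines: List[DrowsyLine] = field(init=False)

    def __post_init__(self) -> None:
        self.lines = [DrowsyLine() for _ in range(self.num_lines)]

    def drowse_all(self) -> None:
        # Illustrative policy hook (an assumption): put every line to sleep.
        for line in self.lines:
            line.drowsy = True

    def access(self, index: int) -> int:
        return self.lines[index].access()

    def drowsy_fraction(self) -> float:
        return sum(line.drowsy for line in self.lines) / self.num_lines


if __name__ == "__main__":
    cache = DrowsyCache(num_lines=8)
    cache.drowse_all()
    print(cache.access(3))          # first access pays the wake-up penalty
    print(cache.access(3))          # line is now awake, no penalty
    print(cache.drowsy_fraction())  # remaining lines are still drowsy
```

Because a drowsy line keeps its contents, the only cost modeled on a wake-up is latency. A gated-VDD line would instead lose its state and need a refill from the next cache level.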


CMU CS 15740 - Drowsy Caches
