Contents

  Abstract
  1 Introduction
  2 DRI I-cache: Reducing Deep-submicron I-cache Leakage
    2.1 Basic DRI I-Cache Design
      FIGURE 1: Anatomy of a DRI i-cache.
    2.2 Implications on Cache Lookups
    2.3 Impact on Energy and Performance
      2.3.1 Controlling Extra Misses
      FIGURE 2: 6-T SRAM cells connected to a gated-Vdd transistor (typical transistor W/L ratios).
  3 Gated-Vdd: Circuit-level Supply-Voltage Gating
  4 Methodology
    Table 1: System configuration parameters.
    FIGURE 3: Layout of 64 SRAM cells connected to a single gated-Vdd NMOS transistor.
  5 Results
    5.1 Circuit Results
      5.1.1 Impact of Lowering Threshold Voltage
        Table 2: Lowering transistor threshold voltages.
      5.1.2 Impact of Various Gated-Vdd Implementations
        Table 3: Widening the gated-Vdd transistor.
        Table 4: Energy, speed, and area of various gated-Vdd implementations.
    5.2 Energy Calculations
      5.2.1 Leakage and Dynamic Energy Trade-off
    5.3 Overall Energy Savings and Performance Results
      FIGURE 4: Base energy-delay and average cache size measurements.
      FIGURE 5: Impact of varying the miss-bound.
      5.3.1 Impact of Varying Miss-Bound
      5.3.2 Impact of Varying Size-Bound
        FIGURE 6: Impact of varying the size-bound.
      5.3.3 Impact of Varying Sense-Interval Length and Divisibility
  6 Conclusions
  Acknowledgements
  References

An Energy-Efficient High-Performance Deep-Submicron Instruction Cache

Appears in IEEE TVLSI special issue on low-power design, February 2001.

Michael D. Powellϒ, Se-Hyun Yangβ1, Babak Falsafiβ1, Kaushik Royϒ, and T. N. Vijaykumarϒ

βElectrical and Computer Engineering Department, Carnegie Mellon University
{syang,babak}@ece.cmu.edu

ϒSchool of Electrical and Computer Engineering, Purdue University
{mdpowell,kaushik,vijay}@ecn.purdue.edu
http://www.ece.purdue.edu/~icalp

1 This work was performed when Se-Hyun Yang and Babak Falsafi were at the School of Electrical and Computer Engineering at Purdue University.

Abstract

Deep-submicron CMOS designs maintain high transistor switching speeds by scaling down the supply voltage and proportionately reducing the transistor threshold voltage. Lowering the threshold voltage increases leakage energy dissipation due to subthreshold leakage current even when the transistor is not switching. Estimates suggest a five-fold increase in leakage energy in every future generation. In modern microarchitectures, much of the leakage energy is dissipated in large on-chip cache memory structures with high transistor densities. While cache utilization varies both within and across applications, modern cache designs are fixed in size, resulting in transistor leakage inefficiencies.

This paper explores an integrated architectural and circuit-level approach to reducing leakage energy in instruction caches (i-caches). At the architecture level, we propose the Dynamically ResIzable i-cache (DRI i-cache), a novel i-cache design that dynamically resizes and adapts to an application's required size. At the circuit level, we use gated-Vdd, a novel mechanism that effectively turns off the supply voltage to, and eliminates leakage in, the SRAM cells in a DRI i-cache's unused sections. Architectural and circuit-level simulation results indicate that a DRI i-cache successfully and robustly exploits the cache size variability both within and across applications. Compared to a conventional i-cache using an aggressively-scaled threshold voltage, a 64K DRI i-cache reduces on average both the leakage energy-delay product and cache size by 62%, with less than 4% impact on execution time. Our results also indicate that a wide NMOS dual-Vt gated-Vdd transistor with a charge pump offers the best gating implementation and virtually eliminates leakage energy with minimal increase in SRAM cell read time and area, as compared to an i-cache with an aggressively-scaled threshold voltage.

Keywords: Cache memories, adaptive systems, computer architecture, energy management, leakage currents.

1 INTRODUCTION

The ever-increasing levels of on-chip integration in the recent decade have enabled phenomenal increases in computer system performance. Unfortunately, the performance improvement has been accompanied by an increase in chips' energy dissipation. Higher energy dissipation requires more expensive packaging and cooling technology, increases cost, and decreases the reliability of products in all segments of the computing market, from portable systems to high-end servers [21]. Moreover, higher energy dissipation significantly reduces battery life and diminishes the utility of portable systems.

Historically, the primary source of energy dissipation in CMOS transistor devices has been the dynamic energy due to charging/discharging load capacitances when a device switches. Chip designers have relied on scaling down the transistor supply voltage in subsequent generations to reduce the dynamic energy dissipated by the much larger number of on-chip transistors.

Maintaining high transistor switching speeds, however, requires a commensurate down-scaling of the transistor threshold voltage along with the supply voltage [19]. The International Technology Roadmap for Semiconductors [20] predicts a steady scaling of supply voltage with a corresponding decrease in transistor threshold voltage to maintain a 30% improvement in performance every generation. Transistor threshold scaling, in turn, gives rise to a significant amount of leakage energy dissipation due to an exponential increase in subthreshold leakage current even when the transistor is not switching [3,28,24,16,22,11,6]. Borkar [3] estimates a factor of 7.5 increase in leakage current and a five-fold increase in total leakage energy dissipation in every chip generation.

State-of-the-art microprocessor designs devote a large fraction of the chip area to memory structures, e.g., multiple levels of instruction and data caches, translation lookaside buffers, and prediction tables. For instance, 30% of the Alpha 21264 and 60% of the StrongARM are devoted to cache and memory structures [14]. Unlike dynamic energy, which depends on the number of actively switching transistors, leakage energy is a function of the number of on-chip transistors, independent of their switching activity. As such, caches account for a large (if not dominant) component of leakage energy dissipation in recent designs, and will continue to do so in the future. Recent energy estimates for 0.13µ processes indicate that leakage energy accounts for 30% of L1 cache energy and as much as 80% of L2 cache energy [7]. Unfortunately, current proposals for energy-efficient cache architectures [13,2,1] only target reducing dynamic energy and
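
The paragraph above contrasts dynamic energy, which scales with switching activity, and leakage energy, which scales with the transistor count. As a point of reference (standard first-order CMOS relations, not taken from this excerpt), the trade-off can be written as

\[
E_{\mathrm{dynamic}} \;\approx\; \alpha \, C_{\mathrm{load}} \, V_{dd}^{2},
\qquad
E_{\mathrm{leakage}} \;\approx\; N \, I_{\mathrm{leak}} \, V_{dd} \, t,
\qquad
I_{\mathrm{leak}} \;\propto\; e^{-V_{t}/(n\,v_{T})},
\]

where \(\alpha\) is the switching activity factor, \(C_{\mathrm{load}}\) the switched capacitance, \(N\) the number of on-chip transistors, \(t\) the elapsed time, \(V_{t}\) the threshold voltage, \(n\) the subthreshold slope factor, and \(v_{T}\) the thermal voltage. Scaling \(V_{dd}\) lowers the dynamic term quadratically, but the accompanying reduction in \(V_{t}\) increases \(I_{\mathrm{leak}}\) exponentially, and the leakage term is paid by every transistor whether or not it switches; this is the effect the DRI i-cache and gated-Vdd target.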
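
The abstract and table of contents refer to a miss-bound, a size-bound, and a sense-interval as the knobs that control DRI i-cache resizing, but the mechanism itself lies beyond this preview. The following C sketch is therefore only an assumed interpretation of those names: at the end of each sense interval, a controller grows the active cache when the interval's misses exceed the miss-bound and shrinks it (never below the size-bound) otherwise. All identifiers and numbers are hypothetical.

/*
 * Hypothetical sketch of a DRI i-cache resizing policy.
 * Assumption: the controller checks the miss count once per sense
 * interval and resizes the active cache by powers of two between
 * size_bound_kb and full_size_kb. These names and details are
 * illustrative, not taken verbatim from the paper.
 */
#include <stdio.h>

typedef struct {
    unsigned full_size_kb;   /* physical cache size, e.g. 64 KB        */
    unsigned size_bound_kb;  /* smallest size the cache may shrink to  */
    unsigned active_size_kb; /* currently powered (active) portion     */
    unsigned miss_bound;     /* misses tolerated per sense interval    */
} dri_icache;

/* Called once at the end of every sense interval. */
static void dri_resize(dri_icache *c, unsigned interval_misses)
{
    if (interval_misses > c->miss_bound) {
        /* Too many misses: the working set no longer fits, so grow. */
        if (c->active_size_kb < c->full_size_kb)
            c->active_size_kb *= 2;
    } else {
        /* Few misses: shrink to cut leakage, but not below the size-bound. */
        if (c->active_size_kb > c->size_bound_kb)
            c->active_size_kb /= 2;
    }
}

int main(void)
{
    dri_icache c = { .full_size_kb = 64, .size_bound_kb = 8,
                     .active_size_kb = 64, .miss_bound = 100 };
    /* Illustrative per-interval miss counts for a phase-changing workload. */
    unsigned misses[] = { 20, 15, 10, 250, 300, 40, 12 };

    for (unsigned i = 0; i < sizeof misses / sizeof misses[0]; i++) {
        dri_resize(&c, misses[i]);
        printf("interval %u: %u misses -> active size %u KB\n",
               i, misses[i], c.active_size_kb);
    }
    return 0;
}

In the actual design, growing and shrinking would presumably correspond to enabling or gating the supply voltage of entire cache sections, which is where the circuit-level gated-Vdd mechanism described above comes in.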

