Code Transformations for TLB Power Reduction

Contents
Abstract
1 Introduction
2 Related Work
2.1 Hardware Approaches
2.2 Software Approaches
2.3 Hybrid Approaches
3 Architecture Description
3.1 Energy Consumption in the Conventional TLB
3.2 The Use-Last TLB Architecture
4 Experimental Setup
5 Organization of Contents
6 Part I: Data-TLB Power Reduction
6.1 Page-Switch Aware Instruction Scheduling
6.2 Page-Switch Aware Array Interleaving
6.3 Impact of Loop Unrolling
6.4 Comprehensive Page Switch Reduction
7 Part II: Instruction-TLB Power Reduction
7.1 Instruction TLB Page-Switches
7.2 Problem Formulation
7.3 Page-Aware Code Placement Heuristic
7.4 Illustrative Example
7.5 Experiments
8 Summary

Int J Parallel Prog
DOI 10.1007/s10766-009-0123-8

Code Transformations for TLB Power Reduction

Reiley Jeyapaul · Aviral Shrivastava

Received: 30 June 2009 / Accepted: 13 December 2009
© Springer Science+Business Media, LLC 2010

Abstract  The Translation Look-aside Buffer (TLB) is an important part of the hardware support for virtual memory management in high-performance embedded systems. The TLB, though small, is frequently accessed, and therefore not only consumes significant energy but is also one of the important thermal hot-spots in the processor. Recently, several circuit-level and microarchitectural implementations of TLBs have been proposed to reduce TLB power. One simple, yet effective TLB design for power reduction is the Use-Last TLB architecture proposed in IEEE J. Solid-State Circuits, 1190–1199 (2004). The Use-Last TLB architecture reduces the power consumption when the last page is accessed again. In this work, we develop code transformation techniques to reduce the page switchings in data cache accesses, and propose an efficient page-aware code placement technique to enhance the energy-reduction capabilities achieved by the Use-Last TLB architecture for instruction cache accesses. Our comprehensive page-switch reduction algorithm results in an average 39% reduction in data-TLB page switchings, and our code placement heuristic results in an average 76% reduction in instruction-TLB page switchings, with negligible impact on performance, on benchmarks from the MiBench, Multimedia, DSPStone and BDTI suites. The reduced page-switch count achieved through our techniques yields equivalent power savings, above and beyond the reduction achieved by the Use-Last TLB architecture implementation alone.

Keywords  TLB power · Code transformation · Compiler technique · I-TLB power · D-TLB power · Instruction scheduling · Code placement

R. Jeyapaul (B) · A. Shrivastava
Compiler and Microarchitecture Laboratory, Arizona State University, Tempe, AZ 85281, USA
e-mail: [email protected]; [email protected]

A. Shrivastava
e-mail: [email protected]

1 Introduction

Power, energy, and thermal issues in current and near-future digital systems form the crux of the biggest challenge that the semiconductor industry faces today. In high-end computing, power consumption limits the amount of achievable performance because of the exorbitant increase in the cost of heat-removal mechanisms. In battery-operated portable systems, the battery is the single largest factor in device cost, weight, recharging time and frequency, and ultimately the usability of the system. The Translation Look-aside Buffer, or TLB, is an important component of high-end multi-tasking embedded processors, such as the Intel XScale. The TLB performs virtual-to-physical address translation and determines page access permissions.
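To make the TLB's translation role and the Use-Last idea mentioned in the abstract concrete, the following behavioral sketch in C models a lookup that reuses the last translation when the accessed page has not changed. It is an illustration only: the entry layout, TLB size, page size, and the lookup counter are hypothetical, not the circuit of the referenced design.

/* Minimal behavioral sketch of a TLB lookup with a "Use-Last" check.
 * Illustration only: entry format, TLB size and counters are hypothetical. */
#include <stdint.h>
#include <stddef.h>

#define TLB_ENTRIES 32
#define PAGE_SHIFT  12          /* 4 KB pages assumed for illustration */

typedef struct {
    uint32_t vpn;               /* virtual page number (tag)   */
    uint32_t pfn;               /* physical frame number       */
    uint8_t  perms;             /* page access permissions     */
    uint8_t  valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];
static tlb_entry_t last;        /* latched result of the previous lookup      */
static unsigned full_lookups;   /* proxy for the energy-hungry full searches   */

/* Translate a virtual address; returns 1 on a TLB hit, 0 on a miss. */
int tlb_translate(uint32_t vaddr, uint32_t *paddr, uint8_t *perms)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;

    /* Use-Last: if the page did not switch, reuse the latched entry
     * and skip the full associative search entirely.                */
    if (last.valid && last.vpn == vpn) {
        *paddr = (last.pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
        *perms = last.perms;
        return 1;
    }

    full_lookups++;             /* page switch: pay for a full lookup */
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            last = tlb[i];      /* latch the translation for the next access */
            *paddr = (tlb[i].pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
            *perms = tlb[i].perms;
            return 1;
        }
    }
    return 0;                   /* miss: handled by the page-table walker */
}

In this model, every page switch costs a full lookup, which is why the code transformations described below aim to keep consecutive accesses within the same page.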
Most modern processors, including the Intel XScale, implement virtually-addressed caches, in which the cache lookup is performed directly on the virtual address provided by the processor, and therefore the TLB lookup comes in the critical path. Elkman et al. [1] note that TLBs can consume 20–25% of the total L1 cache energy. Kadayif et al. [2] observed high power densities for the data-TLB as compared to the data-L1 cache; in the same work they show that the iTLB has a power density of 7.820 nW/mm² compared to 0.975 and 0.670 nW/mm² for the iL1 and dL1, respectively. Thus, reducing the power consumption of TLBs is an important research problem.

Several TLB designs have been proposed to trade off TLB lookup delay, area, and power consumption [3,4]. One simple, yet effective technique for TLB power reduction, proposed in [5,6], is the Use-Last TLB architecture. Observing that there is a high probability that an instruction access will refer to the same page as the last one, the authors store the previous page-translation information in a latch and thereby reduce the TLB lookup power. The Use-Last TLB architecture is able to reduce the instruction-TLB power by 75%. However, since data accesses do not exhibit as high locality as instruction accesses, this microarchitectural technique was not effective for data TLBs.

For a modified processor that includes the Use-Last TLB architecture for both the instruction- and data-TLB structures, we present compiler-directed code transformation techniques that reduce processor power consumption by improving the page locality of data and instruction cache accesses. We first propose a novel instruction scheduling and operand reordering technique, a heuristic for deciding when to perform array interleaving, and loop unrolling to minimize the page switchings between consecutive data-TLB accesses while minimizing performance loss (an illustrative sketch of the interleaving idea follows the related-work excerpt below). Our comprehensive algorithm can reduce the data-TLB page switches by 39%, with minimal performance impact, evaluated on benchmarks from the MiBench, Multimedia, DSPStone and BDTI suites. We then propose a novel page-aware code placement heuristic to enhance the page locality of instruction cache accesses and thereby reduce the power consumption of the instruction-TLB by an average of 76%, with less than 1% variation in performance over benchmark applications from the MiBench suite. It should be noted that the power reduction obtained through these code transformations is above and beyond what the Use-Last hardware technique alone can achieve.

2 Related Work

TLB power reduction is important not only to reduce the total energy consumed by the processor, but also to alleviate the high power density (hot-spot) of the TLB in the processor. Several researchers have proposed efficient circuit-level, microarchitectural, and software techniques to reduce the power consumption of the TLB and the Memory Management Unit.

2.1 Hardware Approaches

Several researchers have proposed
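As promised in Sect. 1, the following C fragment is a minimal illustrative sketch of the array-interleaving idea behind the data-TLB transformation. The array names, sizes, and loop are hypothetical and do not come from the paper; the paper's comprehensive heuristic, not shown here, decides when such a rewrite is profitable.

/* Illustrative sketch of page-switch aware array interleaving.
 * Arrays, sizes and the loop are hypothetical examples. */
#define N 4096

/* Before: x[] and y[] are separate objects and will usually be placed in
 * different pages, so the access pattern x[0], y[0], x[1], y[1], ...
 * switches pages between almost every pair of data-TLB accesses. */
float x[N], y[N];

float dot_split(void)
{
    float s = 0.0f;
    for (int i = 0; i < N; i++)
        s += x[i] * y[i];
    return s;
}

/* After interleaving: x[i] and y[i] share one 8-byte struct and therefore
 * (almost always) the same page, so the second access of each iteration
 * reuses the page latched by the Use-Last data-TLB instead of triggering
 * a full lookup. */
struct pair { float x, y; };
struct pair xy[N];

float dot_interleaved(void)
{
    float s = 0.0f;
    for (int i = 0; i < N; i++)
        s += xy[i].x * xy[i].y;
    return s;
}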