Fragment Cache Management


© ACM 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in the Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems, October 1-3, 2007.

ABSTRACT

Dynamic binary translation (DBT) has been used to achieve numerous goals (e.g., better performance) for general-purpose computers. Recently, DBT has also attracted attention for embedded systems. However, a challenge to DBT in this domain is stringent constraints on memory and performance. The translated code buffer used by DBT may occupy too much memory space. This paper proposes novel schemes to manage this buffer with scratchpad memory. We use footprint reduction to minimize the space needed by the translated code, victim compression to reduce the cost of retranslating previously seen code, and fragment pinning to avoid evicting needed code. We comprehensively evaluate our techniques to demonstrate their effectiveness.

Categories and Subject Descriptors

C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems - Real-time and Embedded Systems. D.3.4 [Programming Languages]: Processors - Code Generation, Compilers, Incremental Compilers, Interpreters, Optimization, Run-time Environments.

General Terms

Algorithms, Measurement, Performance, Design, Languages.

Keywords

Dynamic Binary Translation, Embedded Systems, Scratchpad.

1. INTRODUCTION

Dynamic binary translation (DBT) has gained much attention as a powerful technique for constructing adaptive software [2, 3, 6, 21, 24]. DBT has led to new software capabilities, such as resource virtualization, intrusion detection, performance improvement, and instruction set migration.
Although DBT has been widely applied to general-purpose systems, recent work has shown several uses of DBT for embedded systems, including power management [26], security [17, 22], software caches [19], instruction set translation [6] and memory management [23, 27].

While DBT is beneficial in embedded systems, the use of the technology has been limited in this domain due to tight constraints on memory and performance. In particular, DBT systems typically employ a software-managed memory buffer, called a fragment cache (F$), to hold blocks of dynamically translated instructions (called fragments). To ensure low runtime overhead, the fragment cache is made relatively large so it can hold an application's translated code working set, which avoids unnecessarily re-translating previously seen code. A typical F$ can be several megabytes in size, which may not fit in an embedded system's limited memory resources.

Many embedded systems, particularly those based on a system-on-a-chip (SoC), have a small on-chip scratchpad memory (SPM). The SPM may hold data, or possibly instructions. The advantage of the scratchpad over external memory is its fast access time and low power consumption. A typical SoC also employs Flash memory as permanent storage to hold application code. The Flash memory is unfortunately often quite slow and power hungry. When a program is executed, it is loaded into external main memory to minimize the costs associated with the Flash memory.

Due to its fast access and low power consumption, the scratchpad is potentially an appropriate resource to hold translated code in a DBT system (i.e., the fragment cache). However, the scratchpad is much smaller than the amount of space normally allocated to the fragment cache. If the F$ size is simply set to the scratchpad size, then the working set of the translated application code is unlikely to fit. As a result, there will be many off-chip accesses to re-translate previously encountered instructions.
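The re-translation cost described above is easiest to see in code. Below is a minimal, hypothetical sketch of a DBT dispatch path backed by a fragment cache; the direct-mapped policy, the slot count, and every name are illustrative assumptions, not the system evaluated in this paper:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of DBT dispatch.  The fragment cache (F$) maps a
 * guest PC to its translated fragment; all names/sizes are illustrative. */

#define FCACHE_SLOTS 64              /* stand-in for a scratchpad-sized F$ */

typedef struct {
    uint32_t guest_pc;               /* address of the untranslated code */
    int      valid;                  /* slot holds a live fragment */
    /* a real entry would also point at the generated code */
} Fragment;

static Fragment fcache[FCACHE_SLOTS];
static int translations = 0;         /* counts fetch-and-translate events */

/* Direct-mapped lookup; real fragment caches index more cleverly. */
static Fragment *fcache_lookup(uint32_t guest_pc)
{
    Fragment *f = &fcache[(guest_pc >> 2) % FCACHE_SLOTS];
    return (f->valid && f->guest_pc == guest_pc) ? f : NULL;
}

/* Miss path: fetch source instructions from slow off-chip memory,
 * translate them, and evict the slot's previous occupant. */
static Fragment *fcache_translate(uint32_t guest_pc)
{
    Fragment *f = &fcache[(guest_pc >> 2) % FCACHE_SLOTS];
    f->guest_pc = guest_pc;
    f->valid = 1;
    translations++;                  /* the expensive event to minimize */
    return f;
}

/* Dispatch: reuse a cached fragment when possible, translate otherwise. */
static Fragment *dispatch(uint32_t guest_pc)
{
    Fragment *f = fcache_lookup(guest_pc);
    return f ? f : fcache_translate(guest_pc);
}
```

When the working set exceeds the cache, conflicting PCs evict each other and `translations` climbs on every revisit; that counter models the off-chip re-fetch and re-translation traffic the paper's techniques aim to minimize.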
The high cost of these off-chip accesses negates the benefit of the scratchpad's fast access (and low power consumption) for the fragment cache.

In this paper, we propose a new approach to managing the F$ for embedded systems with SPM, external memory and Flash storage. The approach applies three novel management strategies to minimize the number of off-chip accesses to fetch and translate the program. First, the approach uses footprint reduction to minimize the amount of code that is generated by the dynamic translator to remain in control of the application. Next, the approach uses victim compression to reduce the cost of re-translating application code that may be evicted when the working set does not fit in the F$. Lastly, the approach uses fragment pinning to avoid evicting frequently executed fragments. We show that our techniques are effective and allow the fragment cache to fit in the scratchpad.

Fragment Cache Management for Dynamic Binary Translators in Embedded Systems with Scratchpad

José Baiocchi†, Bruce R. Childers†, Jack W. Davidson‡, Jason D. Hiser‡, Jonathan Misurda†
†Department of Computer Science, University of Pittsburgh, {baiocchi, childers, jmisurda}@cs.pitt.edu
‡Department of Computer Science, University of Virginia, {jwd, hiser}@cs.virginia.edu

This paper makes several contributions, including:

• Footprint reduction to minimize code expansion from the dynamic translator;
• Victim compression to bypass re-fetching and re-translating the code when it is needed again;
• Fragment pinning to avoid unnecessarily evicting important and often needed code; and,
• A thorough evaluation of our techniques in a simulated SoC with scratchpad, SDRAM and Flash memories.

The paper is organized as follows. Section 2 describes the systems targeted by our techniques and Section 3 investigates how the F$ affects performance. Section 4 presents our techniques for the F$ and Section 5 gives the overall improvement with these techniques.
Section 6 describes related work and Section 7 concludes.

2. TARGET SYSTEM

Figure 1 shows a canonical embedded system; this device is a single chip with a processor, L1 instruction (I-cache) and data (D-cache) caches, an application-specific integrated circuit (ASIC), a scratchpad (implemented as SRAM), ROM (implemented as a small on-device Flash memory), a controller for external Flash memory, a controller for external main memory (implemented as SDRAM) and off-chip I/O channels. The figure shows SDRAM and Flash memories, which are external to the device. The SDRAM is main memory and holds application code and data. The Flash memory is managed by the operating system (OS); it holds user files, including application binary images.

Depending on the design, the boot-up and OS code may be in
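As a rough illustration of the victim-compression strategy named in Section 1: rather than discarding an evicted fragment outright, the translator can keep a compressed copy in slower memory and decompress it on reuse, which is cheaper than re-fetching from Flash and re-translating. The sketch below uses naive run-length encoding purely for illustration; the paper does not specify this compressor, and all names here are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy run-length encoder: each run becomes a (count, byte) pair.
 * Purely illustrative; a real DBT would use a stronger compressor. */
static size_t rle_compress(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t b = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == b && run < 255)
            run++;
        out[o++] = (uint8_t)run;     /* run length */
        out[o++] = b;                /* repeated byte */
        i += run;
    }
    return o;                        /* compressed size in bytes */
}

/* Expand the (count, byte) pairs back into the original fragment;
 * this replaces a full re-fetch and re-translation of the code. */
static size_t rle_decompress(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        for (uint8_t k = 0; k < in[i]; k++)
            out[o++] = in[i + 1];
    return o;                        /* decompressed size in bytes */
}
```

The win depends on translated code being compressible and on decompression being faster than the Flash fetch plus translation it replaces; both are premises the paper evaluates.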

