"Reconfigurable Split Data Caches: A Novel Scheme for Embedded Systems" Afrin Naz1 Krishna Kavi1 Juan Oh1 Pierfrancesco Foglia2 1Department of Computer Science and Engineering 2 Dept. Ingegneria della Informazione University of North Texas, Denton, TX 76203, USA University of Pisa, Diotisalvi I-56126PIS, Italy 940-565-2767 050-221-7530 email: {afrin, kavi, Juan}@cse.unt.edu email: [email protected] ABSTRACT This paper shows that even very small reconfigurable data caches, when split to serve data streams exhibiting temporal and spatial localities, can improve performance of embedded applications without consuming excessive silicon real estate or power. It also shows that neither higher set-associativities nor large block sizes are necessary with reconfigurable split cache organizations. We use benchmark programs from the MiBench suite to show that our cache organization outperforms an 8k unified data cache in terms of miss rates, access times, energy consumption and silicon area. Finally we show how the saved area can be utilized for supporting techniques for improving performance of embedded systems. Our design enables the cache to be divided into multiple partitions that can be used for different processor activities other than conventional caching. In this paper we have evaluated one of those options to support “prefetching”. Categories and Subject Descriptors C.1.1. [Processor Architectures]: Single Data Stream Architecture --- cache memories General Terms Performance, experimentation, measurement and Design. Keywords Embedded systems, Split cache, reconfigurability, locality, cache. 1. INTRODUCTION In today’s microprocessors, cache has become a vital element in improving performance over a wide range of applications. Studies have found that the on-chip cache is responsible for 50% of an embedded processor’s total power dissipation [3,5,16]. 
For that reason, we feel it is worthwhile to investigate new reconfigurable cache organizations that address both performance and power requirements. The performance of a given cache architecture is largely determined by the behavior of the applications it serves. Unfortunately, the manufacturer typically fixes the cache architecture as a compromise across several applications, which leads to conflicts in deciding on total cache size, line size, and associativity. For embedded systems, where everything must be cost effective, this "one-size-fits-all" design philosophy is not adequate. In this paper we apply reconfigurability to the design of caches to address these conflicting requirements, and we explore how to design caches that achieve high performance for embedded applications while remaining both energy and area efficient.

The key contributions of this work are the following. First, we introduce a novel cache architecture for embedded microprocessor platforms. The proposed cache detects program access patterns and fine-tunes cache policies to improve both data locality and overall cache performance for embedded applications. Second, because our design saves area, it enables the cache to be divided into multiple partitions that can be used for purposes other than conventional caching.

In this paper we evaluate a cache architecture that combines reconfigurability with split data caches (separate array and scalar data caches), complemented by a very small victim cache. Our goal is to reduce the silicon area, access time, and dynamic power consumed by cache memories while retaining performance gains. In our design, we address the problem of improving cache performance in embedded systems through the use of separate array and scalar data caches. We then further extend our architecture by augmenting the scalar cache with a victim cache [19].
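To make the split-cache idea concrete, the sketch below models two tiny direct-mapped caches and steers each reference to the array cache or the scalar cache depending on how it is classified. This is only an illustrative simulation, not the authors' hardware; all type names, sizes, and the classification flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES   32u   /* hypothetical block size               */
#define ARRAY_LINES  8u    /* hypothetical array-cache line count   */
#define SCALAR_LINES 4u    /* hypothetical scalar-cache line count  */

typedef struct {
    uint32_t tag;
    bool     valid;
} Line;

typedef struct {
    Line     *lines;
    unsigned  nlines;
    unsigned  hits, misses;
} Cache;

/* Direct-mapped lookup: index = (addr / line size) mod #lines. */
static void access_cache(Cache *c, uint32_t addr)
{
    uint32_t blk = addr / LINE_BYTES;
    unsigned idx = blk % c->nlines;
    if (c->lines[idx].valid && c->lines[idx].tag == blk) {
        c->hits++;
    } else {
        c->misses++;           /* install the new block on a miss */
        c->lines[idx].tag   = blk;
        c->lines[idx].valid = true;
    }
}

/* The split organization: each reference is marked as array or
   scalar (e.g., by the compiler) and routed to the matching cache. */
typedef struct {
    Cache array_cache;
    Cache scalar_cache;
} SplitDCache;

static void access_split(SplitDCache *d, uint32_t addr, bool is_array)
{
    access_cache(is_array ? &d->array_cache : &d->scalar_cache, addr);
}
```

The point of the split is visible even in this toy model: a long streaming walk over an array fills only the array cache, so resident scalar data keeps its temporal locality instead of being evicted by the stream.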
Victim caches exploit the fact that reducing cache misses caused by line conflicts for data exhibiting temporal locality is an effective way to improve cache performance without increasing the overall cache associativity.

In this paper we also study how our cache organizations can be reconfigured based on an application's behavior. By setting a few bits in a configuration register, software can configure the cache to the optimum size for each of our three structures (array cache, scalar cache, victim cache) and devote the remaining unused area to other processor activities. The cache system can also be configured to shut down certain regions in order to reduce energy consumption. In both cases, reconfiguration incurs only a small overhead in time, power, silicon area, and hardware complexity. In this paper, we provide the details of our configurable cache.

When using our augmented split caches for embedded applications, our results show excellent reductions in both memory size and memory access time, translating into reduced power consumption. Our cache architecture reduces cache area by as much as 78%, execution time by as much as 55%, and energy consumption by as much as 67%, compared with an 8 KB direct-mapped unified data cache backed by a 32 KB level-2 cache.

---
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC'07, March 11-15, 2007, Seoul, Korea. Copyright 2007 ACM 1-59593-480-4/07/0003…$5.00.
---
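The paper does not specify the layout of the configuration register, so the following is a purely hypothetical encoding of the idea: a few bit fields select the size of each partition (array, scalar, victim), and a mask marks regions to power-gate. Every field name, width, and address here is invented for illustration.

```c
#include <stdint.h>

/* Hypothetical cache-configuration register layout (NOT from the paper):
 *   bits  0-2  : log2 of array-cache size in KB (0 = partition disabled)
 *   bits  3-5  : log2 of scalar-cache size in KB
 *   bits  6-7  : victim-cache entries, in groups of four
 *   bits  8-11 : power-gate mask, one bit per cache region
 */
#define CFG_ARRAY_SHIFT   0u
#define CFG_SCALAR_SHIFT  3u
#define CFG_VICTIM_SHIFT  6u
#define CFG_GATE_SHIFT    8u

static inline uint32_t make_cache_cfg(uint32_t log2_array_kb,
                                      uint32_t log2_scalar_kb,
                                      uint32_t victim_quads,
                                      uint32_t gate_mask)
{
    /* Pack the fields into one 32-bit configuration word. */
    return (log2_array_kb  << CFG_ARRAY_SHIFT)  |
           (log2_scalar_kb << CFG_SCALAR_SHIFT) |
           (victim_quads   << CFG_VICTIM_SHIFT) |
           (gate_mask      << CFG_GATE_SHIFT);
}
```

Software would then write the packed word to a memory-mapped register, e.g. `*(volatile uint32_t *)CACHE_CFG_ADDR = make_cache_cfg(1, 1, 2, 0x8);`, where `CACHE_CFG_ADDR` is likewise a made-up address. The essential point matches the text: a handful of bits is enough to resize all three structures and shut down unused regions.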
If we instead consider tradeoffs in performance improvement, we can achieve as much as an 83% reduction in area (with no increase in execution time), or as much as a 61% decrease in execution cycles (with no increase in silicon area). These reductions can be profound for the small L1 caches often found in embedded systems. The space savings resulting from our cache