Enhancing TLB Reach with Ternary-CAM Cells

Anuj Kumar and Rabi N Mahapatra
Department of Computer Science, Texas A&M University, College Station, TX 77843
{anujk,rabi}@cs.tamu.edu

Abstract

In this paper, we propose dynamic aggregation of virtual tags in the TLB to increase its coverage and improve the overall miss ratio during address translation. Dynamic aggregation exploits both the spatial and temporal locality inherent in most application programs. To support dynamic aggregation, we introduce the use of ternary-CAM (TCAM) cells at the second-level TLB. The modified TLB architecture increases TLB reach without additional CAM entries. We also employ bulk prefetching alongside aggregation to enhance the benefits of spatial locality. The performance of the proposed TLB architecture is evaluated using SPEC2000 benchmarks, concentrating on those that show high data TLB miss ratios. Simulation results indicate a reduction in miss ratio of between 59% and 99.99% for all considered benchmarks except one, which shows a reduction of 10%. We show that an L2 TLB augmented with a few TCAM cells is an attractive solution to the high miss ratios exhibited by such applications.

Keywords: TLB, Ternary-CAM, Miss ratio, Aggregation, Prefetching

1. Introduction

Address translation is one of the most critical operations in modern processors. Previous studies show that TLB miss handling is among the most frequent kernel operations and that a significant fraction of processing time is spent handling TLB misses [2], [5], [7]. With the continuing growth in instruction-level parallelism, clock frequency, and application working-set size, the impact of TLB performance on overall application processing time will only increase.

The performance of any TLB design is evaluated in terms of two metrics: access time and miss ratio. Since address translation is on the critical path of instruction fetch and data reference, the TLB lookup must complete as quickly as possible. The miss ratio must be kept low because the miss penalty is severe (30-50 clock cycles), and this penalty will grow further with the ever-widening gap between processor and memory speed. Considerable effort has therefore gone into reducing the TLB miss rate. Two popular approaches that have emerged in the research community are increasing TLB reach and prefetching. Superpages have been proposed to increase TLB reach; the subblock TLB [19] and shadow memory [8], [14], [18] are among the schemes that implement them. These schemes either place considerable overhead on the operating system's memory management unit or require significant architectural changes. Even though the TLBs of most modern processors support multiple page sizes, the use of superpages in prevalent operating systems remains limited because of the associated complexity. The subblock TLB has the further limitation of supporting only fixed-size superpages. Recency-based prefetching [17] and distance prefetching [9] have been proposed to reduce the miss rate, but at the expense of complex prediction strategies.

We show that a majority of the SPEC2000 benchmarks exhibit considerable spatial locality at page-level granularity and offer large scope for aggregation. Aggregation is the process of merging several virtual tags in the TLB into a single entry, freeing many TLB entries to be filled by other address translations.
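As an illustration of the aggregation idea, the following Python sketch (not taken from the paper; the names TernaryEntry and try_aggregate, and the example addresses, are hypothetical) shows how one ternary tag with don't-care low bits can represent a naturally aligned group of contiguous page translations, freeing the entries it replaces:

    class TernaryEntry:
        """One TLB entry whose lowest `wild_bits` VPN bits are don't-care (X)."""

        def __init__(self, vpn_base, ppn_base, wild_bits=0):
            self.wild_bits = wild_bits
            self.mask = ~((1 << wild_bits) - 1)   # tag bits that must match exactly
            self.vpn_base = vpn_base & self.mask
            self.ppn_base = ppn_base & self.mask

        def matches(self, vpn):
            # TCAM comparison: the don't-care bits can never cause a mismatch.
            return (vpn & self.mask) == self.vpn_base

        def translate(self, vpn):
            # The aggregated pages are contiguous and aligned in both address
            # spaces, so the wildcarded low bits carry over unchanged.
            return self.ppn_base | (vpn & ~self.mask)


    def try_aggregate(pairs):
        """Merge 2^k contiguous (vpn, ppn) translations into one ternary entry;
        return None if they do not form an aligned, congruent block."""
        n = len(pairs)
        if n == 0 or n & (n - 1):                          # need a power-of-two group
            return None
        mask = ~(n - 1)
        vpn0, ppn0 = pairs[0][0] & mask, pairs[0][1] & mask
        offsets = set()
        for vpn, ppn in pairs:
            if (vpn & mask, ppn & mask) != (vpn0, ppn0):   # crosses a block boundary
                return None
            if (vpn & ~mask) != (ppn & ~mask):             # low bits would not carry over
                return None
            offsets.add(vpn & ~mask)
        if len(offsets) != n:                              # a page of the block is missing
            return None
        return TernaryEntry(vpn0, ppn0, n.bit_length() - 1)


    # Four contiguous 4 KB pages collapse into one ternary entry, freeing three slots.
    pairs = [(0x00040 + i, 0x9c040 + i) for i in range(4)]
    entry = try_aggregate(pairs)
    assert entry is not None and entry.wild_bits == 2
    assert entry.matches(0x00042) and entry.translate(0x00042) == 0x9c042
    assert not entry.matches(0x00044)      # the next block still needs its own entry

The sketch relies on the aggregated pages being contiguous and identically aligned in both virtual and physical memory, so the wildcarded low VPN bits can be reused directly as the low PPN bits; this is the property that lets a single TCAM match stand in for several exact CAM matches.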
Based on these observations, we propose a modified TLB architecture that supports bulk prefetching and static aggregation, similar to the complete-subblock concept proposed in [19]. Further, we employ dynamic aggregation of virtual tags in the L2 TLB to increase TLB reach. To support dynamic aggregation, we introduce TCAM cells [21] alongside CAM cells in the second-level TLB. To the best of our knowledge, this is the first attempt to use TCAM cells in a TLB design to enhance TLB reach. We show that dynamic aggregation exploits not only the spatial locality but also the temporal locality present in applications. A significant reduction in the overall TLB miss rate is achieved without additional CAM overhead (i.e., with no change in the number of virtual tags in the TLB).

The aggregation technique proposed here is hardware controlled. Because of the TCAM cells, each virtual tag in the TLB maps to a variable number of physical pages. In addition, the TCAM cells allow larger pages (e.g., the Itanium CPU provides ten different page sizes, from 4 KB to 256 MB) to be mapped in a single TLB entry. Thus, the proposed scheme provides the reach-increasing benefits of superpages without most of their limitations and associated complexities.

The rest of the paper is organized as follows. Section 2 discusses related work. The experimental setup is presented in Section 3. Section 4 describes our observations regarding the L2 TLB. Bulk prefetching and static aggregation strategies are described in Section 5. Section 6 introduces the dynamic aggregation process in detail. The timing analysis of the proposed architecture is given in Section 7. Section 8 describes how TCAM cells in the TLB support variable-sized pages. Section 9 concludes the paper with the main results and a discussion of future research directions.

2. Background and Related Work

Over the past few years, a great deal of research has gone into improving TLB access time and miss rate. The most common way to reduce access time is a multi-level TLB, with a small L1 TLB backed by a larger L2 TLB [7], [11]. The second-level TLB is looked up only on an L1 TLB miss, and looking up a smaller TLB takes less time. Like the cache hierarchy, the TLB hierarchy follows the inclusion property (an entry present in a lower level is also present in the higher level).

Two popular approaches to decreasing the TLB miss rate are increasing TLB reach and prefetching. TLB reach is the total physical memory mapped by the TLB, i.e., the number of TLB entries multiplied by the page size of each entry. The straightforward ways to increase TLB reach are to enlarge the page size or add TLB entries, but neither is practical.
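As a back-of-the-envelope illustration of this definition (the entry count, page size, and aggregation factors below are assumed values, not figures from the paper), the reach of a fixed set of entries scales linearly with the average number of pages each ternary entry covers:

    ENTRIES   = 128          # number of L2 TLB entries (assumed)
    PAGE_SIZE = 4 * 1024     # 4 KB base pages

    baseline_reach = ENTRIES * PAGE_SIZE                     # reach = entries x page size
    print(f"baseline reach: {baseline_reach // 1024} KB")    # 512 KB

    # With ternary tags, one entry can cover an aligned group of contiguous pages,
    # so the effective reach grows with the average group size even though the
    # number of CAM/TCAM entries (virtual tags) stays the same.
    for pages_per_entry in (1, 2, 4, 8):
        reach = ENTRIES * pages_per_entry * PAGE_SIZE
        print(f"avg {pages_per_entry} pages/entry -> reach {reach // 1024} KB")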

