CMU CS 15740 - Non-vital Loads

Non-vital Loads

†Ryan Rakvic, †Bryan Black, ‡Deepak Limaye, & †John P. Shen
†Microprocessor Research Lab, Intel Labs
{ryan.n.rakvic,bryan.black,john.shen}@intel.com
‡Electrical and Computer Engineering, Carnegie Mellon University
[email protected]

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 1503-0897/02 $17.00 © 2002 IEEE

Abstract

As the frequency gap between main memory and modern microprocessors grows, the implementation and efficiency of on-chip caches become more important. The growing latency to memory is motivating new research into load instruction behavior and selective data caching. This work investigates the classification of load instruction behavior. A new load classification method is proposed that classifies loads into those vital to performance and those not vital to performance. A limit study is presented to characterize different types of non-vital loads and to quantify the percentage of loads that are non-vital. Finally, a realistic implementation of the non-vital load classification method is presented, and a new cache structure called the Vital Cache is proposed to take advantage of non-vital loads. The Vital Cache caches data for vital loads only, deferring non-vital loads to slower caches.

Results: The limit study shows that 75% of all loads are non-vital, with only 35% of the accessed data space being vital for caching. The Vital Cache improves the efficiency of the cache hierarchy and the hit rate for vital loads, increasing performance by 17%.

1 Introduction

The latency to main memory is quickly becoming the single most significant bottleneck to microprocessor performance. In response to long-latency memory, on-chip cache hierarchies are becoming very large. However, the first-level data cache (DL1) is limited in size by the short latency it must have to keep up with the microprocessor core. For an on-chip cache to continue as an effective mechanism to counter long-latency memory, DL1 caches must remain small and fast, and become more storage efficient.

A key problem is that microprocessors treat all load instructions equally. They are fetched in program order and executed as quickly as possible. As soon as all load source operands are valid, loads are issued to load functional units for immediate execution. All loads access the first level of data cache and advance through the memory hierarchy until the desired data is found. Treating all loads equally implies that all target data are vying for positions in each level of the memory hierarchy regardless of the importance (vitality) of that data.

As demonstrated by Srinivasan and Lebeck [22], not all loads are equally important. In fact, many have significant tolerance for execution latency. Our work proposes a new classification of load instructions and a new caching method to take advantage of this load classification. We argue that load instructions should not be treated equally, because many loads need not be executed as quickly as possible.

This work presents two contributions. 1) We perform a limit study analyzing the classification of load instructions as vital (important) or non-vital (not important). Vital loads are loads that must be executed as quickly as possible in order to avoid performance degradation. Non-vital loads are loads that can be delayed without impacting performance. 2) We introduce a new cache called the Vital Cache to selectively cache data only for vital loads. The Vital Cache improves performance by increasing the efficiency of the fastest cache in the hierarchy. The hit rate for vital loads is increased at the expense of non-vital loads, which can tolerate longer access latencies without impacting performance. Performance is also increased by processing (scheduling) the vital loads ahead of non-vital loads.

2 Previous Work

In [1] the predictability of load latencies is addressed. [15] showed some effects of memory latencies, but it was [21][22] that first identified the latency tolerance of loads exhibited by a microprocessor. These works show that loads leading to mispredicted branches, or to a slowing down of the machine, are loads that are critical. This work is built on the same concept as [21][22]. In fact, part of our classification (lead to branch, see Section 4) is taken from this previous research. This work further identifies additional classes of loads and uses a different classification algorithm. Furthermore, we introduce a new caching mechanism to take advantage of them.

The work in [21] introduced an implementation based on the non-critical aspect of loads. They implemented two different approaches: using a victim critical cache, and prefetching critical data. Neither seemed to show much performance benefit. The work in [7] also introduced a buffer containing non-critical addresses. The implementation in Section 5 is in the same spirit as [7][21], but is done in accordance with non-vital loads.

Section 5 introduces a form of selective vital caching. This selective caching is similar in concept to [9][10][14][19][25]. The goal of selective caching is to improve the efficiency of the cache. [9][10] cached data based on temporal reuse. [25] selectively cached data based on the address of loads; in particular, loads which typically hit the cache are given priority to use the cache. We also propose caching data based on the address of loads. However, we cache data based on the vitality, or importance, of the load instruction.

The non-vital concept should not be confused with "critical path" [24] research. Non-vital loads may or may not be on the critical path of execution; they become non-vital based on resource constraints and limitations. Therefore, a load that is considered "non-vital" may be on the critical path, but its execution latency is not vital to overall performance. [6] introduced an insightful new critical path model that takes resource constraints into account, using a token-passing method to try to identify instructions that are critical to performance. On the other hand, our approach attempts to identify the loads that are not critical to performance and therefore do not need DL1 cache hits to maintain high performance.

Other popular research tries to design a DL1 that maintains a high hit rate with very low latency [8]. One approach uses streaming buffers, victim caches [13], alternative cache indexing schemes [20], etc. [10]. Another approach attempts to achieve free associativity. Calder et

