CMU CS 15740 - Non-vital Loads - D2964910


School: Carnegie Mellon University

Course: CS 15740 - Computer Architecture


Non-vital Loads

Ryan Rakvic, Bryan Black, Deepak Limaye, and John P. Shen
Microprocessor Research Lab, Intel Labs
Electrical and Computer Engineering, Carnegie Mellon University
{ryan.n.rakvic, bryan.black, john.shen}@intel.com, dlimaye@ece.cmu.edu

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 1503-0897/02 $17.00 © 2002 IEEE

Abstract

As the frequency gap between main memory and modern microprocessors grows, the implementation and efficiency of on-chip caches become more important. The growing latency to memory is motivating new research into load instruction behavior and selective data caching. This work investigates the classification of load instruction behavior. A new load classification method is proposed that classifies loads into those vital to performance and those not vital to performance. A limit study is presented to characterize different types of non-vital loads and to quantify the percentage of loads that are non-vital. Finally, a realistic implementation of the non-vital load classification method is presented, and a new cache structure called the Vital Cache is proposed to take advantage of non-vital loads. The Vital Cache caches data for vital loads only, deferring non-vital loads to slower caches.

Results: The limit study shows 75% of all loads are non-vital, with only 35% of the accessed data space being vital for caching. The Vital Cache improves the efficiency of the cache hierarchy and the hit rate for vital loads. The Vital Cache increases performance by 17%.

1. Introduction

The latency to main memory is quickly becoming the single most significant bottleneck to microprocessor performance. In response to long-latency memory, on-chip cache hierarchies are becoming very large. However, the first-level data cache (DL1) is limited in size by the short latency it must have to keep up with the microprocessor core. For an on-chip cache to continue as an effective mechanism to counter long-latency memory, DL1 caches must remain small and fast, and become more storage-efficient.

A key problem is that microprocessors treat all load instructions equally. They are fetched in program order and executed as quickly as possible. As soon as all load source operands are valid, loads are issued to load functional units for immediate execution. All loads access the first level of data cache and advance through the memory hierarchy until the desired data is found. Treating all loads equally implies that all target data are vying for positions in each level of the memory hierarchy, regardless of the importance (vitality) of that data.

As demonstrated by Srinivasan and Lebeck [22], not all loads are equally important. In fact, many have significant tolerance for execution latency. Our work proposes a new classification of load instructions and a new caching method to take advantage of this load classification. We argue that load instructions should not be treated equally, because many loads need not be executed as quickly as possible. This work presents two contributions:

1. We perform a limit study analyzing the classification of load instructions as vital (important) or non-vital (not important). Vital loads are loads that must be executed as quickly as possible in order to avoid performance degradation. Non-vital loads are loads that can be delayed without impacting performance.

2. We introduce a new cache, called the Vital Cache, to selectively cache data only for vital loads. The Vital Cache improves performance by increasing the efficiency of the fastest cache in the hierarchy. The hit rate for vital loads is increased at the expense of non-vital loads, which can tolerate longer access latencies without impacting performance. Performance is also increased by processing (scheduling) the vital loads ahead of non-vital loads.

2. Previous Work

In [1], the predictability of load latencies is addressed. [15] showed some effects of memory latencies, but [21, 22] were the first to identify the latency tolerance of loads exhibited by a microprocessor. These works show that loads leading to mispredicted branches or to a slowing down of the machine are loads that are critical. This work is built on the same concept as [21, 22]. In fact, part of our classification (lead to branch, see Section 4) is taken from this previous research. This work further identifies additional classes of loads and uses a different classification algorithm. Furthermore, we introduce a new caching mechanism to take advantage of them.

The work in [21] introduced an implementation based on the non-critical aspect of loads. They implemented two different approaches: using a victim critical cache, and prefetching critical data. Neither seemed to show much performance benefit. The work in [7] also introduced a buffer containing non-critical addresses. The implementation in Section 5 is in the same spirit as [7, 21], but is done in accordance with non-vital loads.

Section 5 introduces a form of selective vital caching. This selective caching is similar in concept to [9, 10, 14, 19, 25]. The goal of selective caching is to improve the efficiency of the cache. [9, 10] cached data based on temporal reuse. [25] selectively cached data based on the address of loads. In particular, loads which typically hit the cache are given priority to use the cache. We also propose caching data based on the address of loads. However, we cache data based on the vitality, or importance, of the load instruction.

The non-vital concept should not be confused with critical path [24] research. Non-vital loads may or may not be on the critical path of execution. Non-vital loads become non-vital based on resource constraints and limitations. Therefore, a load that is considered non-vital may be on the critical path, but its execution latency is not vital to overall performance. [6] introduced a new, insightful critical path model that takes into account resource constraints. [6] used a token-passing method to try to identify instructions that are critical to performance. On the other hand, our approach attempts to identify the loads that are not critical to performance and therefore do not need DL1 cache hits to maintain high performance.

Other popular research tries to design a DL1 that maintains a high hit rate with very low latency [8]. One approach used streaming buffers, victim caches [13], alternative cache indexing schemes [20], etc. [10]. Another approach attempts to achieve free associativity. Calder et al. [4], following the spirit of [12, 11, 2, 17], proposed the predictive sequential associative cache (PSA cache) to implement associative caches with a serial lookup
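
The selective caching idea described in the Introduction (allocate space in the fastest cache only for vital loads) can be illustrated with a small toy model. The sketch below is not the authors' implementation: the class `SelectiveCache`, the helper `vital_hit_rate`, the fully associative LRU organization, the assumption that non-vital loads still probe the cache but never allocate in it, and the synthetic trace are all hypothetical choices made for illustration. It also assumes each load already carries a vital/non-vital label, which the paper obtains through its own classification method.

```python
from collections import OrderedDict


class SelectiveCache:
    """Toy fully associative LRU cache (an assumed organization).

    When filter_non_vital is True, only vital loads may allocate a line;
    non-vital loads may still hit, but a non-vital miss leaves the cache
    contents unchanged.
    """

    def __init__(self, capacity, filter_non_vital=True):
        self.capacity = capacity
        self.filter_non_vital = filter_non_vital
        self.lines = OrderedDict()  # address -> None, ordered by recency

    def access(self, addr, is_vital):
        """Return True on a hit; on a miss, allocate only if the policy allows."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh LRU position
            return True
        if is_vital or not self.filter_non_vital:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used line
            self.lines[addr] = None
        return False


def vital_hit_rate(cache, trace):
    """Fraction of the vital loads in the trace that hit in the cache."""
    vital_hits = 0
    vital_total = 0
    for addr, is_vital in trace:
        hit = cache.access(addr, is_vital)
        if is_vital:
            vital_total += 1
            vital_hits += int(hit)
    return vital_hits / vital_total


# Hypothetical trace: two vital addresses reused every iteration, interleaved
# with a stream of non-vital addresses that are never reused.
trace = []
for i in range(100):
    trace.append((0xA0, True))
    trace.append((0xB0, True))
    trace.append((0x1000 + 2 * i, False))
    trace.append((0x1000 + 2 * i + 1, False))

baseline = vital_hit_rate(SelectiveCache(2, filter_non_vital=False), trace)
selective = vital_hit_rate(SelectiveCache(2, filter_non_vital=True), trace)
print(baseline, selective)
```

On this synthetic trace the conventional policy lets the non-vital stream evict both vital lines every iteration, so the vital-load hit rate is 0.0, while the selective policy misses only on the first touch of each vital address and reaches 0.99. The numbers say nothing about real workloads; they only show the mechanism by which filtering non-vital allocations can raise the hit rate seen by vital loads.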
