Lecture 9: Large Cache Design II

• Topics: Cache partitioning and replacement policies

Basic Replacement Policies

• More reasonable options when considering the L2
• LRU: least recently used
• LFU: least frequently used (requires small saturating counters)
• Pseudo-LRU: organize ways as a tree and track which sub-tree was last accessed
• NRU: every block has a bit; the bit is reset to 0 upon touch; when evicting, pick a block with its bit set to 1; if no block has a 1, first set every bit to 1

Why the Basic Policies Fail

• Access types that pollute the cache without yielding many hits: streaming (no reuse) and thrashing (distant reuse)
• Current hit rates fall far short of those with an oracular replacement policy (Belady): evict the block whose next access is most distant
• A large fraction of the cache is useless: blocks that have serviced their last hit and are on the slow walk from MRU to LRU

Insertion, Promotion, Victim Selection

• Instead of viewing the set as a recency stack, simply view it as a priority list; in LRU, priority = recency
• When a block is fetched, it can be inserted at any position in the list
• When a block is touched, it can be promoted up the priority list in one of many ways
• When a block must be victimized, any block can be selected (not necessarily the tail of the list)

MIP, LIP, BIP, and DIP (Qureshi et al., ISCA '07)

• MIP: MRU insertion policy (the baseline)
• LIP: LRU insertion policy; assumes that blocks are useless and should be kept around only if touched twice in succession
• BIP: bimodal insertion policy; put most blocks at the tail, and with a small probability insert at the head; for thrashing workloads, it can retain part of the working set and yield hits on it
• DIP: dynamic insertion policy; pick the better of MIP and BIP, deciding with set-dueling

RRIP (Jaleel et al., ISCA '10)

• Re-Reference Interval Prediction: in essence, insert blocks near the end of the list rather than at the very end
• Implement with a multi-bit version of NRU: zero a block's counter on touch; to evict, pick a block with the max counter value, and if none exists, increment every counter by one and repeat
• RRIP can be implemented simply by setting the inserted block's counter to max-1 (does not require list management)

UCP (Qureshi et al., MICRO '06)

• Utility-Based Cache Partitioning: partition ways among cores based on the estimated marginal utility of each additional way to each core
• Each core maintains a shadow tag structure for the L2 cache that is populated only by requests from that core; the core can then estimate its hit rate if it had W ways of L2
• Every epoch, stats are collected and ways are re-assigned
• Shadow tag storage overhead can be reduced with set sampling and partial tags

TADIP (Jaleel et al., PACT '08)

• Thread-Aware DIP: each thread dynamically decides to use MIP or BIP; threads that use BIP get a smaller partition of the cache
• Better than UCP because even for a thrashing workload, part of the working set gets to stay in the cache
• Needs many set-dueling monitors, but no extra shadow tags

PIPP (Xie and Loh, ISCA '09)

• Promotion/Insertion Pseudo-Partitioning: incoming blocks are inserted at arbitrary positions in the list, and on every touch they are gradually promoted up the list with a given probability
• Applications with a large partition are inserted near the head of the list and promoted aggressively
• Partition sizes are decided with marginal-utility estimates
• In a few sets, a core gets to use N-1 ways and count hits to each way; other threads only get to use the last way

Aggressor VT (Liu and Yeung, PACT '09)

• In an oracle policy, 80% of evictions belong to a thrashing aggressor thread
• Hence, if the LRU block belongs to an aggressor thread, evict it; else, evict the aggressor thread's LRU block with a probability of either 99% or 50%
• At the start of each phase change, sample each thread's behavior in one of three modes: non-aggressive, aggressive-99%, aggressive-50%; pick the best-performing mode

Set Partitioning

• Sets can also be partitioned among cores by assigning page colors to each core
• Needs little hardware support, but must adapt to the dynamic arrival/exit of tasks
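The NRU policy from the basic-policies slide can be sketched for a single cache set. This is a minimal illustrative model (the class name, tag representation, and tie-breaking by lowest way index are assumptions, not hardware details):

```python
# Sketch of NRU (Not Recently Used) replacement for one cache set.
# Each way has a bit: 0 = recently touched, 1 = eviction candidate.

class NRUSet:
    def __init__(self, num_ways):
        self.tags = [None] * num_ways
        self.bits = [1] * num_ways      # all ways start as candidates

    def access(self, tag):
        """Return True on a hit; on a miss, evict an NRU victim and fill."""
        if tag in self.tags:
            self.bits[self.tags.index(tag)] = 0   # touched: reset bit to 0
            return True
        if 1 not in self.bits:                    # no candidate left:
            self.bits = [1] * len(self.bits)      # set every bit to 1
        victim = self.bits.index(1)               # pick a block with bit 1
        self.tags[victim] = tag
        self.bits[victim] = 0
        return False
```

The appeal of NRU over true LRU is cost: one bit per block instead of per-set recency ordering, at the price of an arbitrary choice among the bit-set candidates.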
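DIP's set-dueling mechanism can be sketched as a saturating policy-selection counter (commonly called PSEL) updated by misses in dedicated leader sets. The leader-set mapping (`set_index % 64`), the counter width, and the threshold below are assumptions for illustration:

```python
# Sketch of DIP set-dueling: a few leader sets always use MIP, a few
# always use BIP, and a saturating counter tracks which group misses
# less; all remaining "follower" sets adopt the winning policy.

class DIPChooser:
    def __init__(self, psel_bits=10):
        self.psel = 1 << (psel_bits - 1)        # start at the midpoint
        self.psel_max = (1 << psel_bits) - 1

    def on_miss(self, set_index):
        if set_index % 64 == 0:                 # MIP leader set (assumed map)
            self.psel = min(self.psel + 1, self.psel_max)
        elif set_index % 64 == 1:               # BIP leader set
            self.psel = max(self.psel - 1, 0)

    def policy(self, set_index):
        if set_index % 64 == 0:
            return "MIP"
        if set_index % 64 == 1:
            return "BIP"
        # High PSEL means the MIP leaders missed more, so followers use BIP.
        return "MIP" if self.psel < (self.psel_max + 1) // 2 else "BIP"
```

The same structure generalizes to TADIP by keeping one such chooser per thread.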
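The multi-bit-NRU view of RRIP can be sketched with per-way re-reference counters (often called RRPV), using the max-1 insertion trick from the slide. Counter width and the class structure are assumptions; this follows the static (SRRIP-like) variant rather than any dynamic one:

```python
# Sketch of RRIP with multi-bit re-reference counters per way.
# Touch zeroes a block's counter; eviction picks a block at the max
# value, aging all blocks until one reaches it; insertion uses max-1.

class RRIPSet:
    def __init__(self, num_ways, bits=2):
        self.max_rrpv = (1 << bits) - 1
        self.tags = [None] * num_ways
        self.rrpv = [self.max_rrpv] * num_ways   # empty ways evict first

    def access(self, tag):
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0        # zero counter on touch
            return True
        while self.max_rrpv not in self.rrpv:          # age until a victim
            self.rrpv = [v + 1 for v in self.rrpv]     # exists
        victim = self.rrpv.index(self.max_rrpv)
        self.tags[victim] = tag
        self.rrpv[victim] = self.max_rrpv - 1          # insert at max-1
        return False
```

Inserting at max-1 rather than 0 is exactly the "near the end of the list, not at the very end" behavior: a new block is evicted soon unless it is touched again.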
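The marginal-utility idea behind UCP can be sketched as a greedy way-allocation loop over the per-core hit curves that the shadow tags provide. This is a simplified illustration, not the paper's exact partitioning algorithm, and the function name and input layout are assumed:

```python
# Sketch of utility-based way partitioning: hits_per_core[c][w] is the
# number of hits core c would have seen with w ways (from shadow tags).
# Each of the total_ways ways goes to the core whose marginal utility
# (extra hits from one more way) is currently highest.

def partition_ways(hits_per_core, total_ways):
    alloc = [0] * len(hits_per_core)
    for _ in range(total_ways):
        gains = [curve[alloc[c] + 1] - curve[alloc[c]]
                 for c, curve in enumerate(hits_per_core)]
        best = gains.index(max(gains))     # highest marginal utility wins
        alloc[best] += 1
    return alloc
```

A core whose hit curve saturates early (a small working set) stops attracting ways, which is precisely the behavior the epoch-by-epoch re-assignment exploits.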