Lecture 9: Large Cache Design II

• Topics: Cache partitioning and replacement policies

Basic Replacement Policies

• More reasonable options when considering the L2
• LRU: least recently used
• LFU: least frequently used (requires small saturating counters)
• Pseudo-LRU: organize ways as a tree and track which sub-tree was last accessed
• NRU: every block has a bit; the bit is reset to 0 upon touch; when evicting, pick a block with its bit set to 1; if no block has a 1, first set every bit to 1

Why the Basic Policies Fail

• Access types that pollute the cache without yielding many hits: streaming (no reuse) and thrashing (distant reuse)
• Current hit rates fall far short of those with an oracular replacement policy (Belady): evict the block whose next access is most distant
• A large fraction of the cache is useless: blocks that have serviced their last hit and are on the slow walk from MRU to LRU

Insertion, Promotion, Victim Selection

• Instead of viewing the set as a recency stack, simply view it as a priority list; in LRU, priority = recency
• When a block is fetched, it can be inserted at any position in the list
• When a block is touched, it can be promoted up the priority list in one of many ways
• When a block must be victimized, any block can be selected (not necessarily the tail of the list)

MIP, LIP, BIP, and DIP (Qureshi et al., ISCA '07)

• MIP: MRU insertion policy (the baseline)
• LIP: LRU insertion policy; assumes that blocks are useless and should be kept around only if touched twice in succession
• BIP: bimodal insertion policy; put most blocks at the tail, and with a small probability insert at the head; for thrashing workloads, it can retain part of the working set and yield hits on it
• DIP: dynamic insertion policy; pick the better of MIP and BIP, deciding with set-dueling

RRIP (Jaleel et al., ISCA '10)

• Re-Reference Interval Prediction: in essence, insert blocks near the end of the list rather than at the very end
• Implement with a multi-bit version of NRU: zero a block's counter on touch; to evict, pick a block with the max counter value, and if none exists, increment every counter by one and repeat
• RRIP can be implemented simply by setting the inserted block's counter to max-1 (does not require list management)

UCP (Qureshi et al., MICRO '06)

• Utility-Based Cache Partitioning: partition ways among cores based on the estimated marginal utility of each additional way to each core
• Each core maintains a shadow tag structure for the L2 cache that is populated only by requests from that core; the core can then estimate its hit rate if it had W ways of L2
• Every epoch, stats are collected and ways are re-assigned
• Shadow tag storage overhead can be reduced with set sampling and partial tags

TADIP (Jaleel et al., PACT '08)

• Thread-Aware DIP: each thread dynamically decides to use MIP or BIP; threads that use BIP get a smaller partition of the cache
• Better than UCP because even for a thrashing workload, part of the working set gets to stay in the cache
• Needs many set-dueling monitors, but no extra shadow tags

PIPP (Xie and Loh, ISCA '09)

• Promotion/Insertion Pseudo-Partitioning: incoming blocks are inserted at arbitrary positions in the list, and on every touch they are gradually promoted up the list with a given probability
• Applications with a large partition are inserted near the head of the list and promoted aggressively
• Partition sizes are decided with marginal-utility estimates
• In a few sets, a core gets to use N-1 ways and count hits to each way; other threads only get to use the last way

Aggressor VT (Liu and Yeung, PACT '09)

• In an oracle policy, 80% of evictions belong to a thrashing aggressor thread
• Hence, if the LRU block belongs to an aggressor thread, evict it; else, evict the aggressor thread's LRU block with a probability of either 99% or 50%
• At the start of each phase change, sample each thread's behavior in one of three modes: non-aggressive, aggressive-99%, aggressive-50%; pick the best-performing mode

Set Partitioning

• Sets can also be partitioned among cores by assigning page colors to each core
• Needs little hardware support, but must adapt to the dynamic arrival/exit of tasks
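The NRU policy from the basic-policies slide can be sketched for a single cache set. This is a minimal illustrative model (the class name, tag representation, and tie-breaking by lowest way index are assumptions, not hardware details):

```python
# Sketch of NRU (Not Recently Used) replacement for one cache set.
# Each way has a bit: 0 = recently touched, 1 = eviction candidate.

class NRUSet:
    def __init__(self, num_ways):
        self.tags = [None] * num_ways
        self.bits = [1] * num_ways      # all ways start as candidates

    def access(self, tag):
        """Return True on a hit; on a miss, evict an NRU victim and fill."""
        if tag in self.tags:
            self.bits[self.tags.index(tag)] = 0   # touched: reset bit to 0
            return True
        if 1 not in self.bits:                    # no candidate left:
            self.bits = [1] * len(self.bits)      # set every bit to 1
        victim = self.bits.index(1)               # pick a block with bit 1
        self.tags[victim] = tag
        self.bits[victim] = 0
        return False
```

The appeal of NRU over true LRU is cost: one bit per block instead of per-set recency ordering, at the price of an arbitrary choice among the bit-set candidates.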
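DIP's set-dueling mechanism can be sketched as a saturating policy-selection counter (commonly called PSEL) updated by misses in dedicated leader sets. The leader-set mapping (`set_index % 64`), the counter width, and the threshold below are assumptions for illustration:

```python
# Sketch of DIP set-dueling: a few leader sets always use MIP, a few
# always use BIP, and a saturating counter tracks which group misses
# less; all remaining "follower" sets adopt the winning policy.

class DIPChooser:
    def __init__(self, psel_bits=10):
        self.psel = 1 << (psel_bits - 1)        # start at the midpoint
        self.psel_max = (1 << psel_bits) - 1

    def on_miss(self, set_index):
        if set_index % 64 == 0:                 # MIP leader set (assumed map)
            self.psel = min(self.psel + 1, self.psel_max)
        elif set_index % 64 == 1:               # BIP leader set
            self.psel = max(self.psel - 1, 0)

    def policy(self, set_index):
        if set_index % 64 == 0:
            return "MIP"
        if set_index % 64 == 1:
            return "BIP"
        # High PSEL means the MIP leaders missed more, so followers use BIP.
        return "MIP" if self.psel < (self.psel_max + 1) // 2 else "BIP"
```

The same structure generalizes to TADIP by keeping one such chooser per thread.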
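The multi-bit-NRU view of RRIP can be sketched with per-way re-reference counters (often called RRPV), using the max-1 insertion trick from the slide. Counter width and the class structure are assumptions; this follows the static (SRRIP-like) variant rather than any dynamic one:

```python
# Sketch of RRIP with multi-bit re-reference counters per way.
# Touch zeroes a block's counter; eviction picks a block at the max
# value, aging all blocks until one reaches it; insertion uses max-1.

class RRIPSet:
    def __init__(self, num_ways, bits=2):
        self.max_rrpv = (1 << bits) - 1
        self.tags = [None] * num_ways
        self.rrpv = [self.max_rrpv] * num_ways   # empty ways evict first

    def access(self, tag):
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0        # zero counter on touch
            return True
        while self.max_rrpv not in self.rrpv:          # age until a victim
            self.rrpv = [v + 1 for v in self.rrpv]     # exists
        victim = self.rrpv.index(self.max_rrpv)
        self.tags[victim] = tag
        self.rrpv[victim] = self.max_rrpv - 1          # insert at max-1
        return False
```

Inserting at max-1 rather than 0 is exactly the "near the end of the list, not at the very end" behavior: a new block is evicted soon unless it is touched again.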
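The marginal-utility idea behind UCP can be sketched as a greedy way-allocation loop over the per-core hit curves that the shadow tags provide. This is a simplified illustration, not the paper's exact partitioning algorithm, and the function name and input layout are assumed:

```python
# Sketch of utility-based way partitioning: hits_per_core[c][w] is the
# number of hits core c would have seen with w ways (from shadow tags).
# Each of the total_ways ways goes to the core whose marginal utility
# (extra hits from one more way) is currently highest.

def partition_ways(hits_per_core, total_ways):
    alloc = [0] * len(hits_per_core)
    for _ in range(total_ways):
        gains = [curve[alloc[c] + 1] - curve[alloc[c]]
                 for c, curve in enumerate(hits_per_core)]
        best = gains.index(max(gains))     # highest marginal utility wins
        alloc[best] += 1
    return alloc
```

A core whose hit curve saturates early (a small working set) stops attracting ways, which is precisely the behavior the epoch-by-epoch re-assignment exploits.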