CS 152 Computer Architecture and Engineering
Lecture 20 - Snoopy Caches

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley

http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152

April 15, 2010, CS152 Spring 2010


Recap: Sequential Consistency, A Memory Model

[Figure: several processors (P) sharing a single memory (M)]

"A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program."
    Leslie Lamport

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs


Recap: Sequential Consistency

Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies. What are these in our example?

T1: Store (X), 1     (X = 1)
    Store (Y), 11    (Y = 11)

T2: Load R1, (Y)
    Store (Y'), R1   (Y' = Y)
    Load R2, (X)
    Store (X'), R2   (X' = X)

Uniprocessor dependencies order only each load before the store that uses its result (Load Y before Store Y', Load X before Store X'). The additional SC requirements keep the remaining accesses to different addresses in program order. For example, T1's Store X must precede its Store Y, and T2's Load Y must precede its Load X.


Recap: Mutual Exclusion and Locks

Want to guarantee that only one process is active in a critical section.
- Blocking atomic read-modify-write instructions, e.g., Test&Set, Fetch&Add, Swap
  vs.
- Non-blocking atomic read-modify-write instructions, e.g., Compare&Swap, Load-reserve/Store-conditional
  vs.
- Protocols based on ordinary Loads and Stores


Issues in Implementing Sequential Consistency

[Figure: several processors (P) sharing a single memory (M)]

Implementation of SC is complicated by two issues:
- Out-of-order execution capability. A uniprocessor may reorder:
    Load(a); Load(b)      yes
    Load(a); Store(b)     yes if a ≠ b
    Store(a); Load(b)     yes if a ≠ b
    Store(a); Store(b)    yes if a ≠ b
- Caches. Caches can prevent the effect of a store from being seen by other processors.

SC complications motivate architects to consider weak or relaxed memory models.


Memory Fences: Instructions to sequentialize memory accesses

Processors with relaxed or weak memory models (i.e., those that permit Loads and Stores to different addresses to be reordered) need to provide memory fence instructions to force the serialization of memory accesses.

Examples of processors with relaxed memory models:
    Sparc V8 (TSO, PSO): Membar
    Sparc V9 (RMO): Membar #LoadLoad, Membar #LoadStore, Membar #StoreLoad, Membar #StoreStore
    PowerPC (WO): Sync, EIEIO

Memory fences are expensive operations. However, one pays the cost of serialization only when it is required.


Using Memory Fences

[Figure: a queue in memory. The Producer appends at tail and the Consumer removes from head. Rtail and Rhead are register copies of the two pointers.]

Consumer:
        Load Rhead, (head)
spin:   Load Rtail, (tail)
        if Rhead == Rtail goto spin
        MembarLL            ; ensures that R is not loaded before x has been stored
        Load R, (Rhead)
        Rhead = Rhead + 1
        Store (head), Rhead
        process(R)

Producer posting item x:
        Load Rtail, (tail)
        Store (Rtail), x
        MembarSS            ; ensures that the tail pointer is not updated before x has been stored
        Rtail = Rtail + 1
        Store (tail), Rtail


Memory Coherence in SMPs

[Figure: CPU-1 and CPU-2, each with a private cache holding A = 100, connected by the CPU-Memory bus to memory, which also holds A = 100]

Suppose CPU-1 updates A to 200.
    write-back: memory and cache-2 have stale values
    write-through: cache-2 has a stale value

Do these stale values matter? What is the view of shared memory for programming?


Write-back Caches & SC

prog T1: ST X, 1; ST Y, 11
prog T2: LD Y, R1; ST Y', R1; LD X, R2; ST X', R2

(- = no value)
Event                        cache-1      memory                    cache-2
T1 is executed               X=1, Y=11    X=0, Y=10, X'=-, Y'=-     Y=-, Y'=-, X=-, X'=-
cache-1 writes back Y        X=1, Y=11    X=0, Y=11, X'=-, Y'=-     Y=-, Y'=-, X=-, X'=-
T2 executed                  X=1, Y=11    X=0, Y=11, X'=-, Y'=-     Y=11, Y'=11, X=0, X'=0
cache-1 writes back X        X=1, Y=11    X=1, Y=11, X'=-, Y'=-     Y=11, Y'=11, X=0, X'=0
cache-2 writes back X' & Y'  X=1, Y=11    X=1, Y=11, X'=0, Y'=11    Y=11, Y'=11, X=0, X'=0

The final state Y' = 11, X' = 0 means T2 saw T1's second store but not its first. No sequentially consistent execution can produce this outcome.


Write-through Caches & SC

prog T1: ST X, 1; ST Y, 11
prog T2: LD Y, R1; ST Y', R1; LD X, R2; ST X', R2

Event           cache-1      memory                    cache-2
initial state   X=0, Y=10    X=0, Y=10, X'=-, Y'=-     Y=-, Y'=-, X=0, X'=-
T1 executed     X=1, Y=11    X=1, Y=11, X'=-, Y'=-     Y=-, Y'=-, X=0, X'=-
T2 executed     X=1, Y=11    X=1, Y=11, X'=0, Y'=11    Y=11, Y'=11, X=0, X'=0

Write-through caches don't preserve sequential consistency either. Cache-2 still holds the stale X = 0, so again Y' = 11 and X' = 0.


Cache Coherence vs. Memory Consistency

- A cache coherence protocol ensures that all writes by one processor are eventually visible to other processors, i.e., updates are not lost.
- A memory consistency model gives the rules on when a write by one processor can be observed by a read on another. Equivalently, it specifies what values can be seen by a load.
- A cache coherence protocol is not enough to ensure sequential consistency. But if a system is sequentially consistent, then its caches must be coherent.
- The combination of the cache coherence protocol plus the processor's memory reorder buffer implements a given machine's memory consistency model.


Maintaining Cache Coherence

Hardware support is required such that:
- only one processor at a time has write permission for a location
- no processor can load a stale copy of the location after a write
⇒ cache coherence protocols


Warmup: Parallel I/O

[Figure: the Processor and its Cache, Physical Memory, and a DMA engine attached to a Disk all share the Memory Bus (address A, data D, R/W)]

Page transfers occur while the Processor is running.
Either the Cache or the DMA engine can be the Bus Master and effect transfers.
(DMA stands for "Direct Memory Access". It means the I/O device can read and write memory autonomously from the CPU.)


Problems with Parallel I/O

[Figure: cached portions of a page held in the processor's Cache, while DMA transfers move data between Disk and Physical Memory over the Memory Bus]

Memory → Disk: physical memory may be stale if the cache copy is dirty.
Disk → Memory: the cache may hold stale data and not see memory writes.


Snoopy Cache (Goodman 1983)

Idea: have the cache watch (or snoop upon) DMA transfers, and then "do the right thing".
Snoopy cache tags are dual-ported.

[Figure: the Proc accesses the Cache (A, D, R/W), which holds Tags and State plus Data (lines). One tag port is used to drive the Memory Bus when the Cache is Bus Master. The other is a snoopy read port attached to the Memory Bus.]


Snoopy Cache Actions for DMA

Observed Bus Cycle           Cache State           Cache Action
DMA Read (Memory → Disk)     Address not cached    No action
                             Cached, unmodified    No action
                             Cached, modified      Cache intervenes
DMA Write (Disk → Memory)    Address not cached    No action
                             Cached, unmodified    Cache purges its copy
                             Cached, modified


CS152 Administrivia
…
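The T1/T2 example from the sequential-consistency recap can be checked by brute force. The sketch enumerates every interleaving that preserves each thread's program order and collects the final (Y', X') pairs.

This is a hypothetical Python model, not code from the lecture. It assumes the initial values X = 0 and Y = 10 used in the cache traces. The tuple encoding of loads and stores is also an assumption made only for illustration.

```python
from itertools import combinations

# Assumed initial memory, matching the slides' cache traces.
INITIAL = {"X": 0, "Y": 10}

# Hypothetical encoding: (op, address, operand); operand is an immediate or a register name.
T1 = [("st", "X", 1), ("st", "Y", 11)]
T2 = [("ld", "Y", "R1"), ("st", "Y'", "R1"), ("ld", "X", "R2"), ("st", "X'", "R2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        a_positions = set(slots)
        ia = ib = 0
        merged = []
        for i in range(n):
            if i in a_positions:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged


def execute(ops):
    """Run one interleaving against a single shared memory (no caches)."""
    mem, regs = dict(INITIAL), {}
    for op, addr, operand in ops:
        if op == "ld":
            regs[operand] = mem[addr]
        else:
            mem[addr] = regs[operand] if isinstance(operand, str) else operand
    return mem["Y'"], mem["X'"]


def sc_outcomes():
    """Set of (Y', X') results over all SC interleavings of T1 and T2."""
    return {execute(ops) for ops in interleavings(T1, T2)}


print(sorted(sc_outcomes()))  # [(10, 0), (10, 1), (11, 1)]
```

The pair (11, 0) never appears. Once T2 reads Y = 11, T1's earlier store to X has already been performed. That missing outcome is the one the write-back and write-through traces end in.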
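A small exhaustive model shows why the queue on the Using Memory Fences slide needs both fences. The sketch below is a hypothetical Python toy, not the semantics of any real ISA. It treats four operations as events that a relaxed machine may perform in any order:

- the producer's store of x;
- its store of tail;
- the consumer's load of tail;
- the consumer's load of the item.

MembarSS is modeled as forcing the store of x ahead of the tail update. MembarLL is modeled as forcing the tail load ahead of the item load. The event names are assumptions, and the consumer is simplified to a single check with no spin loop.

```python
from itertools import permutations

# Four memory events from the producer/consumer queue (hypothetical names).
EVENTS = ("store_x", "store_tail", "load_tail", "load_item")


def can_read_stale(membar_ss: bool, membar_ll: bool) -> bool:
    """True if some allowed event order lets the consumer see the new tail
    but read the queue slot before x has been stored there."""
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        # MembarSS: the store of x must be performed before the tail update.
        if membar_ss and pos["store_x"] > pos["store_tail"]:
            continue
        # MembarLL: the tail load must be performed before the item load.
        if membar_ll and pos["load_tail"] > pos["load_item"]:
            continue
        slot, tail = None, 0          # None means x has not been written yet
        seen_tail = seen_item = None
        for event in order:
            if event == "store_x":
                slot = "x"
            elif event == "store_tail":
                tail = 1
            elif event == "load_tail":
                seen_tail = tail
            else:                     # load_item
                seen_item = slot
        # The consumer only dequeues after observing that tail has advanced.
        if seen_tail == 1 and seen_item != "x":
            return True
    return False


for ss in (False, True):
    for ll in (False, True):
        print(f"MembarSS={ss}, MembarLL={ll}: stale read possible = {can_read_stale(ss, ll)}")
```

Only the configuration with both fences reports that no stale read is possible. Dropping either fence lets the consumer observe the advanced tail pointer yet load the slot before x arrives.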
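The write-back and write-through traces can be replayed with a deliberately incoherent cache model. In the hypothetical Python sketch below, each Cache is a private dictionary in front of a shared memory dictionary, with no coherence protocol at all. The following are assumptions made for illustration:

- the class itself;
- its write-allocate policy;
- the explicit write_back() calls that stand in for evictions.

Both scenarios end with Y' = 11 and X' = 0, the outcome that no sequentially consistent interleaving of T1 and T2 allows.

```python
class Cache:
    """A private cache with NO coherence protocol (hypothetical toy model)."""

    def __init__(self, memory, write_through):
        self.memory = memory          # shared dict standing in for main memory
        self.write_through = write_through
        self.lines = {}               # address -> cached value

    def load(self, addr):
        if addr not in self.lines:    # miss: fill the line from memory
            self.lines[addr] = self.memory.get(addr)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value      # write-allocate into this cache only
        if self.write_through:
            self.memory[addr] = value

    def write_back(self, addr):
        """Stand-in for evicting a line: copy it to memory."""
        self.memory[addr] = self.lines[addr]


def run_t2(cache):
    """Program T2 from the slides: Y' = Y; X' = X."""
    r1 = cache.load("Y")
    cache.store("Y'", r1)
    r2 = cache.load("X")
    cache.store("X'", r2)


def write_back_scenario():
    memory = {"X": 0, "Y": 10}
    c1 = Cache(memory, write_through=False)
    c2 = Cache(memory, write_through=False)
    c1.store("X", 1)                  # T1 runs entirely in cache-1
    c1.store("Y", 11)
    c1.write_back("Y")                # Y happens to be written back first
    run_t2(c2)                        # T2 misses: Y = 11 but X = 0 from memory
    c1.write_back("X")
    c2.write_back("X'")
    c2.write_back("Y'")
    return memory["Y'"], memory["X'"]


def write_through_scenario():
    memory = {"X": 0, "Y": 10}
    c1 = Cache(memory, write_through=True)
    c2 = Cache(memory, write_through=True)
    c1.load("X")                      # cache-1 holds X = 0, Y = 10
    c1.load("Y")
    c2.load("X")                      # cache-2 holds a copy of X = 0
    c1.store("X", 1)                  # T1 updates memory; cache-2 is never told
    c1.store("Y", 11)
    run_t2(c2)                        # Y misses (11), X hits the stale copy (0)
    return memory["Y'"], memory["X'"]


print(write_back_scenario(), write_through_scenario())
```

Making either scenario sequentially consistent requires, at minimum, that a store by one cache invalidate or update the other cache's copy. That is the hardware support the Maintaining Cache Coherence slide calls for.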
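The Snoopy Cache Actions for DMA table can be read as a lookup from (observed bus cycle, cache state) to an action. The hypothetical Python sketch below encodes the cells the table gives. It applies them to a toy cache whose lines record a value and a dirty bit.

The table gives no action for a DMA write that hits a modified line, so the sketch refuses that case rather than guess. Interpreting "cache intervenes" as the cache supplying its dirty data in place of memory is also an assumption.

```python
from enum import Enum


class BusCycle(Enum):
    DMA_READ = "DMA read (memory -> disk)"
    DMA_WRITE = "DMA write (disk -> memory)"


class LineState(Enum):
    NOT_CACHED = "address not cached"
    CLEAN = "cached, unmodified"
    DIRTY = "cached, modified"


# Actions transcribed from the table. The (DMA_WRITE, DIRTY) cell is blank
# in the table, so it is deliberately absent here.
SNOOP_ACTIONS = {
    (BusCycle.DMA_READ, LineState.NOT_CACHED): "no action",
    (BusCycle.DMA_READ, LineState.CLEAN): "no action",
    (BusCycle.DMA_READ, LineState.DIRTY): "cache intervenes",
    (BusCycle.DMA_WRITE, LineState.NOT_CACHED): "no action",
    (BusCycle.DMA_WRITE, LineState.CLEAN): "cache purges its copy",
}


class SnoopyCache:
    """Toy cache: address -> (value, dirty). Hypothetical, for illustration."""

    def __init__(self):
        self.lines = {}

    def state(self, addr):
        if addr not in self.lines:
            return LineState.NOT_CACHED
        return LineState.DIRTY if self.lines[addr][1] else LineState.CLEAN

    def snoop(self, cycle, addr):
        """Apply the table's action for a DMA cycle seen on the memory bus.

        Returns the dirty data when the cache intervenes on a DMA read
        (assumed meaning: the cache supplies data instead of memory), else None.
        """
        action = SNOOP_ACTIONS.get((cycle, self.state(addr)))
        if action is None:
            raise NotImplementedError("case not specified by the table")
        if action == "cache intervenes":
            return self.lines[addr][0]
        if action == "cache purges its copy":
            del self.lines[addr]
        return None


cache = SnoopyCache()
cache.lines[0x100] = (42, True)       # dirty line
cache.lines[0x200] = (7, False)       # clean line
print(cache.snoop(BusCycle.DMA_READ, 0x100))
cache.snoop(BusCycle.DMA_WRITE, 0x200)
print(0x200 in cache.lines)
```

Both snooped DMA reads and DMA writes are served by the tag port attached to the memory bus. This is why the slide notes that snoopy cache tags are dual-ported: the processor can keep using the other port while the cache snoops.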