CS 152 Computer Architecture and Engineering
Lecture 20: Snoopy Caches

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152

4/22/2008, CS152 Spring '08


Recap: Sequential Consistency, A Memory Model

[Figure: six processors (P) connected to a shared memory (M)]

"A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program." (Leslie Lamport)

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs


Recap: Sequential Consistency

Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies. What are these in our example?

T1:
    Store (X), 1      (X = 1)
    Store (Y), 11     (Y = 11)

T2:
    Load R1, (Y)
    Store (Y'), R1    (Y' = Y)
    Load R2, (X)
    Store (X'), R2    (X' = X)

[Figure: arrows between these operations mark the additional SC requirements]


Recap: Mutual Exclusion and Locks

Want to guarantee only one process is active in a critical section

  Blocking atomic read-modify-write instructions
    e.g., Test&Set, Fetch&Add, Swap
  vs
  Non-blocking atomic read-modify-write instructions
    e.g., Compare&Swap, Load-reserve/Store-conditional
  vs
  Protocols based on ordinary Loads and Stores


Issues in Implementing Sequential Consistency

[Figure: six processors (P) connected to a shared memory (M)]

Implementation of SC is complicated by two issues:

  Out-of-order execution capability (may the second access be reordered before the first?)
    Load(a); Load(b)      yes
    Load(a); Store(b)     yes if a ≠ b
    Store(a); Load(b)     yes if a ≠ b
    Store(a); Store(b)    yes if a ≠ b

  Caches
    Caches can prevent the effect of a store from being seen by other processors

SC complications motivate architects to consider weak or relaxed memory models


Memory Fences: Instructions to sequentialize memory accesses

Processors with relaxed or weak memory models (i.e., models that permit Loads and Stores to different addresses to be reordered) need to provide memory fence instructions to force the serialization of memory accesses

Examples of processors with relaxed memory models:
  Sparc V8 (TSO, PSO): Membar
  Sparc V9 (RMO): Membar #LoadLoad, Membar #LoadStore, Membar #StoreLoad, Membar #StoreStore
  PowerPC (WO): Sync, EIEIO

Memory fences are expensive operations; however, one pays the cost of serialization only when it is required


Using Memory Fences

[Figure: a queue in memory between Producer and Consumer; the Producer appends at tail, the Consumer removes at head. Registers: Rtail (Producer); Rhead, Rtail, R (Consumer)]

Producer posting Item x:
        Load Rtail, (tail)
        Store (Rtail), x
        MembarSS
        Rtail = Rtail + 1
        Store (tail), Rtail

  MembarSS ensures that the tail pointer is not updated before x has been stored

Consumer:
        Load Rhead, (head)
  spin: Load Rtail, (tail)
        if Rhead == Rtail goto spin
        MembarLL
        Load R, (Rhead)
        Rhead = Rhead + 1
        Store (head), Rhead
        process(R)

  MembarLL ensures that R is not loaded before x has been stored


Memory Consistency in SMPs

[Figure: CPU-1 with cache-1 (A = 100) and CPU-2 with cache-2 (A = 100), connected by the CPU-Memory bus to memory (A = 100)]

Suppose CPU-1 updates A to 200.
  write-back: memory and cache-2 have stale values
  write-through: cache-2 has a stale value

Do these stale values matter?
What is the view of shared memory for programming?


Write-back Caches & SC

prog T1:  ST X, 1;  ST Y, 11
prog T2:  LD Y, R1;  ST Y', R1;  LD X, R2;  ST X', R2

Step                          cache-1        memory                        cache-2
T1 is executed                X=1, Y=11      X=0, Y=10                     (empty)
cache-1 writes back Y         X=1, Y=11      X=0, Y=11                     (empty)
T2 executed                   X=1, Y=11      X=0, Y=11                     Y=11, Y'=11, X=0, X'=0
cache-1 writes back X         X=1, Y=11      X=1, Y=11                     Y=11, Y'=11, X=0, X'=0
cache-2 writes back X' & Y'   X=1, Y=11      X=1, Y=11, X'=0, Y'=11        Y=11, Y'=11, X=0, X'=0

The final memory state (Y' = 11 but X' = 0) is inconsistent.


Write-through Caches & SC

prog T1:  ST X, 1;  ST Y, 11
prog T2:  LD Y, R1;  ST Y', R1;  LD X, R2;  ST X', R2

Step            cache-1        memory                        cache-2
Initially       X=0, Y=10      X=0, Y=10                     X=0
T1 executed     X=1, Y=11      X=1, Y=11                     X=0
T2 executed     X=1, Y=11      X=1, Y=11, X'=0, Y'=11        Y=11, Y'=11, X=0, X'=0

Write-through caches don't preserve sequential consistency either


Maintaining Sequential Consistency

SC is sufficient for correct producer-consumer and mutual exclusion code (e.g., Dekker)

Multiple copies of a location in various caches can cause SC to break down.

Hardware support is required such that
  only one processor at a time has write permission for a location
  no processor can load a stale copy of the location after a write

  ⇒ cache coherence protocols


Cache Coherence Protocols for SC

write request:
  the address is invalidated (updated) in all other caches before (after) the write is performed

read request:
  if a dirty copy is found in some cache, a write-back is performed before the memory is read

We will focus on Invalidation


Warmup: Parallel I/O

[Figure: Proc and Cache on the Memory Bus (Address (A), Data (D), R/W), together with Physical Memory and a DMA engine attached to DISK]

Page transfers occur while the Processor is running

Either Cache or DMA can be the Bus Master and effect transfers

(DMA stands for Direct Memory Access)


Problems with Parallel I/O

[Figure: Proc, Cache (holding cached portions of a page), Memory Bus, Physical Memory, and DMA transfers between memory and DISK]

Memory → Disk: Physical memory may be stale if Cache copy is dirty

Disk → Memory: Cache may hold stale data and not see memory writes


Snoopy Cache (Goodman 1983)

Idea: Have cache watch (or snoop upon) DMA transfers, and then "do the right thing"

Snoopy cache tags are dual-ported

[Figure: Cache between Proc and Memory Bus, containing Tags and State plus Data (lines). One port (A, D, R/W) is used to drive the Memory Bus when the Cache is Bus Master; a second, snoopy read port (A, R/W) is attached to the Memory Bus]


Snoopy Cache Actions for DMA

Observed Bus Cycle            Cache State             Cache Action
DMA Read (Memory → Disk)      Address not cached
                              Cached, unmodified
                              Cached, modified
DMA Write (Disk → Memory)     Address not cached
                              Cached, unmodified
                              Cached, modified


CS152 Administrivia

Quiz 5: Thursday, April 24
  Covers lectures 16, 17, 18, PS 5, Lab 5


Shared Memory Multiprocessor

[Figure: M1, M2, and M3, each with its own Snoopy Cache, on a Memory Bus shared with Physical Memory and DMA to DISKS]

Use snoopy mechanism to keep all processors' view of memory …
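
The recap's definition of sequential consistency as an arbitrary order-preserving interleaving can be checked by brute force for the T1/T2 example. The sketch below is only an illustration, not part of the lecture: the tuple encoding of operations and the helper name sc_outcomes are assumptions, and the initial values X = 0, Y = 10 are taken from the cache slides.

```python
from itertools import combinations

# Hypothetical encoding of the two programs: (op, address, operand).
T1 = [("store", "X", 1), ("store", "Y", 11)]
T2 = [("load", "Y", "R1"), ("store", "Y'", "R1"),
      ("load", "X", "R2"), ("store", "X'", "R2")]


def sc_outcomes():
    """Return every (Y', X') pair reachable under sequential consistency."""
    outcomes = set()
    total = len(T1) + len(T2)
    # Choosing which slots of the merged sequence belong to T1 enumerates
    # every interleaving that preserves each program's own order.
    for t1_slots in combinations(range(total), len(T1)):
        it1, it2 = iter(T1), iter(T2)
        memory = {"X": 0, "Y": 10, "X'": None, "Y'": None}
        regs = {}
        for slot in range(total):
            op, addr, operand = next(it1) if slot in t1_slots else next(it2)
            if op == "load":
                regs[operand] = memory[addr]
            else:
                # Operand is either a register name or an immediate value.
                memory[addr] = regs.get(operand, operand)
        outcomes.add((memory["Y'"], memory["X'"]))
    return outcomes


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
```

Under these assumptions the outcome Y' = 11, X' = 0 never appears: seeing the new Y while still seeing the old X would require Store X to come after Load X, which contradicts program order in T1 and T2. That is exactly the outcome the cache slides label inconsistent.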
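
The sequence on the "Write-back Caches & SC" slide can be replayed with a toy model. This is a minimal sketch assuming one word per cache line, unlimited capacity, and no coherence mechanism at all; the class WriteBackCache and the function run_write_back_example are hypothetical names introduced for illustration.

```python
class WriteBackCache:
    """Private write-back cache with no coherence support (toy model)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses modified but not yet written back

    def load(self, addr):
        if addr not in self.lines:       # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value         # update the cache only
        self.dirty.add(addr)

    def write_back(self, addr):
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)


def run_write_back_example():
    memory = {"X": 0, "Y": 10, "X'": None, "Y'": None}
    cache1 = WriteBackCache(memory)
    cache2 = WriteBackCache(memory)

    # T1 is executed: both stores stay in cache-1.
    cache1.store("X", 1)
    cache1.store("Y", 11)

    # cache-1 writes back Y only; memory still holds the old X.
    cache1.write_back("Y")

    # T2 is executed: it sees the new Y but the stale X.
    r1 = cache2.load("Y")
    cache2.store("Y'", r1)
    r2 = cache2.load("X")
    cache2.store("X'", r2)

    # The remaining write-backs happen afterwards.
    cache1.write_back("X")
    cache2.write_back("X'")
    cache2.write_back("Y'")
    return memory


if __name__ == "__main__":
    print(run_write_back_example())
```

In this model memory ends with X = 1 and Y = 11 but X' = 0 and Y' = 11, the same inconsistent final state as on the slide. The order in which cache-1 happens to write back its dirty lines is what lets T2 observe the two stores out of order.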
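
The two rules on the "Cache Coherence Protocols for SC" slide (invalidate other copies before a write; write back a dirty copy before memory is read) can be sketched as a toy snoopy bus. Everything here is an assumption made for illustration: the names Bus, SnoopyCache, and run_snoopy_example are hypothetical, each line holds a single word, every store broadcasts an invalidate, and there is no bus arbitration or per-line state machine as in a real protocol.

```python
class Bus:
    """Shared memory bus that lets every cache snoop each transaction."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def read(self, requester, addr):
        # Read request: a dirty copy elsewhere is written back first.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_read(addr)
        return self.memory[addr]

    def invalidate(self, requester, addr):
        # Write request: all other copies are invalidated first.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_invalidate(addr)


class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses modified but not yet written back
        bus.caches.append(self)

    def load(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.read(self, addr)
        return self.lines[addr]

    def store(self, addr, value):
        self.bus.invalidate(self, addr)  # before the write is performed
        self.lines[addr] = value
        self.dirty.add(addr)

    def snoop_read(self, addr):
        if addr in self.dirty:
            self.bus.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def snoop_invalidate(self, addr):
        # One word per line, so the writer overwrites the whole line and
        # the old copy can simply be dropped.
        self.lines.pop(addr, None)
        self.dirty.discard(addr)


def run_snoopy_example():
    memory = {"X": 0, "Y": 10, "X'": None, "Y'": None}
    bus = Bus(memory)
    cache1, cache2 = SnoopyCache(bus), SnoopyCache(bus)

    cache2.load("X")       # cache-2 starts with a copy of X = 0

    cache1.store("X", 1)   # T1: invalidates cache-2's copy of X
    cache1.store("Y", 11)

    r1 = cache2.load("Y")  # T2: cache-1 writes back Y before the read
    cache2.store("Y'", r1)
    r2 = cache2.load("X")  # miss after invalidation, cache-1 writes back X
    cache2.store("X'", r2)
    return r1, r2


if __name__ == "__main__":
    print(run_snoopy_example())
```

With these two snooping rules in place, T2 reads Y = 11 and X = 1 in this run, so the stale copy of X that broke the write-through example is no longer visible to cache-2.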