CS 152 Computer Architecture and Engineering
Lecture 20: Snoopy Caches

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152

4/21/2009, CS152 Spring 09


Recap: Sequential Consistency, A Memory Model

[Figure: several processors P sharing one memory M]

"A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program." (Leslie Lamport)

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs


Recap: Sequential Consistency

Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies. What are these in our example?

    T1:                           T2:
    Store (X), 1     (X = 1)      Load  R1, (Y)
    Store (Y), 11    (Y = 11)     Store (Y'), R1    (Y' = Y)
                                  Load  R2, (X)
                                  Store (X'), R2    (X' = X)

[Figure annotation: additional SC requirements]


Recap: Mutual Exclusion and Locks

Want to guarantee only one process is active in a critical section.

  - Blocking atomic read-modify-write instructions, e.g., Test&Set, Fetch&Add, Swap
    vs.
  - Non-blocking atomic read-modify-write instructions, e.g., Compare&Swap, Load-reserve/Store-conditional
    vs.
  - Protocols based on ordinary Loads and Stores


Issues in Implementing Sequential Consistency

[Figure: several processors P sharing one memory M]

Implementation of SC is complicated by two issues:

  - Out-of-order execution capability (may the second access be reordered ahead of the first?)

        Load(a);  Load(b)     yes
        Load(a);  Store(b)    yes if a != b
        Store(a); Load(b)     yes if a != b
        Store(a); Store(b)    yes if a != b

  - Caches
    Caches can prevent the effect of a store from being seen by other processors.

These SC complications motivate architects to consider weak or relaxed memory models.


Memory Fences: Instructions to sequentialize memory accesses

Processors with relaxed or weak memory models (i.e., ones that permit Loads and Stores to different addresses to be reordered) need to provide memory fence instructions to force the serialization of memory accesses.

Examples of processors with relaxed memory models:

  - Sparc V8 (TSO, PSO): Membar
  - Sparc V9 (RMO): Membar #LoadLoad, Membar #LoadStore, Membar #StoreLoad, Membar #StoreStore
  - PowerPC (WO): Sync, EIEIO

Memory fences are expensive operations; however, one pays the cost of serialization only when it is required.


Using Memory Fences

[Figure: a queue between Producer and Consumer with tail and head pointers; the producer uses register Rtail, the consumer uses Rtail, Rhead and R]

    Producer posting Item x:
        Load  Rtail, (tail)
        Store (Rtail), x
        MembarSS
        Rtail = Rtail + 1
        Store (tail), Rtail

    Consumer:
        Load  Rhead, (head)
    spin:
        Load  Rtail, (tail)
        if Rhead == Rtail goto spin
        MembarLL
        Load  R, (Rhead)
        Rhead = Rhead + 1
        Store (head), Rhead
        process(R)

MembarSS ensures that the tail pointer is not updated before x has been stored.
MembarLL ensures that R is not loaded before x has been stored.


Memory Consistency in SMPs

[Figure: CPU-1 with cache-1 and CPU-2 with cache-2 on a CPU-Memory bus; cache-1, cache-2 and memory each hold A = 100]

Suppose CPU-1 updates A to 200.

  - write-back: memory and cache-2 have stale values
  - write-through: cache-2 has a stale value

Do these stale values matter?
What is the view of shared memory for programming?


Write-back Caches and SC

    prog T1:            prog T2:
    ST X, 1             LD Y, R1
    ST Y, 11            ST Y', R1
                        LD X, R2
                        ST X', R2

    (- = no value yet)

    Step                            cache-1       memory                      cache-2
    T1 is executed                  X=1  Y=11     X=0  Y=10  X'=-  Y'=-       Y=-   Y'=-   X=-  X'=-
    cache-1 writes back Y           X=1  Y=11     X=0  Y=11  X'=-  Y'=-       Y=-   Y'=-   X=-  X'=-
    T2 executed                     X=1  Y=11     X=0  Y=11  X'=-  Y'=-       Y=11  Y'=11  X=0  X'=0
    cache-1 writes back X           X=1  Y=11     X=1  Y=11  X'=-  Y'=-       Y=11  Y'=11  X=0  X'=0
    cache-2 writes back X' and Y'   X=1  Y=11     X=1  Y=11  X'=0  Y'=11      Y=11  Y'=11  X=0  X'=0

The final memory state is inconsistent: T2 observed the new value of Y but the old value of X, even though T1 stores X before Y.


Write-through Caches and SC

    prog T1:            prog T2:
    ST X, 1             LD Y, R1
    ST Y, 11            ST Y', R1
                        LD X, R2
                        ST X', R2

    Step              cache-1       memory                      cache-2
    (initial)         X=0  Y=10     X=0  Y=10  X'=-  Y'=-       Y=-   Y'=-   X=0  X'=-
    T1 executed       X=1  Y=11     X=1  Y=11  X'=-  Y'=-       Y=-   Y'=-   X=0  X'=-
    T2 executed       X=1  Y=11     X=1  Y=11  X'=0  Y'=11      Y=11  Y'=11  X=0  X'=0

Write-through caches don't preserve sequential consistency either.


Maintaining Sequential Consistency

SC is sufficient for correct producer-consumer and mutual exclusion code (e.g., Dekker).

Multiple copies of a location in various caches can cause SC to break down.

Hardware support is required such that

  - only one processor at a time has write permission for a location
  - no processor can load a stale copy of the location after a write

=> cache coherence protocols


Cache Coherence Protocols for SC

  - write request: the address is invalidated (updated) in all other caches before (after) the write is performed
  - read request: if a dirty copy is found in some cache, a write-back is performed before the memory is read

We will focus on Invalidation protocols as opposed to Update protocols.


Warmup: Parallel I/O

[Figure: Proc and Cache connected to Physical Memory over the Memory Bus, with Address (A), Data (D) and R/W lines; a DMA engine connects the DISK to the same bus]

Page transfers occur while the Processor is running.
Either Cache or DMA can be the Bus Master and effect transfers.
(DMA stands for Direct Memory Access.)


Problems with Parallel I/O

[Figure: Proc, Cache, Memory Bus, Physical Memory, DMA and DISK; the cache holds cached portions of the page while DMA transfers are in progress]

  - Memory -> Disk: Physical memory may be stale if the Cache copy is dirty.
  - Disk -> Memory: Cache may hold stale data and not see memory writes.


Snoopy Cache (Goodman 1983)

Idea: Have the cache watch (or snoop upon) DMA transfers, and then "do the right thing".

Snoopy cache tags are dual-ported.

[Figure: the cache's Tags and State and Data (lines); one port (A, D, R/W) faces the Proc and is used to drive the Memory Bus when the Cache is Bus Master; a snoopy read port (A, R/W) is attached to the Memory Bus]


Snoopy Cache Actions for DMA

    Observed Bus Cycle            Cache State            Cache Action
    DMA Read (Memory -> Disk)     Address not cached     No action
                                  Cached, unmodified     No action
                                  Cached, modified       Cache intervenes
    DMA Write (Disk -> Memory)    Address not cached     No action
                                  Cached, unmodified     Cache purges its copy
                                  Cached, modified       ???


CS152 Administrivia

Quiz 5: Thursday, April 23. Covers VLIW, Vector, Multithreaded.


Shared Memory Multiprocessor

[Figure: a Memory Bus connecting processors M1, M2, M3, each behind its own Snoopy Cache …]
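
The snoopy cache actions for DMA listed in this lecture can be sketched as a small lookup table. This is only an illustrative model, not the hardware design from the slides: the names `BusCycle`, `LineState`, `SNOOP_ACTIONS` and `snoop` are hypothetical, and the entry for a DMA write that hits a modified line is kept as `None` because the lecture's table gives no action for it.

```python
from enum import Enum
from typing import Optional


class BusCycle(Enum):
    """Bus cycles the cache can observe on its snoopy read port."""
    DMA_READ = "DMA read (memory -> disk)"
    DMA_WRITE = "DMA write (disk -> memory)"


class LineState(Enum):
    """State of the snooped address in this cache."""
    NOT_CACHED = "address not cached"
    CLEAN = "cached, unmodified"
    DIRTY = "cached, modified"


# Actions as listed in the lecture's table. None marks the one entry for
# which the table gives no action (DMA write to a modified line); it is
# deliberately not filled in here.
SNOOP_ACTIONS = {
    (BusCycle.DMA_READ, LineState.NOT_CACHED): "no action",
    (BusCycle.DMA_READ, LineState.CLEAN): "no action",
    (BusCycle.DMA_READ, LineState.DIRTY): "cache intervenes",
    (BusCycle.DMA_WRITE, LineState.NOT_CACHED): "no action",
    (BusCycle.DMA_WRITE, LineState.CLEAN): "cache purges its copy",
    (BusCycle.DMA_WRITE, LineState.DIRTY): None,
}


def snoop(cycle: BusCycle, state: LineState) -> Optional[str]:
    """Return the cache's reaction to an observed DMA bus cycle."""
    return SNOOP_ACTIONS[(cycle, state)]


if __name__ == "__main__":
    # Print the whole table, one observed cycle / cache state pair per row.
    for (cycle, state), action in SNOOP_ACTIONS.items():
        shown = action if action is not None else "no action given in the table"
        print(f"{cycle.value:28} {state.value:20} {shown}")
```

The two cases that need a reaction mirror the two parallel I/O problems: on a DMA read the cache must intervene when memory is stale because the cached copy is dirty, and on a DMA write the cache must purge a clean copy that would otherwise go stale.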