April 15, 2010
CS152, Spring 2010

CS 152 Computer Architecture and Engineering
Lecture 20: Snoopy Caches

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152


Recap: Sequential Consistency
A Memory Model

“A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program”
Leslie Lamport

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs

[Figure: several processors (P) connected to a single memory (M)]


Recap: Sequential Consistency

Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies.
What are these in our example?

T1:                            T2:
  Store (X), 1    (X = 1)        Load R1, (Y)
  Store (Y), 11   (Y = 11)       Store (Y’), R1   (Y’ = Y)
                                 Load R2, (X)
                                 Store (X’), R2   (X’ = X)

[Figure: arrows between the instructions mark the uniprocessor program dependencies and the additional SC requirements]


Recap: Mutual Exclusion and Locks

Want to guarantee only one process is active in a critical section

• Blocking atomic read-modify-write instructions
  e.g., Test&Set, Fetch&Add, Swap
vs
• Non-blocking atomic read-modify-write instructions
  e.g., Compare&Swap, Load-reserve/Store-conditional
vs
• Protocols based on ordinary Loads and Stores


Issues in Implementing Sequential Consistency

Implementation of SC is complicated by two issues

• Out-of-order execution capability
    Load(a); Load(b)      yes
    Load(a); Store(b)     yes if a ≠ b
    Store(a); Load(b)     yes if a ≠ b
    Store(a); Store(b)    yes if a ≠ b
• Caches
  Caches can prevent the effect of a store from being seen by other processors

[Figure: several processors (P) connected to a single memory (M)]

SC complications motivate architects to consider weak or relaxed memory models


Memory Fences
Instructions to sequentialize memory accesses

Processors with relaxed or weak memory models (i.e., ones that permit Loads and Stores to different addresses to be reordered) need to provide memory fence instructions to force the serialization of memory accesses

Examples of processors with relaxed memory models:
  Sparc V8 (TSO, PSO): Membar
  Sparc V9 (RMO): Membar #LoadLoad, Membar #LoadStore, Membar #StoreLoad, Membar #StoreStore
  PowerPC (WO): Sync, EIEIO

Memory fences are expensive operations; however, one pays the cost of serialization only when it is required


Using Memory Fences

Producer posting Item x:
    Load Rtail, (tail)
    Store (Rtail), x
    MembarSS
    Rtail = Rtail + 1
    Store (tail), Rtail

  MembarSS ensures that the tail pointer is not updated before x has been stored

Consumer:
    Load Rhead, (head)
  spin:
    Load Rtail, (tail)
    if Rhead == Rtail goto spin
    MembarLL
    Load R, (Rhead)
    Rhead = Rhead + 1
    Store (head), Rhead
    process(R)

  MembarLL ensures that R is not loaded before x has been stored

[Figure: queue between Producer and Consumer with tail and head pointers; registers Rtail, Rhead, and R point into the queue]


Memory Coherence in SMPs

Suppose CPU-1 updates A to 200.
  write-back: memory and cache-2 have stale values
  write-through: cache-2 has a stale value

Do these stale values matter?
What is the view of shared memory for programming?
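The stale-value scenario above can be reproduced with a small toy model. This is only a sketch under stated assumptions: the `Cache` class, its `load`/`store` methods, and the `run` helper are hypothetical names invented for illustration, there is no coherence protocol, and write-back eviction is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Toy private cache with no coherence protocol (hypothetical model)."""
    memory: dict                      # shared backing store: address -> value
    write_through: bool               # True: write-through, False: write-back
    lines: dict = field(default_factory=dict)  # cached address -> value

    def load(self, addr):
        # On a miss, fetch from memory; on a hit, return the cached copy,
        # even if another processor has changed the location since.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value  # memory is updated immediately
        # Write-back: memory would be updated only on eviction (not modeled).


def run(write_through):
    memory = {"A": 100}
    cache1 = Cache(memory, write_through)
    cache2 = Cache(memory, write_through)
    cache1.load("A")                   # both caches start with A = 100
    cache2.load("A")
    cache1.store("A", 200)             # CPU-1 updates A to 200
    return cache1.load("A"), memory["A"], cache2.load("A")


write_back_view = run(write_through=False)
write_through_view = run(write_through=True)
print("write-back    (cache-1, memory, cache-2):", write_back_view)
print("write-through (cache-1, memory, cache-2):", write_through_view)
```

In this model the write-back run leaves both memory and cache-2 holding 100, while the write-through run updates memory but still leaves cache-2 holding 100.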
[Figure: CPU-1 and CPU-2, each with its own cache (cache-1, cache-2), share memory over the CPU-Memory bus; cache-1, cache-2, and memory all hold A = 100]


Write-back Caches & SC

prog T1:            prog T2:
  ST X, 1             LD Y, R1
  ST Y, 11            ST Y’, R1
                      LD X, R2
                      ST X’, R2

(- means no value is held)

Step                           cache-1        memory                        cache-2
• T1 is executed               X=1, Y=11      X=0, Y=10, X’=-, Y’=-         Y=-,  Y’=-,  X=-, X’=-
• cache-1 writes back Y        X=1, Y=11      X=0, Y=11, X’=-, Y’=-         Y=-,  Y’=-,  X=-, X’=-
• T2 executed                  X=1, Y=11      X=0, Y=11, X’=-, Y’=-         Y=11, Y’=11, X=0, X’=0
• cache-1 writes back X        X=1, Y=11      X=1, Y=11, X’=-, Y’=-         Y=11, Y’=11, X=0, X’=0
• cache-2 writes back X’ & Y’  X=1, Y=11      X=1, Y=11, X’=0, Y’=11        Y=11, Y’=11, X=0, X’=0

Memory ends with Y’ = 11 but X’ = 0: T2 observed the new Y and the old X, an outcome that sequential consistency does not allow.


Write-through Caches & SC

prog T1:            prog T2:
  ST X, 1             LD Y, R1
  ST Y, 11            ST Y’, R1
                      LD X, R2
                      ST X’, R2

Step                           cache-1        memory                        cache-2
(initial state)                X=0, Y=10      X=0, Y=10, X’=-, Y’=-         Y=-,  Y’=-,  X=0, X’=-
• T1 executed                  X=1, Y=11      X=1, Y=11, X’=-, Y’=-         Y=-,  Y’=-,  X=0, X’=-
• T2 executed                  X=1, Y=11      X=1, Y=11, X’=0, Y’=11        Y=11, Y’=11, X=0, X’=0

Write-through caches don’t preserve sequential consistency either


Cache Coherence vs. Memory Consistency

• A cache coherence protocol ensures that all writes by one processor are eventually visible to other processors
  – i.e., updates are not lost
• A memory consistency model gives the rules on when a write by one processor can be observed by a read on another
  – Equivalently, what values can be seen by a load
• A cache coherence protocol is not enough to ensure sequential consistency
  – But if sequentially consistent, then caches must be coherent
• Combination of cache coherence protocol plus processor memory reorder buffer implements a given machine’s memory consistency model


Maintaining Cache Coherence

Hardware support is required such that
• only one processor at a time has write permission for a location
• no processor can load a stale copy of the location after a write

⇒ cache coherence protocols


Warmup: Parallel I/O

(DMA stands for Direct Memory Access, which means the I/O device can read and write memory autonomously, without involving the CPU)

Either Cache or DMA can be the Bus Master and effect transfers

Page transfers occur while the Processor is running

[Figure: Proc. and Cache on one side of the Memory Bus, Physical Memory and a DMA engine attached to DISK on the other; the bus carries Address (A), Data (D), and R/W]


Problems with Parallel I/O

Memory → Disk: Physical memory may be stale if cache copy is dirty

Disk → Memory: Cache may hold stale data and not see memory writes

[Figure: Proc., Cache, Physical Memory, DMA, and DISK on the Memory Bus; labels mark the cached portions of a page and the DMA transfers]


Snoopy Cache
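Before any snooping is added, the two parallel I/O problems listed above can be reproduced in a toy model. This is a minimal sketch, not the lecture's design: `WriteBackCache`, `dma_memory_to_disk`, and `dma_disk_to_memory` are hypothetical names, and the DMA engine is assumed to access physical memory directly without consulting the cache.

```python
class WriteBackCache:
    """Toy write-back cache that never observes bus traffic (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory   # physical memory: address -> value
        self.lines = {}        # cached address -> value
        self.dirty = set()     # addresses modified but not yet written back

    def load(self, addr):
        # A hit returns the cached copy without checking memory.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        # Write-back: only the cache is updated; memory becomes stale.
        self.lines[addr] = value
        self.dirty.add(addr)


def dma_memory_to_disk(memory, addrs):
    """DMA reads physical memory directly, bypassing the cache."""
    return {a: memory[a] for a in addrs}


def dma_disk_to_memory(memory, page):
    """DMA writes physical memory directly, bypassing the cache."""
    memory.update(page)


# Problem 1, Memory -> Disk: the cache holds a dirty copy, so the page
# written to disk contains the stale memory value.
memory = {0: "old"}
cache = WriteBackCache(memory)
cache.store(0, "new")
disk_page = dma_memory_to_disk(memory, [0])

# Problem 2, Disk -> Memory: the cache already holds the line, so the
# processor keeps reading its stale copy after the DMA write.
memory2 = {0: "old"}
cache2 = WriteBackCache(memory2)
cache2.load(0)
dma_disk_to_memory(memory2, {0: "from disk"})
seen_by_cpu = cache2.load(0)

print("page written to disk:", disk_page)
print("value seen by processor:", seen_by_cpu, "| memory holds:", memory2[0])
```

In both cases the cache and the DMA engine each act on their own view of the page, which is the gap that hardware coherence support has to close.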