Berkeley COMPSCI 152 - Snoopy Caches

CS 152 Computer Architecture and Engineering
Lecture 20: Snoopy Caches

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152

4/21/2009  CS152-Spring'09


Recap: Sequential Consistency
A Memory Model

“A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program”
Leslie Lamport

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs

[Figure: several processors (P) connected to a single memory (M)]


Recap: Sequential Consistency

Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies.
What are these in our example?

T1:
    Store (X), 1      (X = 1)
    Store (Y), 11     (Y = 11)

T2:
    Load R1, (Y)
    Store (Y’), R1    (Y’ = Y)
    Load R2, (X)
    Store (X’), R2    (X’ = X)

[Figure legend: uniprocessor program dependencies; additional SC requirements]


Recap: Mutual Exclusion and Locks

Want to guarantee only one process is active in a critical section

• Blocking atomic read-modify-write instructions
  e.g., Test&Set, Fetch&Add, Swap
vs
• Non-blocking atomic read-modify-write instructions
  e.g., Compare&Swap, Load-reserve/Store-conditional
vs
• Protocols based on ordinary Loads and Stores


Issues in Implementing Sequential Consistency

Implementation of SC is complicated by two issues

• Out-of-order execution capability
    Load(a); Load(b)      yes
    Load(a); Store(b)     yes if a ≠ b
    Store(a); Load(b)     yes if a ≠ b
    Store(a); Store(b)    yes if a ≠ b

• Caches
  Caches can prevent the effect of a store from being seen by other processors

[Figure: several processors (P) connected to a single memory (M)]

SC complications motivate architects to consider weak or relaxed memory models


Memory Fences
Instructions to sequentialize memory accesses

Processors with relaxed or weak memory models (i.e., those that permit Loads and Stores to different addresses to be reordered) need to provide memory fence instructions to force the serialization of memory accesses

Examples of processors with relaxed memory models:
    Sparc V8 (TSO, PSO): Membar
    Sparc V9 (RMO): Membar #LoadLoad, Membar #LoadStore, Membar #StoreLoad, Membar #StoreStore
    PowerPC (WO): Sync, EIEIO

Memory fences are expensive operations; however, one pays the cost of serialization only when it is required


Using Memory Fences

Producer posting Item x:
    Load Rtail, (tail)
    Store (Rtail), x
    MembarSS
    Rtail = Rtail + 1
    Store (tail), Rtail

Consumer:
    Load Rhead, (head)
  spin:
    Load Rtail, (tail)
    if Rhead == Rtail goto spin
    MembarLL
    Load R, (Rhead)
    Rhead = Rhead + 1
    Store (head), Rhead
    process(R)

MembarSS ensures that the tail pointer is not updated before x has been stored
MembarLL ensures that R is not loaded before x has been stored

[Figure: queue between Producer and Consumer, with tail and head pointers and registers Rtail, Rhead, R]


Memory Consistency in SMPs

[Figure: CPU-1 with cache-1 and CPU-2 with cache-2 share a CPU-Memory bus to memory; cache-1, cache-2, and memory each hold A = 100]

Suppose CPU-1 updates A to 200.
    write-back: memory and cache-2 have stale values
    write-through: cache-2 has a stale value

Do these stale values matter?
What is the view of shared memory for programming?


Write-back Caches & SC

prog T1:  ST X, 1;  ST Y, 11
prog T2:  LD Y, R1;  ST Y’, R1;  LD X, R2;  ST X’, R2

Step                              cache-1       memory                     cache-2
• T1 is executed                  X=1, Y=11     X=0, Y=10, X’= , Y’=       Y= , Y’= , X= , X’=
• cache-1 writes back Y           X=1, Y=11     X=0, Y=11, X’= , Y’=       Y= , Y’= , X= , X’=
• T2 executed                     X=1, Y=11     X=0, Y=11, X’= , Y’=       Y=11, Y’=11, X=0, X’=0
• cache-1 writes back X           X=1, Y=11     X=1, Y=11, X’= , Y’=       Y=11, Y’=11, X=0, X’=0
• cache-2 writes back X’ & Y’     X=1, Y=11     X=1, Y=11, X’=0, Y’=11     Y=11, Y’=11, X=0, X’=0

The final state is inconsistent.


Write-through Caches & SC

prog T1:  ST X, 1;  ST Y, 11
prog T2:  LD Y, R1;  ST Y’, R1;  LD X, R2;  ST X’, R2

Step              cache-1       memory                     cache-2
(initial)         X=0, Y=10     X=0, Y=10, X’= , Y’=       Y= , Y’= , X=0, X’=
• T1 executed     X=1, Y=11     X=1, Y=11, X’= , Y’=       Y= , Y’= , X=0, X’=
• T2 executed     X=1, Y=11     X=1, Y=11, X’=0, Y’=11     Y=11, Y’=11, X=0, X’=0

Write-through caches don’t preserve sequential consistency either


Maintaining Sequential Consistency

SC is sufficient for correct producer-consumer and mutual exclusion code (e.g., Dekker)

Multiple copies of a location in various caches can cause SC to break down.

Hardware support is required such that
• only one processor at a time has write permission for a location
• no processor can load a stale copy of the location after a write

⇒ cache coherence protocols


Cache Coherence Protocols for SC

write request:
    the address is invalidated (updated) in all other caches before (after) the write is performed

read request:
    if a dirty copy is found in some cache, a write-back is performed before the memory is read

We will focus on Invalidation protocols as opposed to Update protocols


Warmup: Parallel I/O

(DMA stands for Direct Memory Access)

Either Cache or DMA can be the Bus Master and effect transfers

Page transfers occur while the Processor is running

[Figure: Proc. and Cache connected by Address (A), Data (D), and R/W lines; Cache, Physical Memory, and DMA (attached to DISK) share the Memory Bus]


Problems with Parallel I/O

Memory → Disk: Physical memory may be stale if Cache copy is dirty

Disk → Memory: Cache may hold stale data and not see memory writes

[Figure: Proc., Cache, Physical Memory, DMA, and DISK on the Memory Bus; the Cache holds cached portions of the page involved in DMA transfers]


Snoopy Cache (Goodman 1983)

• Idea: Have cache watch (or snoop upon) DMA transfers, and then “do the right thing”
• Snoopy cache tags are dual-ported

[Figure: Cache with Tags and State and Data (lines). One port (A, D, R/W) connects to the Proc. and is used to drive the Memory Bus when the Cache is Bus Master; a snoopy read port (A, R/W) is attached to the Memory Bus]


Snoopy Cache Actions for DMA

Observed Bus Cycle           Cache State            Cache Action
DMA Read (Memory → Disk)     Address not cached
                             Cached, unmodified
                             Cached, modified
DMA Write (Disk → Memory)    Address not cached
                             Cached, unmodified
                             Cached, modified


CS152 Administrivia

• Quiz 5, Thursday April 23
  - Covers VLIW, Vector, Multithreaded


Shared Memory Multiprocessor

Use snoopy mechanism to keep all processors’ view of memory
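
The write-back cache example with programs T1 and T2 can be checked mechanically. The sketch below is an illustration, not part of the lecture: it enumerates every order-preserving interleaving of T1 and T2 to collect the (X', Y') results that sequential consistency allows, then replays the slide's write-back scenario. The helper names (`run`, `sc_outcomes`, `write_back_scenario`) are hypothetical, and the model assumes the slide's initial values (X = 0, Y = 10), an empty cache-2, and no coherence protocol.

```python
from itertools import combinations

# Programs from the "Write-back Caches & SC" slide, as (op, dst, src) tuples.
T1 = [("store_imm", "X", 1), ("store_imm", "Y", 11)]
T2 = [("load", "R1", "Y"), ("store", "Y'", "R1"),
      ("load", "R2", "X"), ("store", "X'", "R2")]

INITIAL = {"X": 0, "Y": 10}  # initial memory contents shown on the slide


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem, regs = dict(INITIAL), {}
    for op, dst, src in schedule:
        if op == "store_imm":
            mem[dst] = src          # store an immediate value
        elif op == "load":
            regs[dst] = mem[src]    # load memory into a register
        else:
            mem[dst] = regs[src]    # store a register to memory
    return mem["X'"], mem["Y'"]


def sc_outcomes():
    """All (X', Y') results over order-preserving interleavings of T1 and T2."""
    n = len(T1) + len(T2)
    outcomes = set()
    for slots in combinations(range(n), len(T1)):  # positions taken by T1
        it1, it2 = iter(T1), iter(T2)
        schedule = [next(it1) if i in slots else next(it2) for i in range(n)]
        outcomes.add(run(schedule))
    return outcomes


def write_back_scenario():
    """Replay the slide: private write-back caches, no coherence (assumed)."""
    memory = dict(INITIAL)
    cache1, cache2 = {}, {}
    cache1["X"], cache1["Y"] = 1, 11   # T1 executes; stores stay in cache-1
    memory["Y"] = cache1["Y"]          # cache-1 writes back Y, but not X yet
    cache2["Y"] = memory["Y"]          # T2: LD Y misses and reads memory
    cache2["Y'"] = cache2["Y"]         # T2: ST Y'
    cache2["X"] = memory["X"]          # T2: LD X sees the stale X = 0
    cache2["X'"] = cache2["X"]         # T2: ST X'
    return cache2["X'"], cache2["Y'"]


if __name__ == "__main__":
    print("SC-legal (X', Y'):", sorted(sc_outcomes()))
    print("write-back result:", write_back_scenario())
```

Under this model the write-back scenario yields X' = 0 with Y' = 11, a pair that no sequentially consistent interleaving produces: seeing Y = 11 implies that T1's earlier store to X has already taken effect.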

