Berkeley COMPSCI 152 - Lecture 20: Snoopy Caches


Slide titles:
• CS 152 Computer Architecture and Engineering, Lecture 20: Snoopy Caches
• Recap: Sequential Consistency: A Memory Model
• Recap: Sequential Consistency
• Recap: Mutual Exclusion and Locks
• Issues in Implementing Sequential Consistency
• Memory Fences: Instructions to sequentialize memory accesses
• Using Memory Fences
• Memory Consistency in SMPs
• Write-back Caches & SC
• Write-through Caches & SC
• Maintaining Sequential Consistency
• Cache Coherence Protocols for SC
• Warmup: Parallel I/O
• Problems with Parallel I/O
• Snoopy Cache, Goodman 1983
• Snoopy Cache Actions for DMA
• CS152 Administrivia
• Shared Memory Multiprocessor
• Cache State Transition Diagram: The MSI protocol
• Two Processor Example (Reading and writing the same cache line)
• Observation
• MESI: An Enhanced MSI protocol (increased performance for private data)
• Optimized Snoop with Level-2 Caches
• Intervention
• False Sharing
• Synchronization and Caches: Performance Issues
• Performance Related to Bus Occupancy
• Load-reserve & Store-conditional
• Performance: Load-reserve & Store-conditional
• Out-of-Order Loads/Stores & CC
• Acknowledgements

CS 152 Computer Architecture and Engineering
Lecture 20: Snoopy Caches

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152
4/22/2008, CS152-Spring’08

Recap: Sequential Consistency
A Memory Model

“A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program”
Leslie Lamport

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs

[Figure: a single memory M shared by six processors P]

Recap: Sequential Consistency

Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies. What are these in our example?

  T1:                        T2:
  Store (X), 1   (X = 1)     Load R1, (Y)
  Store (Y), 11  (Y = 11)    Store (Y’), R1  (Y’= Y)
                             Load R2, (X)
                             Store (X’), R2  (X’= X)

[Figure annotation: additional SC requirements]

Recap: Mutual Exclusion and Locks

Want to guarantee only one process is active in a critical section
• Blocking atomic read-modify-write instructions
  e.g., Test&Set, Fetch&Add, Swap
vs
• Non-blocking atomic read-modify-write instructions
  e.g., Compare&Swap, Load-reserve/Store-conditional
vs
• Protocols based on ordinary Loads and Stores

Issues in Implementing Sequential Consistency

Implementation of SC is complicated by two issues
• Out-of-order execution capability
    Load(a); Load(b)      yes
    Load(a); Store(b)     yes if a ≠ b
    Store(a); Load(b)     yes if a ≠ b
    Store(a); Store(b)    yes if a ≠ b
• Caches
  Caches can prevent the effect of a store from being seen by other processors

[Figure: a single memory M shared by six processors P]

SC complications motivate architects to consider weak or relaxed memory models

Memory Fences
Instructions to sequentialize memory accesses

Processors with relaxed or weak memory models (i.e., those that permit Loads and Stores to different addresses to be reordered) need to provide memory fence instructions to force the serialization of memory accesses

Examples of processors with relaxed memory models:
  Sparc V8 (TSO, PSO): Membar
  Sparc V9 (RMO): Membar #LoadLoad, Membar #LoadStore, Membar #StoreLoad, Membar #StoreStore
  PowerPC (WO): Sync, EIEIO

Memory fences are expensive operations; however, one pays the cost of serialization only when it is required

Using Memory Fences

Producer posting Item x:
        Load Rtail, (tail)
        Store (Rtail), x
        MembarSS
        Rtail=Rtail+1
        Store (tail), Rtail

Consumer:
        Load Rhead, (head)
  spin: Load Rtail, (tail)
        if Rhead==Rtail goto spin
        MembarLL
        Load R, (Rhead)
        Rhead=Rhead+1
        Store (head), Rhead
        process(R)

MembarSS ensures that the tail pointer is not updated before x has been stored.
MembarLL ensures that R is not loaded before x has been stored.

[Figure: queue between Producer and Consumer, with tail and head pointers and registers Rtail, Rhead, R]

Memory Consistency in SMPs

Suppose CPU-1 updates A to 200.
write-back: memory and cache-2 have stale values
write-through: cache-2 has a stale value
Do these stale values matter?
What is the view of shared memory for programming?

[Figure: CPU-1 with cache-1 (A = 100) and CPU-2 with cache-2 (A = 100) on a CPU-Memory bus; memory holds A = 100]

Write-back Caches & SC

  prog T1        prog T2
  ST X, 1        LD Y, R1
  ST Y, 11       ST Y’, R1
                 LD X, R2
                 ST X’, R2

(A blank after "=" means no value is present yet.)

  Step                          cache-1      memory                     cache-2
  T1 is executed                X=1, Y=11    X=0, Y=10, X’=, Y’=        Y=, Y’=, X=, X’=
  cache-1 writes back Y         X=1, Y=11    X=0, Y=11, X’=, Y’=        Y=, Y’=, X=, X’=
  T2 executed                   X=1, Y=11    X=0, Y=11, X’=, Y’=        Y=11, Y’=11, X=0, X’=0
  cache-1 writes back X         X=1, Y=11    X=1, Y=11, X’=, Y’=        Y=11, Y’=11, X=0, X’=0
  cache-2 writes back X’ & Y’   X=1, Y=11    X=1, Y=11, X’=0, Y’=11     Y=11, Y’=11, X=0, X’=0

Result: inconsistent. Memory ends with X’=0 and Y’=11, so T2 appears to have seen the new Y but the old X, which no sequentially consistent interleaving allows.

Write-through Caches & SC

  prog T1        prog T2
  ST X, 1        LD Y, R1
  ST Y, 11       ST Y’, R1
                 LD X, R2
                 ST X’, R2

  Step           cache-1      memory                     cache-2
  (initial)      X=0, Y=10    X=0, Y=10, X’=, Y’=        Y=, Y’=, X=0, X’=
  T1 executed    X=1, Y=11    X=1, Y=11, X’=, Y’=        Y=, Y’=, X=0, X’=
  T2 executed    X=1, Y=11    X=1, Y=11, X’=0, Y’=11     Y=11, Y’=11, X=0, X’=0

Write-through caches don’t preserve sequential consistency either

Maintaining Sequential Consistency

SC is sufficient for correct producer-consumer and mutual exclusion code (e.g., Dekker)

Multiple copies of a location in various caches can cause SC to break down.

Hardware support is required such that
• only one processor at a time has write permission for a location
• no processor can load a stale copy of the location after a write
=> cache coherence protocols

Cache Coherence Protocols for SC

write request: the address is invalidated (updated) in all other caches before (after) the write is performed

read request: if a dirty copy is found in some cache, a write-back is performed before the memory is read

We will focus on Invalidation protocols as opposed to Update protocols

Warmup: Parallel I/O

(DMA stands for Direct Memory Access)

Either Cache or DMA can be the Bus Master and effect transfers

Page transfers occur while the Processor is running

[Figure: Proc. and Cache, Physical Memory, and DMA with DISK, all on the Memory Bus; bus signals are Address (A), Data (D), and R/W]

Problems with Parallel I/O

Memory Disk: Physical
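As a rough check on the question posed in the Recap: Sequential Consistency slide, the sketch here enumerates every order-preserving interleaving of T1 and T2 against a single shared memory and collects the resulting (X’, Y’) pairs. The initial values X = 0 and Y = 10 are borrowed from the cache slides, and the helper names `interleavings` and `run` are made up for this illustration.

```python
from itertools import combinations

# Programs from the recap slide; each op is (kind, destination, source).
T1 = [("store", "X", 1), ("store", "Y", 11)]
T2 = [("load", "R1", "Y"), ("store", "Y'", "R1"),
      ("load", "R2", "X"), ("store", "X'", "R2")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each program's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"X": 0, "Y": 10}   # assumed initial values (from the cache slides)
    regs = {}
    for kind, dst, src in schedule:
        if kind == "load":
            regs[dst] = mem[src]
        else:
            # A string source names a register; anything else is an immediate.
            mem[dst] = regs[src] if isinstance(src, str) else src
    return mem["X'"], mem["Y'"]

sc_outcomes = {run(s) for s in interleavings(T1, T2)}
print(sorted(sc_outcomes))
```

The pair (X’, Y’) = (0, 11) never appears in `sc_outcomes`, which is exactly the pair the uncoordinated write-back and write-through cache examples end up with.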
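The Write-back Caches & SC trace can be replayed with a toy model. This is only a sketch: `WriteBackCache` is a hypothetical class with one value per address and no coherence mechanism at all, and the order of events is the one listed on the slide.

```python
class WriteBackCache:
    """Private write-back cache with no coherence (hypothetical toy model)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified but not yet written back

    def load(self, addr):
        if addr not in self.lines:          # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value            # the write stays in the cache
        self.dirty.add(addr)

    def write_back(self, addr):
        self.memory[addr] = self.lines[addr]
        self.dirty.discard(addr)

memory = {"X": 0, "Y": 10}
c1, c2 = WriteBackCache(memory), WriteBackCache(memory)

# T1 is executed on CPU-1.
c1.store("X", 1)
c1.store("Y", 11)
# cache-1 writes back Y (but not yet X).
c1.write_back("Y")
# T2 is executed on CPU-2: it sees the new Y but the old X.
r1 = c2.load("Y")
c2.store("Y'", r1)
r2 = c2.load("X")
c2.store("X'", r2)
# Remaining write-backs.
c1.write_back("X")
c2.write_back("X'")
c2.write_back("Y'")

print(memory)
```

Memory finishes with X’ = 0 and Y’ = 11 even though X = 1 and Y = 11, matching the "inconsistent" final row of the slide.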
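The two rules on the Cache Coherence Protocols for SC slide (invalidate other copies before a write; write back a dirty copy before memory is read) can be sketched in the same toy style. `Bus` and `CoherentCache` are hypothetical names, each address is modeled as a one-word line, and this is not the MSI protocol from later in the lecture, only an illustration of the invalidation idea.

```python
class Bus:
    """Hypothetical shared bus: holds memory and the list of snooping caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

class CoherentCache:
    """Toy write-back cache that applies the slide's two coherence rules."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        self.dirty = set()
        bus.caches.append(self)

    def _others(self):
        return [c for c in self.bus.caches if c is not self]

    def load(self, addr):
        if addr not in self.lines:
            # Read request: a dirty copy elsewhere is written back first.
            for c in self._others():
                if addr in c.dirty:
                    self.bus.memory[addr] = c.lines[addr]
                    c.dirty.discard(addr)
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        # Write request: invalidate all other copies before writing.
        # Lines are one word wide here, so a discarded dirty copy is
        # fully overwritten by this store.
        for c in self._others():
            c.lines.pop(addr, None)
            c.dirty.discard(addr)
        self.lines[addr] = value
        self.dirty.add(addr)

bus = Bus({"X": 0, "Y": 10})
c1, c2 = CoherentCache(bus), CoherentCache(bus)

stale = c2.load("X")        # cache-2 starts with X = 0, as on the write-through slide
c1.store("X", 1)            # T1: invalidates cache-2's copy of X
c1.store("Y", 11)
r1 = c2.load("Y")           # T2: forces cache-1 to write back Y
c2.store("Y'", r1)
r2 = c2.load("X")           # miss after invalidation: forces write-back of X
c2.store("X'", r2)

print((r2, r1))
```

With the same event order that broke the uncoordinated caches, T2 now reads X = 1 and Y = 11, an outcome that a sequentially consistent interleaving permits.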

