Berkeley COMPSCI 152 - Lecture 20 Snoopy Caches

CS 152 Computer Architecture and Engineering, Spring 2010
Lecture 20: Snoopy Caches
April 15, 2010
Krste Asanovic
Electrical Engineering and Computer Sciences, University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152


Recap: Sequential Consistency, A Memory Model

"A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program." -- Leslie Lamport

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs.

[Figure: a single memory M shared by several processors P]


Recap: Sequential Consistency

Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies. What are the additional SC requirements in our example?

    T1:                           T2:
    Store (X), 1    (X = 1)       Load R1, (Y)
    Store (Y), 11   (Y = 11)      Store (Y'), R1   (Y' = Y)
                                  Load R2, (X)
                                  Store (X'), R2   (X' = X)


Recap: Mutual Exclusion and Locks

We want to guarantee that only one process is active in a critical section. Three classes of solutions (a C11 sketch of the first class appears after this recap):
• Blocking atomic read-modify-write instructions, e.g., Test&Set, Fetch&Add, Swap
• Non-blocking atomic read-modify-write instructions, e.g., Compare&Swap, Load-reserve/Store-conditional
• Protocols based on ordinary Loads and Stores


Issues in Implementing Sequential Consistency

Implementation of SC is complicated by two issues:

• Out-of-order execution capability. A uniprocessor may reorder these pairs without violating its own program dependencies:
      Load(a); Load(b)       yes
      Load(a); Store(b)      yes, if a ≠ b
      Store(a); Load(b)      yes, if a ≠ b
      Store(a); Store(b)     yes, if a ≠ b
• Caches. Caches can prevent the effect of a store from being seen by other processors.

These SC complications motivate architects to consider weak or relaxed memory models.


Memory Fences: Instructions to Sequentialize Memory Accesses

Processors with relaxed or weak memory models (i.e., models that permit Loads and Stores to different addresses to be reordered) need to provide memory fence instructions to force the serialization of memory accesses. Examples of processors with relaxed memory models:
• Sparc V8 (TSO, PSO): Membar
• Sparc V9 (RMO): Membar #LoadLoad, Membar #LoadStore, Membar #StoreLoad, Membar #StoreStore
• PowerPC (WO): Sync, EIEIO

Memory fences are expensive operations; however, one pays the cost of serialization only when it is required.


Using Memory Fences

Producer posting item x:
    Load  Rtail, (tail)
    Store (Rtail), x
    MembarSS               ; ensures that the tail pointer is not updated before x has been stored
    Rtail = Rtail + 1
    Store (tail), Rtail

Consumer:
          Load  Rhead, (head)
    spin: Load  Rtail, (tail)
          if Rhead == Rtail goto spin
          MembarLL         ; ensures that R is not loaded before x has been stored
          Load  R, (Rhead)
          Rhead = Rhead + 1
          Store (head), Rhead
          process(R)

[Figure: producer and consumer sharing a queue in memory, indexed by the tail and head pointers held in Rtail and Rhead]

(A C11 rendering of this producer/consumer pattern appears below.)


Memory Coherence in SMPs

[Figure: CPU-1 with cache-1 and CPU-2 with cache-2 on a CPU-Memory bus; cache-1, cache-2, and memory all hold A = 100]

Suppose CPU-1 updates A to 200.
• write-back cache: memory and cache-2 have stale values
• write-through cache: cache-2 has a stale value

Do these stale values matter? What is the view of shared memory for programming?
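
The mutual-exclusion recap above names Test&Set as the classic blocking atomic read-modify-write primitive. As an illustration only (this sketch is not part of the lecture), C11's atomic_flag exposes essentially that primitive, so a minimal spinlock can be written as follows; the acquire/release orderings stand in for the fences a relaxed-memory machine would otherwise need.

    /* Minimal Test&Set spinlock sketch using C11 atomics (illustrative,
     * not the lecture's code). atomic_flag_test_and_set atomically sets
     * the flag and returns its previous value. */
    #include <stdatomic.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void) {
        /* Spin until the previous value was clear, i.e., we set it first.
         * memory_order_acquire keeps critical-section accesses from being
         * hoisted above the lock acquisition. */
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;  /* spin */
    }

    void release(void) {
        /* memory_order_release keeps critical-section accesses from
         * sinking below the unlock. */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }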
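
The producer/consumer pseudocode in the "Using Memory Fences" slide translates fairly directly into C11. In this sketch (mine, not the lecture's), the slide's MembarSS is modeled by a release fence and MembarLL by an acquire fence; the buffer size N, the int item type, and index-based rather than pointer-based head/tail are assumptions made for illustration, and the code assumes a single producer and a single consumer.

    /* Single-producer / single-consumer queue sketch with explicit fences,
     * mirroring the slide's pseudocode (illustrative translation). */
    #include <stdatomic.h>
    #include <stddef.h>

    #define N 1024
    static int buf[N];
    static atomic_size_t head, tail;   /* consumer advances head, producer advances tail */

    void produce(int x) {
        size_t t = atomic_load_explicit(&tail, memory_order_relaxed);  /* Load Rtail, (tail) */
        buf[t % N] = x;                                                /* Store (Rtail), x   */
        atomic_thread_fence(memory_order_release);                     /* MembarSS: x is stored before tail is updated */
        atomic_store_explicit(&tail, t + 1, memory_order_relaxed);     /* Store (tail), Rtail */
        /* Like the slide's pseudocode, this omits a queue-full check. */
    }

    int consume(void) {
        size_t h = atomic_load_explicit(&head, memory_order_relaxed);  /* Load Rhead, (head) */
        while (atomic_load_explicit(&tail, memory_order_relaxed) == h)
            ;                                                          /* spin while empty   */
        atomic_thread_fence(memory_order_acquire);                     /* MembarLL: item is not read before tail is seen */
        int r = buf[h % N];                                            /* Load R, (Rhead)    */
        atomic_store_explicit(&head, h + 1, memory_order_relaxed);     /* Store (head), Rhead */
        return r;                                                      /* caller does process(R) */
    }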
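
The T1/T2 example from the sequential-consistency recap is also the program used in the write-back and write-through walkthroughs that follow. A hedged C11 rendering (again mine, not the lecture's; it assumes an implementation that provides C11 <threads.h>) makes the SC claim concrete: with the default seq_cst atomics, the outcome Y' = 11 with X' = 0 cannot occur, yet it is exactly the outcome the stale-cache executions below produce.

    /* Litmus-test sketch of the T1/T2 example (illustrative only). */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    static atomic_int X, Y;        /* shared locations */
    static int Xprime, Yprime;     /* T2's results X' and Y' */

    int t1(void *arg) {            /* T1: ST X,1 ; ST Y,11 */
        (void)arg;
        atomic_store(&X, 1);
        atomic_store(&Y, 11);
        return 0;
    }

    int t2(void *arg) {            /* T2: LD Y ; ST Y' ; LD X ; ST X' */
        (void)arg;
        Yprime = atomic_load(&Y);
        Xprime = atomic_load(&X);
        return 0;
    }

    int main(void) {
        atomic_init(&X, 0);
        atomic_init(&Y, 10);
        thrd_t a, b;
        thrd_create(&a, t1, NULL);
        thrd_create(&b, t2, NULL);
        thrd_join(a, NULL);
        thrd_join(b, NULL);
        /* Under sequential consistency, Yprime == 11 implies Xprime == 1. */
        printf("Y' = %d, X' = %d\n", Yprime, Xprime);
        return 0;
    }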
Write-back Caches & SC

prog T1: ST X, 1 ; ST Y, 11
prog T2: LD Y, R1 ; ST Y', R1 ; LD X, R2 ; ST X', R2

Initially memory holds X = 0, Y = 10 (X' and Y' not yet written); both write-back caches start empty.

• T1 is executed:               cache-1: X = 1, Y = 11 (dirty)   cache-2: empty                             memory: X = 0, Y = 10
• cache-1 writes back Y:        cache-1: X = 1, Y = 11           cache-2: empty                             memory: X = 0, Y = 11
• T2 executed:                  cache-1: X = 1, Y = 11           cache-2: Y = 11, Y' = 11, X = 0, X' = 0    memory: X = 0, Y = 11
• cache-1 writes back X:        cache-1: X = 1, Y = 11           cache-2: Y = 11, Y' = 11, X = 0, X' = 0    memory: X = 1, Y = 11
• cache-2 writes back X' & Y':  cache-1: X = 1, Y = 11           cache-2: Y = 11, Y' = 11, X = 0, X' = 0    memory: X = 1, Y = 11, X' = 0, Y' = 11

T2 observed the new value of Y but a stale value of X, so the final result X' = 0, Y' = 11 is one that sequential consistency forbids.


Write-through Caches & SC

Write-through caches don't preserve sequential consistency either. Same programs T1 and T2; initially cache-1 holds X = 0, Y = 10, cache-2 holds a copy of X = 0, and memory holds X = 0, Y = 10.

• T1 executed:   cache-1: X = 1, Y = 11   cache-2: X = 0 (stale)                     memory: X = 1, Y = 11
• T2 executed:   cache-1: X = 1, Y = 11   cache-2: Y = 11, Y' = 11, X = 0, X' = 0    memory: X = 1, Y = 11, X' = 0, Y' = 11

T2 misses on Y and reads the new value 11 from memory, but hits on its stale cached copy of X and reads 0.


Cache Coherence vs. Memory Consistency

• A cache coherence protocol ensures that all writes by one processor are eventually visible to other processors, i.e., updates are not lost.
• A memory consistency model gives the rules on when a write by one processor can be observed by a read on another; equivalently, what values can be seen by a load.
• A cache coherence protocol is not enough to ensure sequential consistency. But if the machine is sequentially consistent, then its caches must be coherent.
• The combination of the cache coherence protocol and the processor's memory reorder buffer implements a given machine's memory consistency model.


Maintaining Cache Coherence

Hardware support is required such that:
• only one processor at a time has write permission for a location, and
• no processor can load a stale copy of the location after a write
⇒ cache coherence protocols


Warmup: Parallel I/O

(DMA stands for Direct Memory Access; it means the I/O device can read and write memory autonomously from the CPU.)

[Figure: a processor with its cache and a disk with a DMA engine, both attached to the memory bus; either the cache or the DMA engine can be the bus master and effect transfers of addresses (A) and data (D) to and from physical memory]

Page transfers occur while the processor is running.


Problems with Parallel I/O

• Memory → Disk: physical memory may be stale if the cache copy is dirty.
• Disk → Memory: the cache may hold stale data and not see the memory writes.

[Figure: DMA transfers move whole pages between disk and physical memory while the processor's cache holds portions of those pages]


Snoopy Cache

