Berkeley COMPSCI 152 - Directory-Based Cache Protocols

April 20, 2010 CS152, Spring 2010

CS 152 Computer Architecture and Engineering
Lecture 21: Directory-Based Cache Protocols
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152

Recap: Snoopy Cache Protocols
Use snoopy mechanism to keep all processors’ view of memory coherent
[Figure: processors M1, M2, M3, each with a snoopy cache; DMA; disks; physical memory; memory bus]

MESI: An Enhanced MSI protocol
Increased performance for private data
Each cache line has a tag: address tag, state bits
M: Modified Exclusive
E: Exclusive but unmodified
S: Shared
I: Invalid
[Figure: state transition diagram for cache state in processor P1. Edge labels: Write miss; Read miss, shared; Read miss, not shared; P1 intent to write; P1 write; P1 read; P1 write or read; Read by any processor; Other processor reads; Other processor reads, P1 writes back; Other processor intent to write; Other processor intent to write, P1 writes back]

Performance of Symmetric Shared-Memory Multiprocessors
Cache performance is a combination of:
1. Uniprocessor cache miss traffic
2. Traffic caused by communication
– Results in invalidations and subsequent cache misses
• Adds 4th C: coherence miss
– Joins Compulsory, Capacity, Conflict
– (Sometimes called a Communication miss)

Coherency Misses
1. True sharing misses arise from the communication of data through the cache coherence mechanism
• Invalidates due to 1st write to shared block
• Reads by another CPU of modified block in different cache
• Miss would still occur if block size were 1 word
2. False sharing misses arise when a block is invalidated because some word in the block, other than the one being read, is written into
• Invalidation does not cause a new value to be communicated, but only causes an extra cache miss
• Block is shared, but no word in block is actually shared ⇒ miss would not occur if block size were 1 word

Example: True v. False Sharing v. Hit?
• Assume x1 and x2 in same cache block. P1 and P2 both read x1 and x2 before.

Time  P1        P2        True, False, Hit? Why?
1     Write x1            True miss; invalidate x1 in P2
2               Read x2   False miss; x1 irrelevant to P2
3     Write x1            False miss; x1 irrelevant to P2
4               Write x2  False miss; x1 irrelevant to P2
5     Read x2             True miss; invalidate x2 in P1

MP Performance: 4 Processor
Commercial Workload: OLTP, Decision Support (Database), Search Engine
[Figure: memory cycles per instruction vs. cache size (1 MB, 2 MB, 4 MB, 8 MB), broken down into Instruction, Capacity/Conflict, Cold, False Sharing, True Sharing]
• True sharing and false sharing unchanged going from 1 MB to 8 MB (L3 cache)
• Uniprocessor cache misses improve with cache size increase (Instruction, Capacity/Conflict, Compulsory)

MP Performance: 2MB Cache
Commercial Workload: OLTP, Decision Support (Database), Search Engine
• True sharing, false sharing increase going from 1 to 8 CPUs
[Figure: memory cycles per instruction vs. processor count (1, 2, 4, 6, 8), broken down into Instruction, Conflict/Capacity, Cold, False Sharing, True Sharing]

A Cache Coherent System Must:
• Provide set of states, state transition diagram, and actions
• Manage coherence protocol
– (0) Determine when to invoke coherence protocol
– (a) Find info about state of address in other caches to determine action
» whether need to communicate with other cached copies
– (b) Locate the other copies
– (c) Communicate with those copies (invalidate/update)
• (0) is done the same way on all systems
– state of the line is maintained in the cache
– protocol is invoked if an “access fault” occurs on the line
• Different approaches distinguished by (a) to (c)

Bus-based Coherence
• All of (a), (b), (c) done through broadcast on bus
– faulting processor sends out a “search”
– others respond to the search probe and take necessary action
• Could do it in scalable network too
– broadcast to all processors, and let them respond
• Conceptually simple, but broadcast doesn’t scale with number of processors, P
– on bus, bus bandwidth doesn’t scale
– on scalable network, every fault leads to at least P network transactions
• Scalable coherence:
– can have same cache states and state transition diagram
– different mechanisms to manage protocol

Scalable Approach: Directories
• Every memory block has associated directory information
– keeps track of copies of cached blocks and their states
– on a miss, find directory entry, look it up, and communicate only with the nodes that have copies if necessary
– in scalable networks, communication with directory and copies is through network transactions
• Many alternatives for organizing directory information

Basic Operation of Directory
• k processors.
• With each cache-block in memory: k presence-bits, 1 dirty-bit
• With each cache-block in cache: 1 valid bit, and 1 dirty (owner) bit
[Figure: processors (P) with caches, and memory with directory (presence bits, dirty bit), connected by an interconnection network]
• Read from main memory by processor i:
– If dirty-bit OFF then { read from main memory; turn p[i] ON; }
– If dirty-bit ON then { recall line from dirty proc (downgrade cache state to shared); update memory; turn dirty-bit OFF; turn p[i] ON; supply recalled data to i; }
• Write to main memory by processor i:
– If dirty-bit OFF then { send invalidations to all caches that have the block; turn dirty-bit ON; supply data to i; turn p[i] ON; ... }

CS152 Administrivia
• Final quiz, Thursday April 29
– Multiprocessors, Memory models, Cache coherence
– Lectures 19-21, PS 5, Lab 5
• Next lecture, “Virtual Machines”, Thursday April 22
• Last lecture, “Putting it all Together”, Tuesday April 27
– Summary of the course
– Case Study: Intel Nehalem
– HKN Course Survey

Directory Cache Protocol (Handout 6)
• Assumptions:
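
Looking back at the MESI slide, the edge labels of the state diagram can be read as a transition table for P1's copy of a line. The sketch below is one possible encoding in Python, not part of the lecture. The event names (`local_read`, `other_write`, and so on) are hypothetical, the mapping of labels to edges follows the standard MESI protocol because the diagram itself did not survive extraction, and the two transitions out of Invalid on other processors' traffic are an assumption (the slide labels do not mention them).

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


# (current state, event) -> (next state, whether P1 writes the line back).
# Event names are hypothetical stand-ins for the slide's edge labels.
TRANSITIONS = {
    (State.I, "read_miss_shared"): (State.S, False),
    (State.I, "read_miss_not_shared"): (State.E, False),
    (State.I, "write_miss"): (State.M, False),
    # Assumption: an invalid line ignores other processors' traffic.
    (State.I, "other_read"): (State.I, False),
    (State.I, "other_write"): (State.I, False),
    (State.S, "local_read"): (State.S, False),    # "Read by any processor"
    (State.S, "other_read"): (State.S, False),    # "Read by any processor"
    (State.S, "local_write"): (State.M, False),   # "P1 intent to write"
    (State.S, "other_write"): (State.I, False),   # "Other processor intent to write"
    (State.E, "local_read"): (State.E, False),    # "P1 read"
    (State.E, "local_write"): (State.M, False),   # "P1 write"
    (State.E, "other_read"): (State.S, False),    # "Other processor reads"
    (State.E, "other_write"): (State.I, False),   # "Other processor intent to write"
    (State.M, "local_read"): (State.M, False),    # "P1 write or read"
    (State.M, "local_write"): (State.M, False),   # "P1 write or read"
    (State.M, "other_read"): (State.S, True),     # "Other processor reads, P1 writes back"
    (State.M, "other_write"): (State.I, True),    # "... intent to write, P1 writes back"
}


def step(state, event):
    """Return (next_state, writes_back) for P1's copy of a line."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for {event!r} in state {state.name}") from None


def run(events, state=State.I):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state, _ = step(state, event)
    return state


if __name__ == "__main__":
    # Private data: a read miss with no other sharers, then a local write.
    print(run(["read_miss_not_shared", "local_write"]).value)
```

The private-data sequence in the demo goes from Invalid to Exclusive to Modified; the second step is the case the slide's subtitle refers to, since a line already held exclusively can be written without first announcing an intent to write.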


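The read and write rules on the "Basic Operation of Directory" slide can also be sketched as a small Python model. This is an illustration under stated assumptions, not the protocol from Handout 6. The class and method names (`Directory`, `DirectoryEntry`, `read`, `write`, `messages`) are hypothetical. The slide ends the write case with "...", so clearing the presence bits of invalidated sharers and skipping the requester when sending invalidations are assumptions made here to keep the model consistent, and the dirty-bit ON write case, which the slide does not show, is deliberately left unimplemented.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Per-block directory state: k presence bits plus one dirty bit."""
    presence: list
    dirty: bool = False


class Directory:
    """Toy directory for k processors; records the messages it would send."""

    def __init__(self, k):
        self.k = k
        self.entries = {}
        self.messages = []  # tuples of (kind, processor, block)

    def entry(self, block):
        if block not in self.entries:
            self.entries[block] = DirectoryEntry([False] * self.k)
        return self.entries[block]

    def read(self, i, block):
        e = self.entry(block)
        if e.dirty:
            # Recall the line from the dirty processor, which downgrades
            # its copy to shared; memory is updated with the recalled data.
            owner = e.presence.index(True)
            self.messages.append(("recall", owner, block))
            e.dirty = False
        e.presence[i] = True
        self.messages.append(("data", i, block))

    def write(self, i, block):
        e = self.entry(block)
        if e.dirty:
            # The slide does not show this case, so it is not modeled.
            raise NotImplementedError("write with dirty-bit ON is not shown on the slide")
        for p in range(self.k):
            if e.presence[p] and p != i:
                self.messages.append(("invalidate", p, block))
                # Assumption: the slide's "..." hides this bookkeeping step.
                e.presence[p] = False
        e.dirty = True
        self.messages.append(("data", i, block))
        e.presence[i] = True


if __name__ == "__main__":
    d = Directory(k=4)
    d.read(0, 0x40)
    d.read(1, 0x40)
    d.write(2, 0x40)   # invalidates processors 0 and 1
    d.read(3, 0x40)    # recalls the line from processor 2
    for message in d.messages:
        print(message)
```

Only processors whose presence bit is set receive an invalidation or recall, which is the point of the directory approach: a miss communicates with the nodes that hold copies rather than broadcasting to all P processors.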