Berkeley COMPSCI 152 - Lecture Notes

CS 152 Computer Architecture and Engineering
Lecture 21: Directory-Based Cache Protocols

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152
4/29/2008, CS152 Spring 08


Recap: Snoopy Cache Protocols

[Figure: processors M1, M2, and M3, each with a snoopy cache, share a memory bus with physical memory and with DMA to the disks.]

Use the snoopy mechanism to keep all processors' view of memory coherent.


Recap: MESI, An Enhanced MSI Protocol

Increased performance for private data.

Each cache line has a tag: an address tag plus state bits.
  M: Modified Exclusive
  E: Exclusive, unmodified
  S: Shared
  I: Invalid

[Figure: state transition diagram for the cache state in processor P1, over the states M, E, S, and I. Transition labels: P1 write or read; P1 write; P1 read; P1 intent to write; Other processor reads, P1 writes back; Other processor intent to write; Read miss, shared; Read miss, not shared; Write miss; Read by any processor.]


Performance of Symmetric Shared-Memory Multiprocessors

Cache performance is a combination of:
  1. Uniprocessor cache miss traffic
  2. Traffic caused by communication
     - Results in invalidations and subsequent cache misses

Adds a 4th C: the coherence miss.
  - Joins Compulsory, Capacity, Conflict
  - Sometimes called a Communication miss


Coherency Misses

  1. True sharing misses arise from the communication of data through the cache coherence mechanism.
     - Invalidates due to 1st write to a shared block
     - Reads by another CPU of a modified block in a different cache
     - Miss would still occur if block size were 1 word
  2. False sharing misses occur when a block is invalidated because some word in the block, other than the one being read, is written into.
     - Invalidation does not cause a new value to be communicated, but only causes an extra cache miss
     - Block is shared, but no word in the block is actually shared; miss would not occur if block size were 1 word


Example: True v. False Sharing v. Hit?

Assume x1 and x2 are in the same cache block. P1 and P2 both read x1 and x2 before.

  Time  P1        P2        True, False, Hit? Why?
  1     Write x1            True miss; invalidate x1 in P2
  2               Read x2   False miss; x1 irrelevant to P2
  3     Write x1            False miss; x1 irrelevant to P2
  4               Write x2  False miss; x1 irrelevant to P2
  5     Read x2             True miss; invalidate x2 in P1


MP Performance: 4-Processor Commercial Workload (OLTP, Decision Support (Database), Search Engine)

  - Uniprocessor cache misses (Instruction, Capacity/Conflict, Compulsory) improve with cache size increase
  - True sharing and false sharing unchanged going from 1 MB to 8 MB (L3 cache)

[Figure: stacked bar chart of memory cycles per instruction versus cache size (1 MB, 2 MB, 4 MB, 8 MB). Components: Instruction, Capacity/Conflict, Cold, False Sharing, True Sharing.]


MP Performance: 2 MB Cache, Commercial Workload (OLTP, Decision Support (Database), Search Engine)

  - True sharing and false sharing increase going from 1 to 8 CPUs

[Figure: stacked bar chart of memory cycles per instruction versus processor count (1, 2, 4, 6, 8). Components: Instruction, Conflict/Capacity, Cold, False Sharing, True Sharing.]


A Cache Coherent System Must:

  - Provide a set of states, a state transition diagram, and actions
  - Manage the coherence protocol
    (0) Determine when to invoke the coherence protocol
    (a) Find info about the state of the block in other caches to determine the action: whether it needs to communicate with other cached copies
    (b) Locate the other copies
    (c) Communicate with those copies (invalidate/update)
  - (0) is done the same way on all systems
    - state of the line is maintained in the cache
    - protocol is invoked if an access fault occurs on the line
  - Different approaches are distinguished by (a) to (c)


Bus-based Coherence

  - All of (a), (b), (c) are done through broadcast on the bus
    - faulting processor sends out a search
    - others respond to the search probe and take necessary action
  - Could do it in a scalable network too
    - broadcast to all processors, and let them respond
  - Conceptually simple, but broadcast doesn't scale with the number of processors, P
    - on a bus, bus bandwidth doesn't scale
    - on a scalable network, every fault leads to at least P network transactions
  - Scalable coherence:
    - can have the same cache states and state transition diagram
    - different mechanisms to manage the protocol


Scalable Approach: Directories

  - Every memory block has associated directory information
    - keeps track of copies of cached blocks and their states
    - on a miss, find the directory entry, look it up, and communicate only with the nodes that have copies, if necessary
    - in scalable networks, communication with the directory and copies is through network transactions
  - Many alternatives for organizing directory information


Basic Operation of Directory

[Figure: k processors, each with a cache, connected through an interconnection network to memory and its directory, which holds the presence bits and the dirty bit.]

  - k processors
  - With each cache block in memory: k presence bits, 1 dirty bit
  - With each cache block in cache: 1 valid bit and 1 dirty (owner) bit

Read from main memory by processor i:
  If dirty bit OFF then { read from main memory; turn p[i] ON; }
  If dirty bit ON then { recall line from dirty proc (cache state to shared); update memory; turn dirty bit OFF; turn p[i] ON; supply recalled data to i; }

Write to main memory by processor i:
  If dirty bit OFF then { send invalidations to all caches that have the block; turn dirty bit ON; supply data to i; turn p[i] ON; }


CS152 Administrivia

  - No lecture Thursday May 1 (faculty retreat)
  - Last lecture Tuesday May 6
  - Final quiz Thursday May 8
  - Informal course feedback
    - Want to hear your opinion of the new format
    - What worked and what didn't work (especially in labs)


Directory Cache Protocol (Handout 6)

[Figure: CPUs, each with a cache, connected through an interconnection network to directory controllers, each attached to a DRAM bank.]

Assumptions: reliable network, FIFO message delivery between any given source-destination pair.


Cache States

For each cache line, there are 4 possible states:
  - C-invalid (= Nothing): The accessed data is not resident in the cache.
  - C-shared (= Sh): The accessed data is resident in the cache, and possibly also cached at other sites. The data in memory is valid.
  - C-modified (= Ex): The accessed data is exclusively resident in this cache, and has been modified. Memory does not have the most up-to-date data.
  - C-transient (= Pending): The accessed data is in a transient state (for example, the site has just issued a protocol request, but has not received the corresponding protocol reply).
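
The read and write handling listed under Basic Operation of Directory can be sketched in a few lines of Python. This is a minimal illustration for a single memory block, not the protocol from Handout 6. The class name `Directory`, the `log` message trace, and the per-cache `cache` list are hypothetical names introduced here. Two behaviours are assumptions that the notes do not spell out: an invalidated cache has its presence bit turned OFF, and a write that arrives while the dirty bit is ON first recalls the line from its owner.

```python
class Directory:
    """Directory state for ONE memory block shared by k processors."""

    def __init__(self, k, value=0):
        self.presence = [False] * k  # k presence bits, p[i]
        self.dirty = False           # 1 dirty bit
        self.memory = value          # block contents in main memory
        self.cache = [None] * k      # each cache's copy; None = valid bit OFF
        self.log = []                # hypothetical trace of messages sent

    def _owner(self):
        # While the dirty bit is ON, exactly one presence bit is set.
        return self.presence.index(True)

    def _recall(self):
        # Recall the line from the dirty processor and update memory.
        # The owner keeps its copy, now in the shared state.
        owner = self._owner()
        self.log.append(("recall", owner))
        self.memory = self.cache[owner]
        self.dirty = False

    def read(self, i):
        if self.dirty:
            self._recall()
        self.presence[i] = True          # turn p[i] ON
        self.cache[i] = self.memory      # supply data to i
        return self.cache[i]

    def write(self, i, value):
        if self.dirty and self._owner() != i:
            # ASSUMPTION: this case is not covered in the notes.
            self._recall()
        for j in range(len(self.presence)):
            if self.presence[j] and j != i:
                self.log.append(("invalidate", j))
                self.presence[j] = False  # ASSUMPTION: bit cleared on invalidate
                self.cache[j] = None
        self.dirty = True                 # turn dirty bit ON
        self.presence[i] = True           # turn p[i] ON
        self.cache[i] = value


d = Directory(k=3, value=7)
d.read(0)
d.read(1)          # two sharers, memory still valid
d.write(2, 9)      # invalidates caches 0 and 1, cache 2 becomes the owner
d.read(0)          # recalls the line from cache 2, memory is updated
```

Only the caches whose presence bit is set receive a message, which is the point of the directory: the number of network transactions tracks the number of sharers rather than the number of processors.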
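
The true versus false sharing example (x1 and x2 in the same cache block) can also be replayed mechanically. The sketch below is one possible classification rule, chosen because it is consistent with the five rows of that example; it is an assumption, not a definition taken from the lecture. A miss on an invalid copy counts as true sharing if another processor wrote the accessed word since the copy was lost. A write to a shared copy counts as true sharing if some other sharer has touched that word since obtaining its current copy. The function name `classify` and the `touched` and `lost` bookkeeping sets are hypothetical.

```python
def classify(trace, procs=("P1", "P2"), words=("x1", "x2")):
    """Label each access as "true miss", "false miss", or "hit".

    trace: list of (processor, op, word) with op in {"read", "write"}.
    Models a single cache block under an invalidation protocol, and
    assumes every processor has already read every word of the block.
    """
    state = {p: "S" for p in procs}           # "S", "M", or "I" per cache
    touched = {p: set(words) for p in procs}  # words used since copy obtained
    lost = {p: set() for p in procs}          # words others wrote while invalid
    labels = []
    for p, op, w in trace:
        others = [q for q in procs if q != p]
        if state[p] == "I":
            # Miss on an invalidated copy.
            labels.append("true miss" if w in lost[p] else "false miss")
            lost[p] = set()
            touched[p] = set()
            if op == "read":
                for q in others:
                    if state[q] == "M":
                        state[q] = "S"        # owner writes back
                state[p] = "S"
            else:
                for q in others:
                    if state[q] != "I":
                        state[q] = "I"
                        lost[q] = set()
                state[p] = "M"
        elif op == "write" and state[p] == "S":
            # Write to a shared copy: the other sharers must be invalidated.
            sharers = [q for q in others if state[q] != "I"]
            shared_word = any(w in touched[q] for q in sharers)
            labels.append("true miss" if shared_word else "false miss")
            for q in sharers:
                state[q] = "I"
                lost[q] = set()
            state[p] = "M"
        else:
            labels.append("hit")
        touched[p].add(w)
        if op == "write":
            for q in others:
                if state[q] == "I":
                    lost[q].add(w)
    return labels


example = [
    ("P1", "write", "x1"),
    ("P2", "read", "x2"),
    ("P1", "write", "x1"),
    ("P2", "write", "x2"),
    ("P1", "read", "x2"),
]
print(classify(example))
```

For this trace the labels come out as true, false, false, false, true, matching the example. With one-word blocks x1 and x2 would live in different blocks, so the three false misses would disappear.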

