DOC PREVIEW
CMU CS 15740 - lect12.4up

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Cache Coherence Part 1 The Cache Coherence Problem CS 740 October 27 2003 P2 P1 u P3 u 4 u 7 5 3 u 5 u 5 Topics The Cache Coherence Problem Snoopy Protocols 1 I O devices u 5 2 Memory Slides from Todd Mowry Culler Singh 2 A Coherent Memory System Intuition Reading a location should return latest value written by any process Easy in uniprocessors Except for I O coherence between I O devices and processors But infrequent so software solutions work uncacheable operations flush pages pass I O data through caches Would like same to hold when processes run on different processors E g as if the processes were interleaved on a uniprocessor The coherence problem is more pervasive and performance critical in multiprocessors has a much larger impact on hardware design 3 CS 740 F 03 CS 740 F 03 Problems with the Intuition Recall Value returned by read should be last value written But last is not well defined Even in sequential case last is defined in terms of program order not time Order of operations in the machine language presented to processor Subsequent defined in analogous way and well defined In parallel case program order defined within a process but need to make sense of orders across processes Must define a meaningful semantics the answer involves both cache coherence and an appropriate memory consistency model to be discussed in a later lecture 4 CS 740 F 03 Cache Coherence Solutions Formal Definition of Coherence Results of a program values returned by its read operations A memory system is coherent if the results of any execution of a program are such that for each location it is possible to construct a hypothetical serial order of all operations to the location that is consistent with the results of the execution and in which 1 operations issued by any particular process occur in the order issued by that process and 2 the value returned by a read is the value written by the last write to that location in the serial order Two necessary features Write propagation value written must become visible to others Write serialization writes to location seen in same order by all if I see w1 after w2 you should not see w2 before w1 no need for analogous read serialization since reads not visible to others 5 CS 740 F 03 Shared Caches E g DPC in the Encore Alliant FX 8 Problems are limited cache bandwidth and cache interference Benefits are fine grain sharing and prefetch effects P P P P Crossbar Shd Cache 2 4 way interleaved cache Memory Hardware Based two most common variations snoopy schemes rely on broadcast to observe all coherence traffic well suited for buses and small scale systems example SGI Challenge directory schemes uses centralized information to avoid broadcast scales well to large numbers of processors example SGI Origin 2000 6 CS 740 F 03 Basic Idea all coherence related activity is broadcast to all processors e g on a global bus each processor or its representative monitors aka snoops these actions and reacts to any which are relevant to the current contents of its cache examples if another processor wishes to write to a line you may need to invalidate I e discard the copy in your own cache if another processor wishes to read a line for which you have a dirty copy you may need to supply Most common approach in commercial multiprocessors Examples SGI Challenge SUN Enterprise multiprocessor PCs etc Memory 7 often used in clusters of workstations or PCs e g Treadmarks extend virtual memory system to perform more work on page faults send messages to remote machines if necessary Snoopy Cache Coherence Schemes Processors share a single cache essentially punting the problem Useful for very small machines P Software Based CS 740 F 03 8 CS 740 F 03 Implementing a Snoopy Protocol Cache controller now receives inputs from both sides Requests from processor bus requests responses from snooper In either case takes zero or more actions Updates state responds with data generates new bus transactions Coherence with Write through Caches Bus snoop Protocol is a distributed algorithm cooperating state machines Like that of allocation in cache and transfer to from cache 9 CS 740 F 03 Write through State Transition Diagram 10 Write through State Transition Diagram PrRd BusWr PrRd BusRd I PrWr BusWr PrWr BusWr V BusWr PrRd BusRd Cache memory transaction CS 740 F 03 V Obs gen I Processor initiated transactions PrWr BusWr Bus snooper initiated transactions Two states per block in each cache as in uniprocessor state of a block can be seen as p vector Hardware state bits associated with only blocks that are in the cache other blocks can be seen as being in invalid not present state in that cache Write will invalidate all other caches no local change of state can have multiple simultaneous readers of block but write invalidates them 11 I O devices Key extensions to uniprocessor snooping invalidating updating caches no new states or bus transactions in this case invalidation versus update based protocols Write propagation even in inval case later reads will see new value inval causes miss on later access and memory up to date via write through PrWr BusWr PrRd PrRD PrWr BusWr BusRd Mem Set of states state transition diagram actions Granularity of coherence is typically a cache block Pn P1 CS 740 F 03 Processor initiated transactions Bus snooper initiated transactions Two states per block in each cache as in uniprocessor state of a block can be seen as p vector Hardware state bits associated with only blocks that are in the cache other blocks can be seen as being in invalid not present state in that cache Write will invalidate all other caches no local change of state can have multiple simultaneous readers of block but write invalidates them 12 CS 740 F 03 Is WT Coherent Problem with Write Through Assume High bandwidth requirements A bus transaction completes before next one starts Atomic bus transactions Atomic memory transactions Write propagation Write Serialization Write back caches absorb most writes as cache hits Write hits don t go on bus But now how do we ensure write propagation and serialization Need more sophisticated protocols large design space Key is the bus writes serialized by bus 13 Every write from every processor goes to shared bus and memory Consider a 500MHz 1CPI processor where 15 of instructions are 8 byte stores Each processor generates 75M stores or 600MB data per second 1GB s bus can support only 1 processor without saturating Write through especially unpopular for SMPs CS


View Full Document

CMU CS 15740 - lect12.4up

Documents in this Course
leecture

leecture

17 pages

Lecture

Lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

9 pages

Lecture

Lecture

13 pages

lecture

lecture

25 pages

lect17

lect17

7 pages

Lecture

Lecture

65 pages

Lecture

Lecture

28 pages

lect07

lect07

24 pages

lect07

lect07

12 pages

lect03

lect03

3 pages

lecture

lecture

11 pages

lecture

lecture

20 pages

lecture

lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

10 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

18 pages

lecture

lecture

63 pages

lecture

lecture

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

18 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

lecture

lecture

34 pages

lecture

lecture

47 pages

lecture

lecture

7 pages

Lecture

Lecture

18 pages

Lecture

Lecture

7 pages

Lecture

Lecture

21 pages

Lecture

Lecture

10 pages

Lecture

Lecture

39 pages

Lecture

Lecture

11 pages

lect04

lect04

40 pages

Load more
Loading Unlocking...
Login

Join to view lect12.4up and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view lect12.4up and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?