CMU CS 15740 - lect12.4up - D2616481

Home> Schools> Carnegie Mellon University> Computer Science (CS) > CS 15740> lect12.4up

DOC PREVIEW

CMU CS 15740 - lect12.4up

School name Carnegie Mellon University

Course Cs 15740- Computer Architecture

Pages 13

This preview shows page 1-2-3-4 out of 13 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Cache Coherence Part 1 The Cache Coherence Problem CS 740 October 27 2003 P2 P1 u P3 u 4 u 7 5 3 u 5 u 5 Topics The Cache Coherence Problem Snoopy Protocols 1 I O devices u 5 2 Memory Slides from Todd Mowry Culler Singh 2 A Coherent Memory System Intuition Reading a location should return latest value written by any process Easy in uniprocessors Except for I O coherence between I O devices and processors But infrequent so software solutions work uncacheable operations flush pages pass I O data through caches Would like same to hold when processes run on different processors E g as if the processes were interleaved on a uniprocessor The coherence problem is more pervasive and performance critical in multiprocessors has a much larger impact on hardware design 3 CS 740 F 03 CS 740 F 03 Problems with the Intuition Recall Value returned by read should be last value written But last is not well defined Even in sequential case last is defined in terms of program order not time Order of operations in the machine language presented to processor Subsequent defined in analogous way and well defined In parallel case program order defined within a process but need to make sense of orders across processes Must define a meaningful semantics the answer involves both cache coherence and an appropriate memory consistency model to be discussed in a later lecture 4 CS 740 F 03 Cache Coherence Solutions Formal Definition of Coherence Results of a program values returned by its read operations A memory system is coherent if the results of any execution of a program are such that for each location it is possible to construct a hypothetical serial order of all operations to the location that is consistent with the results of the execution and in which 1 operations issued by any particular process occur in the order issued by that process and 2 the value returned by a read is the value written by the last write to that location in the serial order Two necessary features Write propagation value written must become visible to others Write serialization writes to location seen in same order by all if I see w1 after w2 you should not see w2 before w1 no need for analogous read serialization since reads not visible to others 5 CS 740 F 03 Shared Caches E g DPC in the Encore Alliant FX 8 Problems are limited cache bandwidth and cache interference Benefits are fine grain sharing and prefetch effects P P P P Crossbar Shd Cache 2 4 way interleaved cache Memory Hardware Based two most common variations snoopy schemes rely on broadcast to observe all coherence traffic well suited for buses and small scale systems example SGI Challenge directory schemes uses centralized information to avoid broadcast scales well to large numbers of processors example SGI Origin 2000 6 CS 740 F 03 Basic Idea all coherence related activity is broadcast to all processors e g on a global bus each processor or its representative monitors aka snoops these actions and reacts to any which are relevant to the current contents of its cache examples if another processor wishes to write to a line you may need to invalidate I e discard the copy in your own cache if another processor wishes to read a line for which you have a dirty copy you may need to supply Most common approach in commercial multiprocessors Examples SGI Challenge SUN Enterprise multiprocessor PCs etc Memory 7 often used in clusters of workstations or PCs e g Treadmarks extend virtual memory system to perform more work on page faults send messages to remote machines if necessary Snoopy Cache Coherence Schemes Processors share a single cache essentially punting the problem Useful for very small machines P Software Based CS 740 F 03 8 CS 740 F 03 Implementing a Snoopy Protocol Cache controller now receives inputs from both sides Requests from processor bus requests responses from snooper In either case takes zero or more actions Updates state responds with data generates new bus transactions Coherence with Write through Caches Bus snoop Protocol is a distributed algorithm cooperating state machines Like that of allocation in cache and transfer to from cache 9 CS 740 F 03 Write through State Transition Diagram 10 Write through State Transition Diagram PrRd BusWr PrRd BusRd I PrWr BusWr PrWr BusWr V BusWr PrRd BusRd Cache memory transaction CS 740 F 03 V Obs gen I Processor initiated transactions PrWr BusWr Bus snooper initiated transactions Two states per block in each cache as in uniprocessor state of a block can be seen as p vector Hardware state bits associated with only blocks that are in the cache other blocks can be seen as being in invalid not present state in that cache Write will invalidate all other caches no local change of state can have multiple simultaneous readers of block but write invalidates them 11 I O devices Key extensions to uniprocessor snooping invalidating updating caches no new states or bus transactions in this case invalidation versus update based protocols Write propagation even in inval case later reads will see new value inval causes miss on later access and memory up to date via write through PrWr BusWr PrRd PrRD PrWr BusWr BusRd Mem Set of states state transition diagram actions Granularity of coherence is typically a cache block Pn P1 CS 740 F 03 Processor initiated transactions Bus snooper initiated transactions Two states per block in each cache as in uniprocessor state of a block can be seen as p vector Hardware state bits associated with only blocks that are in the cache other blocks can be seen as being in invalid not present state in that cache Write will invalidate all other caches no local change of state can have multiple simultaneous readers of block but write invalidates them 12 CS 740 F 03 Is WT Coherent Problem with Write Through Assume High bandwidth requirements A bus transaction completes before next one starts Atomic bus transactions Atomic memory transactions Write propagation Write Serialization Write back caches absorb most writes as cache hits Write hits don t go on bus But now how do we ensure write propagation and serialization Need more sophisticated protocols large design space Key is the bus writes serialized by bus 13 Every write from every processor goes to shared bus and memory Consider a 500MHz 1CPI processor where 15 of instructions are 8 byte stores Each processor generates 75M stores or 600MB data per second 1GB s bus can support only 1 processor without saturating Write through especially unpopular for SMPs CS

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4 out of 13 pages.

CMU CS 15740 - lect12.4up

Sign up for free to view:

Please select your school