
CS162 Computer Architecture
Lecture 15: Symmetric Multiprocessor: Cache Protocols
L. N. Bhuyan
Adapted from Patterson's slides (CS252/Patterson, Lec 12, 2/28/01)

Contents: Figures from Last Class; Symmetric Multiprocessor (SMP); Small-Scale Shared Memory; Potential HW Coherency Solutions; Bus Snooping Topology; Basic Snoopy Protocols; A Basic Snoopy Protocol; Snoopy-Cache State Machine I, II, III; Implementing Snooping Caches; Implementation Complications; Larger MPs; Distributed Directory MPs

Figures from Last Class
• For the SMP figure and table, see Fig. 9.2 and 9.3, p. 718, Ch. 9 of the CS 161 text.
• For distributed shared-memory machines, see Fig. 9.8 and 9.9, pp. 727-728.
• For message-passing machines/clusters, see Fig. 9.12, p. 735.

Symmetric Multiprocessor (SMP)
• Memory: centralized, with uniform memory access time ("UMA") and a bus interconnect
• Examples: Sun Enterprise 5000, SGI Challenge, Intel SystemPro

Small-Scale Shared Memory
• Caches serve to:
  – Increase bandwidth versus the bus/memory
  – Reduce latency of access
  – Valuable for both private data and shared data
• What about cache consistency? Consider X initially 1 in memory:

  Time  Event                   Cache A  Cache B  X (memory)
  0                                                1
  1     CPU A reads X           1                  1
  2     CPU B reads X           1        1         1
  3     CPU A stores 0 into X   0        1         0

  After step 3, CPU B still caches the stale value 1 for X: the caches are incoherent.

Potential HW Coherency Solutions
• Snooping solution (snoopy bus):
  – Send all requests for data to all processors
  – Processors snoop to see if they have a copy and respond accordingly
  – Requires broadcast, since caching information is at the processors
  – Works well with a bus (a natural broadcast medium)
  – Dominates for small-scale machines (most of the market)
• Directory-based schemes (discussed later):
  – Keep track of what is being shared in one centralized place (logically)
  – Distributed memory => a distributed directory, for scalability (avoids bottlenecks)
  – Send point-to-point requests to processors via the network
  – Scales better than
snooping
  – Actually existed BEFORE snooping-based schemes

Bus Snooping Topology
• The cache controller has a hardware snooper that watches transactions on the bus
• Examples: Sun Enterprise 5000, SGI Challenge, Intel SystemPro

Basic Snoopy Protocols
• Write-invalidate protocol:
  – Multiple readers, single writer
  – Write to shared data: an invalidate is sent to all caches, which snoop and invalidate any copies
  – Read miss:
    » Write-through: memory is always up to date
    » Write-back: snoop in the caches to find the most recent copy
• Write-broadcast protocol (typically write-through):
  – Write to shared data: broadcast on the bus; processors snoop and update any copies
  – Read miss: memory is always up to date

Basic Snoopy Protocols (continued)
• Write-invalidate versus write-broadcast:
  – Invalidate requires one transaction per write run
  – Invalidate exploits spatial locality: one transaction per block
  – Broadcast has lower latency between write and read
• Write serialization: the bus serializes requests!
  – The bus is a single point of arbitration

A Basic Snoopy Protocol
• Invalidation protocol, write-back caches
• Each block of memory is in one state:
  – Clean in all caches and up to date in memory (Shared)
  – OR dirty in exactly one cache (Exclusive)
  – OR not in any cache
• Each cache block is in one state (the hardware tracks these):
  – Shared: the block can be read
  – OR Exclusive: this cache has the only copy; it is writeable and dirty
  – OR Invalid: the block contains no data
• Read misses cause all caches to snoop the bus
• Writes to a clean line are treated as misses

Snoopy-Cache State Machine I
• State machine for CPU requests, for each cache block:
  – Invalid, CPU read: place read miss on bus; go to Shared
  – Invalid, CPU write: place write miss on bus; go to Exclusive
  – Shared, CPU read hit: stay in Shared
  – Shared, CPU read miss: place read miss on bus; stay in Shared
  – Shared, CPU write: place write miss on bus; go to Exclusive
  – Exclusive, CPU read hit or CPU write hit: stay in Exclusive
  – Exclusive, CPU read miss: write back block, place read miss on bus; go to Shared
  – Exclusive, CPU write miss: write back cache block, place write miss on bus
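The CPU-request transitions above can be written down as a small transition table. Here is a sketch in Python: the entries follow the slide's diagram, but the `St` enum, the event-name strings, and the `step` helper are naming choices of mine, not part of the lecture.

```python
from enum import Enum

class St(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"
    EXCLUSIVE = "Exclusive"

# CPU-request transition table for one cache block (State Machine I).
# Each entry: (current state, CPU event) -> (next state, bus action or None).
CPU_FSM = {
    (St.INVALID,   "read"):       (St.SHARED,    "read miss on bus"),
    (St.INVALID,   "write"):      (St.EXCLUSIVE, "write miss on bus"),
    (St.SHARED,    "read hit"):   (St.SHARED,    None),
    (St.SHARED,    "read miss"):  (St.SHARED,    "read miss on bus"),
    (St.SHARED,    "write"):      (St.EXCLUSIVE, "write miss on bus"),
    (St.EXCLUSIVE, "read hit"):   (St.EXCLUSIVE, None),
    (St.EXCLUSIVE, "write hit"):  (St.EXCLUSIVE, None),
    (St.EXCLUSIVE, "read miss"):  (St.SHARED,    "write back block; read miss on bus"),
    (St.EXCLUSIVE, "write miss"): (St.EXCLUSIVE, "write back block; write miss on bus"),
}

def step(state, event):
    """Apply one CPU request to a block in `state`; return (next_state, bus_action)."""
    return CPU_FSM[(state, event)]

# A write to a clean (Shared) line is treated as a miss: it moves the block to
# Exclusive and places a write miss on the bus.
print(step(St.SHARED, "write"))
```

Note that only the two Exclusive self-transitions and the two hits carry no bus action; every other CPU request generates bus traffic, which is what the bus-side machine (next slide) reacts to.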
Snoopy-Cache State Machine II
• State machine for bus requests, for each cache block:
  – Shared, write miss for this block: go to Invalid
  – Exclusive, write miss for this block: write back block (abort memory access); go to Invalid
  – Exclusive, read miss for this block: write back block (abort memory access); go to Shared
• Appendix E gives details of the bus requests

Snoopy-Cache State Machine III
• The combined state machine, for CPU requests and for bus requests, for each cache block: the CPU-request transitions of Machine I and the bus-request transitions of Machine II operate on the same three states (Invalid, Shared, Exclusive)

Implementing Snooping Caches
• Multiple processors must be on the bus, with access to both addresses and data
• Add a few new bus commands to perform coherency, in addition to read and write
• Processors continuously snoop on the address bus
  – If an address matches a tag, either invalidate or update
• Since every bus transaction checks cache tags, snooping could interfere with the CPU:
  – Solution 1: keep a duplicate set of tags for the L1 caches, just to allow checks in parallel with the CPU
  – Solution 2: the L2 cache already duplicates the tags, provided L2 obeys inclusion with the L1 cache
    » The block size and associativity of L2 then affect L1

Implementation Complications
• Write races:
  – Cannot update the cache until the bus is obtained
    » Otherwise, another processor may get the bus first, and then write the same cache block!
  – Two-step process:
    » Arbitrate for the bus
    » Place the miss on the bus and complete the operation
  – If a miss occurs to the block while waiting for the bus, handle the miss (an invalidate may be needed) and then restart.
  – Split-transaction bus:
    » A bus transaction is not atomic: there can be multiple outstanding transactions for a block
    » Multiple misses can interleave, allowing two caches to grab the block in the Exclusive state
    » Must track and prevent multiple outstanding misses for one block
• Must support interventions and invalidations

Larger MPs
• Separate memory per processor
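Putting State Machines I and II together, the write-invalidate, write-back snoopy protocol can be modeled in a few lines. This is a minimal single-threaded sketch, not the hardware the slides describe: the `Cache` and `Bus` classes, the dictionary-backed memory, and the method names are assumptions of mine, and the block fetch on a write miss is elided (the sketch overwrites the whole block). The per-block states, the bus actions, and the driving example (X starts at 1; A reads, B reads, A stores 0) follow the slides.

```python
from enum import Enum

class State(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"        # block may be read; memory is up to date
    EXCLUSIVE = "Exclusive"  # only copy; writeable and dirty

class Cache:
    def __init__(self, name, bus):
        self.name, self.bus = name, bus
        self.state = {}   # address -> State
        self.data = {}    # address -> value
        bus.attach(self)

    # ---- CPU side (State Machine I) ----
    def read(self, addr):
        if self.state.get(addr, State.INVALID) is State.INVALID:
            self.bus.read_miss(self, addr)           # place read miss on bus
            self.data[addr] = self.bus.memory[addr]  # snooper wrote back first if dirty
            self.state[addr] = State.SHARED
        return self.data[addr]

    def write(self, addr, value):
        if self.state.get(addr, State.INVALID) is not State.EXCLUSIVE:
            self.bus.write_miss(self, addr)          # writes to a clean line are misses
            self.state[addr] = State.EXCLUSIVE
        self.data[addr] = value

    # ---- bus side (State Machine II) ----
    def snoop_read_miss(self, addr):
        if self.state.get(addr) is State.EXCLUSIVE:
            self.bus.memory[addr] = self.data[addr]  # write back block
            self.state[addr] = State.SHARED

    def snoop_write_miss(self, addr):
        if self.state.get(addr) is State.EXCLUSIVE:
            self.bus.memory[addr] = self.data[addr]  # write back block
        if addr in self.state:
            self.state[addr] = State.INVALID         # invalidate any copy

class Bus:
    def __init__(self, memory):
        self.memory, self.caches = memory, []
    def attach(self, cache):
        self.caches.append(cache)
    def read_miss(self, requester, addr):
        for c in self.caches:
            if c is not requester:
                c.snoop_read_miss(addr)
    def write_miss(self, requester, addr):
        for c in self.caches:
            if c is not requester:
                c.snoop_write_miss(addr)

# The lecture's example: X starts at 1; A reads, B reads, A stores 0 into X.
bus = Bus(memory={"X": 1})
a, b = Cache("A", bus), Cache("B", bus)
a.read("X"); b.read("X")                  # both Shared, both see 1
a.write("X", 0)                           # invalidate reaches B; A becomes Exclusive
print(a.state["X"].value, b.state["X"].value)  # Exclusive Invalid
print(b.read("X"))                             # B re-misses; A writes back; prints 0
```

With the protocol in place, the incoherence from the earlier table cannot arise: A's store invalidates B's copy, so B's next read misses on the bus and picks up the written-back value 0.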



UCR CS 162 - Cache Protocols
