Yale CPSC 424 - Shared Memory Architectures

Shared Memory Architectures
Arvind Krishnamurthy
Fall 2004

Approaches to Building Parallel Machines
[Figure: three organizations in order of increasing scale. Shared cache: P1...Pn reach an interleaved first-level cache and interleaved main memory through a switch. Bus-based shared memory: P1...Pn, each with a private cache, share a bus to memory and I/O devices. Dance hall (UMA, centralized memory): P1...Pn, each with a private cache, reach memory modules through an interconnection network.]
- Shared-cache examples:
  - Alliant FX-8 (early 80s): eight 68020s with a crossbar to a 512 KB interleaved cache
  - Encore and Sequent: first 32-bit micros (N32032), two to a board with a shared cache

Shared Cache Architectures
- What are the advantages and disadvantages?

Example Cache Coherence Problem
[Figure: P1, P2, and P3, each with a private cache, share a bus with memory (u = 5) and I/O devices. Events: (1) P1 reads u = 5; (2) P3 reads u = 5; (3) P3 writes u = 7; (4) P1 reads u = ?; (5) P2 reads u = ?]
- Processors see different values for u after event 3.
- With write-back caches, the value written back to memory depends on happenstance: which cache flushes or writes back its value, and when.
  - Processes accessing main memory may see a very stale value.
- This is unacceptable to programs, and it happens frequently!

Snoopy Cache-Coherence Protocols
- The bus is a broadcast medium, and caches know what they have.
- The cache controller "snoops" all transactions on the shared bus.
  - A transaction is relevant if it involves a cache block currently contained in this cache.
  - The controller takes action to ensure coherence (invalidate, update, or supply the value); the action depends on the state of the block and on the protocol.
[Figure: P1...Pn, each with a cache whose entries hold State, Address, and Data, snoop the shared bus; a cache-memory transaction travels over the bus to memory and I/O devices.]

Design Choices
- Update the state of blocks in response to processor and snoop events:
  - Valid/invalid
  - Dirty (inconsistent with memory)
  - Shared (present in other caches)
- A snoopy protocol consists of:
  - a set of states
  - a state-transition diagram
  - actions
- Basic choices:
  - Write-through vs. write-back
  - Invalidate vs. update
[Figure: a cache controller sits between the processor (Ld/St) and the bus (snoop), managing cache entries of State, Tag, and Data.]

Write-through Invalidate Protocol
- Two states per block in each cache: Invalid and Valid, as in a uniprocessor.
- Cache check:
  - Compute the position(s) to check based on the address.
  - Check whether the entry is valid.
  - If valid, check whether the tag matches the required address.
  - If present: on a read, just use the value; on a write, update the value and send the write on the bus.
- Writes invalidate all other caches:
  - A block can have multiple simultaneous readers, but a write invalidates their copies.
State transitions (V = Valid, I = Invalid):
  V: PrRd/-- (stay V); PrWr/BusWr (stay V); BusWr/-- (go to I)
  I: PrRd/BusRd (go to V); PrWr/BusWr (stay I)

Write-through vs. Write-back
- The write-through protocol is simple: every write is observable.
- Every write goes on the bus:
  => Only one write can be in progress at a time, across all processors.
  => It uses a lot of bandwidth!
- Example: a 3 GHz processor, CPI = 1, 10% of instructions are 8-byte stores
  => 300 M stores per second per processor
  => 2400 MB/s per processor
  => A 4 GB/s bus can support only about 1-2 processors without saturating.

Invalidate vs. Update
- When does one prefer an invalidation-based scheme? An update-based scheme?
  => We need to look at program reference patterns and hardware complexity, but first: correctness.

Write-back Caches (Uniprocessor)
- 3 processor operations: read, write, replace
- 3 states: Invalid (I), Valid/clean (V), Modified/dirty (M)
- 2 bus transactions: read (BusRd) and write-back (BusWB)
State transitions:
  I: PrRd/BusRd (go to V); PrWr/BusRd (go to M)
  V: PrRd/-- (stay V); PrWr/-- (go to M); Replace/-- (go to I)
  M: PrRd/-- and PrWr/-- (stay M); Replace/BusWB (go to I)

Write-back MSI (Multiprocessors)
- Treat "valid" as "shared" (S).
- Treat "modified" as "exclusive" (M).
- Introduce a new bus operation, read-exclusive (BusRdX): a read for later modification ("read to own").
  - BusRdX causes other caches to invalidate their copies.
  - A write issues BusRdX even on a write hit in S.
- A read obtains the block in the shared state.
State transitions:
  I: PrRd/BusRd (go to S); PrWr/BusRdX (go to M)
  S: PrRd/-- and BusRd/-- (stay S); PrWr/BusRdX (go to M); BusRdX/-- (go to I)
  M: PrRd/-- and PrWr/-- (stay M); BusRd/Flush (go to S); BusRdX/Flush (go to I)

Lower-Level Protocol Choices
- How does memory know whether or not to supply data on a BusRd?
- A BusRd is observed in the M state: which transition should be made, M -> I or M -> S?
  - It depends on the expected access patterns.
- On a write hit in S, BusRdX could be replaced by BusUpgr, which involves no data transfer.
- A read followed by a write takes 2 bus transactions, even if there is no sharing:
  - BusRd (I -> S) followed by BusRdX or BusUpgr (S -> M).
  - What happens on sequential programs? Performance degrades.

Update Protocols
- If data is to be communicated between processors, invalidate protocols seem inefficient.
- Consider a shared flag:
  - P0 waits for it to be zero, then does work and sets it to one.
  - P1 waits for it to be one, then does work and sets it to zero.
- How many transactions?
  - P0: read shared; P1: read exclusive; P1: write 0; P0: read; P1: read shared; P0: read exclusive; P0: write 1; P1: read…

Shared Memory Systems
- Two variants:
  - Shared-cache systems
  - Separate caches, with bus-based access to shared memory
- Variants:
  - Write-through vs. write-back systems
  - Invalidation-based vs. update-based systems

Write-Back Update Protocol
- Let's have a system where:
  - Write-backs happen when a cache line is replaced.
  - All writes result in updates of the other caches caching the value.
- Let's design the simplest write-back update protocol:
  - How many states should it have?
  - What is the significance of each state?

Dragon Write-back Update Protocol
- 4 states:
  - Exclusive-clean (E): this processor and memory have it.
  - Shared-clean (Sc): this processor and other processors may have it.
  - Shared-modified (Sm): this processor and other processors may have it; memory does not have the updated value (this processor is responsible for updating memory).
  - Modified (M): this processor has it, and no one else does.
- A cache block can be:
  - in M in one cache, with no other cache holding the block;
  - in E in one cache, with no other cache holding the block;
  - in Sc in one or more caches;
  - in Sm in one cache and Sc in zero or more caches.
- No invalid state:
  - If a block is in the cache, it cannot be invalid (but tag mismatches still need to be handled).
- New bus transaction: BusUpd
  - Broadcasts the single word written on the bus and updates the other relevant caches.
  - Saves bandwidth, since only the written word is sent.

Questions
- How can we recognize which state should currently be associated with a cache line?
- How do we know that a cache line should be stored in the exclusive state? The modified state? The shared-clean state? The shared-modified state?

Dragon State Transition Diagram
  (S) = another cache holds the block (shared line asserted); (!S) = no other cache holds it
  E:  PrRd/-- (stay E); PrWr/-- (go to M); BusRd/-- (go to Sc)
  Sc: PrRd/-- and BusUpd/Update (stay Sc); PrWr/BusUpd(S) (go to Sm); PrWr/BusUpd(!S) (go to M)
  Sm: PrRd/--, BusRd/Flush, and PrWr/BusUpd(S) (stay Sm); PrWr/BusUpd(!S) (go to M); BusUpd/Update (go to Sc)
  M:  PrRd/-- and PrWr/-- (stay M); BusRd/Flush (go to Sm)
  Misses: PrRdMiss/BusRd(S) (go to Sc); PrRdMiss/BusRd(!S) (go to E); PrWrMiss/(BusRd(S); BusUpd) (go to Sm); PrWrMiss/BusRd(!S) (go to M)

Lower-level Protocol Choices
- Can the shared-modified state be eliminated?
  - Yes, if memory is updated on BusUpd transactions (DEC Firefly).
  - The Dragon protocol doesn't do this (it assumes DRAM memory is slow to update).
- Should replacement of an Sc block be broadcast?
  - That would allow the last remaining copy to go to the E state.
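
The Write-through Invalidate Protocol slide can be illustrated by replaying the u example from the cache coherence problem slide. This is a minimal Python sketch, not the course's implementation. It assumes an atomic bus, one word per block, and no write-allocate (PrWr/BusWr leaves an Invalid block Invalid). The names `Bus`, `WriteThroughCache`, `read`, `write`, and `snoop` are hypothetical.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    VALID = "V"


class Bus:
    """Hypothetical atomic bus: every transaction is seen by all other caches."""

    def __init__(self, memory):
        self.memory = memory                  # address -> value
        self.caches = []
        self.log = []                         # (processor id, transaction, address)

    def transact(self, requester, kind, addr, value=None):
        self.log.append((requester.pid, kind, addr))
        if kind == "BusWr":
            self.memory[addr] = value         # write-through: memory is always current
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(kind, addr)
        return self.memory[addr]


class WriteThroughCache:
    """Two-state (Valid/Invalid) write-through invalidate cache, one word per block."""

    def __init__(self, pid, bus):
        self.pid, self.bus = pid, bus
        self.lines = {}                       # addr -> value; a missing key means Invalid
        bus.caches.append(self)

    def state(self, addr):
        return State.VALID if addr in self.lines else State.INVALID

    def read(self, addr):
        if addr not in self.lines:            # PrRd/BusRd: I -> V
            self.lines[addr] = self.bus.transact(self, "BusRd", addr)
        return self.lines[addr]               # PrRd/--: hit in V

    def write(self, addr, value):
        self.bus.transact(self, "BusWr", addr, value)   # PrWr/BusWr in V or I
        if addr in self.lines:                # V stays V with the new value
            self.lines[addr] = value          # (I stays I: no write-allocate)

    def snoop(self, kind, addr):
        if kind == "BusWr":                   # BusWr/--: V -> I
            self.lines.pop(addr, None)


memory = {"u": 5}
bus = Bus(memory)
p1, p2, p3 = (WriteThroughCache(pid, bus) for pid in (1, 2, 3))
p1.read("u")          # event 1: P1 reads 5
p3.read("u")          # event 2: P3 reads 5
p3.write("u", 7)      # event 3: memory becomes 7; P1's copy is invalidated
print(p1.read("u"), p2.read("u"))  # events 4 and 5: 7 7
```

Because P3's BusWr both updates memory and invalidates P1's copy, events 4 and 5 miss and fetch 7 from memory instead of P1 returning its stale 5.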
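
The write-through bandwidth example above reduces to a short calculation. Like the slide, this sketch counts only store traffic (one bus write per store) and ignores read misses and bus overheads; the function name `write_through_bandwidth` is hypothetical.

```python
def write_through_bandwidth(clock_hz, cpi, store_fraction, store_bytes):
    """Bytes per second one processor puts on the bus if every store is written through."""
    instructions_per_sec = clock_hz / cpi
    stores_per_sec = instructions_per_sec * store_fraction
    return stores_per_sec * store_bytes


per_proc = write_through_bandwidth(clock_hz=3e9, cpi=1, store_fraction=0.10, store_bytes=8)
bus_bandwidth = 4e9                       # 4 GB/s bus
print(f"{per_proc / 1e6:.0f} MB/s per processor")                      # 2400 MB/s per processor
print(f"bus saturates at ~{bus_bandwidth / per_proc:.2f} processors")  # ~1.67 processors
```

A ratio of about 1.67 is where the slide's "about 1-2 processors" comes from.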
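
The MSI transitions listed above can be sketched in Python to count bus transactions. This is a minimal illustration under stated assumptions: an atomic bus, one word per block, and a cache in M that flushes its block to memory before memory answers the request (one simple answer to "how does memory know whether to supply data"). Class and method names are hypothetical. The usage replays two cases from the slides: a read followed by a write on unshared data, and the shared flag passed back and forth between P0 and P1.

```python
from enum import Enum


class MSI(Enum):
    I = "Invalid"
    S = "Shared"
    M = "Modified"


class Bus:
    """Hypothetical atomic snooping bus that logs every transaction it carries."""

    def __init__(self, memory):
        self.memory, self.caches, self.log = memory, [], []

    def transact(self, requester, kind, addr):
        self.log.append((requester.pid, kind, addr))
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(kind, addr)   # an M owner flushes before memory replies
        return self.memory[addr]


class MSICache:
    def __init__(self, pid, bus):
        self.pid, self.bus = pid, bus
        self.state, self.data = {}, {}    # addr -> MSI state / cached value
        bus.caches.append(self)

    def st(self, addr):
        return self.state.get(addr, MSI.I)

    def read(self, addr):
        if self.st(addr) is MSI.I:                     # PrRd/BusRd: I -> S
            self.data[addr] = self.bus.transact(self, "BusRd", addr)
            self.state[addr] = MSI.S
        return self.data[addr]                         # PrRd/-- in S or M

    def write(self, addr, value):
        if self.st(addr) is not MSI.M:                 # PrWr/BusRdX from I or S
            self.data[addr] = self.bus.transact(self, "BusRdX", addr)
            self.state[addr] = MSI.M
        self.data[addr] = value                        # PrWr/-- in M: no bus traffic

    def snoop(self, kind, addr):
        s = self.st(addr)
        if s is MSI.M:
            self.bus.memory[addr] = self.data[addr]    # Flush the dirty block
        if kind == "BusRd" and s is MSI.M:
            self.state[addr] = MSI.S                   # BusRd/Flush: M -> S
        elif kind == "BusRdX" and s is not MSI.I:
            self.state[addr] = MSI.I                   # BusRdX/Flush or BusRdX/--: -> I


mem = {"x": 0, "flag": 0}
bus = Bus(mem)
p0, p1 = MSICache(0, bus), MSICache(1, bus)

# Read then write with no sharing: still two bus transactions.
p0.read("x")
p0.write("x", 1)
seq_kinds = [kind for _, kind, _ in bus.log]
print(seq_kinds)  # ['BusRd', 'BusRdX']

# Shared flag passed back and forth between P0 and P1.
bus.log.clear()
seen = []
for _ in range(3):
    p0.write("flag", 1)
    seen.append(p1.read("flag"))
    p1.write("flag", 0)
    seen.append(p0.read("flag"))
print(len(bus.log), "bus transactions for", len(seen), "handoffs")  # 12 bus transactions for 6 handoffs
```

In the flag exchange every write must invalidate the other copy (BusRdX) and every read must re-fetch the block (BusRd), which is the inefficiency the Update Protocols slide points at.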
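
The Dragon state transition diagram can likewise be sketched in Python, as a minimal illustration rather than the original design. Assumptions: an atomic bus whose shared line (S) is asserted when any other cache holds the block, one word per block, an owner in M or Sm that supplies data on BusRd without updating memory, and no modeling of replacement (so memory is never written back). All class and method names are hypothetical. The usage replays the shared-flag exchange from the Update Protocols slide.

```python
from enum import Enum


class Dragon(Enum):
    E = "Exclusive-clean"
    SC = "Shared-clean"
    SM = "Shared-modified"
    M = "Modified"


class Bus:
    """Hypothetical atomic bus with a 'shared' line (S) and a transaction log."""

    def __init__(self, memory):
        self.memory, self.caches, self.log = memory, [], []

    def transact(self, requester, kind, addr, value=None):
        """Broadcast a transaction; return (S line asserted?, data for the requester)."""
        self.log.append((requester.pid, kind, addr))
        shared, supplied = False, None
        for cache in self.caches:
            if cache is not requester and addr in cache.state:
                shared = True                       # another copy exists: S asserted
                flushed = cache.snoop(kind, addr, value)
                if flushed is not None:
                    supplied = flushed              # owner (M or Sm) supplies the block
        data = supplied if supplied is not None else self.memory[addr]
        return shared, data


class DragonCache:
    def __init__(self, pid, bus):
        self.pid, self.bus = pid, bus
        self.state, self.data = {}, {}              # a missing key means "not cached"
        bus.caches.append(self)

    def _miss(self, addr):
        """BusRd on a miss: go to Sc if S is asserted, else E."""
        shared, self.data[addr] = self.bus.transact(self, "BusRd", addr)
        self.state[addr] = Dragon.SC if shared else Dragon.E

    def read(self, addr):
        if addr not in self.state:                  # PrRdMiss/BusRd(S) or BusRd(!S)
            self._miss(addr)
        return self.data[addr]                      # PrRd/-- in every state

    def write(self, addr, value):
        if addr not in self.state:                  # PrWrMiss: BusRd first, then as a hit
            self._miss(addr)
        if self.state[addr] in (Dragon.E, Dragon.M):
            self.state[addr] = Dragon.M             # PrWr/--: E -> M, M stays M
        else:                                       # Sc or Sm: PrWr/BusUpd(S or !S)
            shared, _ = self.bus.transact(self, "BusUpd", addr, value)
            self.state[addr] = Dragon.SM if shared else Dragon.M
        self.data[addr] = value

    def snoop(self, kind, addr, value):
        """React to another cache's transaction; return the block if we flush it."""
        s = self.state[addr]
        if kind == "BusUpd":                        # BusUpd/Update: Sm -> Sc, Sc stays
            self.data[addr] = value
            self.state[addr] = Dragon.SC
        elif s is Dragon.E:                         # BusRd/--: E -> Sc
            self.state[addr] = Dragon.SC
        elif s in (Dragon.M, Dragon.SM):            # BusRd/Flush: M -> Sm, Sm stays
            self.state[addr] = Dragon.SM
            return self.data[addr]
        return None


mem = {"flag": 0}
bus = Bus(mem)
p0, p1 = DragonCache(0, bus), DragonCache(1, bus)
seen = []
for _ in range(3):
    p0.write("flag", 1)
    seen.append(p1.read("flag"))
    p1.write("flag", 0)
    seen.append(p0.read("flag"))
kinds = [kind for _, kind, _ in bus.log]
print(kinds)  # ['BusRd', 'BusRd', 'BusUpd', 'BusUpd', 'BusUpd', 'BusUpd', 'BusUpd']
```

After the two initial BusRd misses, each write costs a single BusUpd and every read hits in the cache. Memory keeps its old value throughout, because the Sm owner is responsible for writing it back.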

