U of U CS 7810 - Snooping-Based Coherence

Lecture 2: Snooping-Based Coherence
• 3-state and 4-state snooping protocols, update protocol, implementation issues

Multi-Core Cache Organizations
• Private L1 caches
• Shared L2 cache
• Bus between L1s and single L2 cache controller
• Snooping-based coherence between L1s

Multi-Core Cache Organizations
• Private L1 caches
• Shared L2 cache, but physically distributed
• Scalable network
• Directory-based coherence between L1s

Multi-Core Cache Organizations
• Private L1 caches
• Shared L2 cache, but physically distributed
• Bus connecting the four L1s and four L2 banks
• Snooping-based coherence between L1s

Multi-Core Cache Organizations
• Private L1 caches
• Private L2 caches
• Scalable network
• Directory-based coherence between L2s (through a separate directory)

Cache Coherence
A multiprocessor system is cache coherent if
• a value written by a processor is eventually visible to reads by other processors – write propagation
• two writes to the same location by two processors are seen in the same order by all processors – write serialization

Cache Coherence Protocols
• Directory-based: a single location (the directory) keeps track of the sharing status of a block of memory
• Snooping: every cache block is accompanied by the sharing status of that block – all cache controllers monitor the shared bus so they can update the sharing status of the block, if necessary
  - Write-invalidate: a processor gains exclusive access to a block before writing by invalidating all other copies
  - Write-update: when a processor writes, it updates other shared copies of that block

Protocol I: MSI
• 3-state write-back invalidation bus-based snooping protocol
• Each block can be in one of three states – invalid, shared, modified (exclusive)
• A processor must acquire the block in exclusive state in order to write to it – this is done by placing an exclusive read request on the bus – every other cached copy is invalidated
• When some other processor tries to read an exclusive block, the block is demoted to shared

Design Issues, Optimizations
• When does memory get updated?
  - on a demotion from modified to shared?
  - on a move from modified in one cache to modified in another?
• Who responds with data – memory, or a cache that has the block in exclusive state? Does it help if sharers respond?
• We can assume that bus, memory, and cache state transactions are atomic – if not, we will need more states
• A transition from shared to modified only requires an upgrade request and no transfer of data
• Is the protocol simpler for a write-through cache?

4-State Protocol
• Multiprocessors execute many single-threaded programs
• A read followed by a write will generate bus transactions to acquire the block in exclusive state even though there are no sharers
• Note that we can optimize protocols by adding more states – this increases design/verification complexity

MESI Protocol
• The new state is exclusive-clean – the cache can service read requests and no other cache has the same block
• When the processor attempts a write, the block is upgraded to exclusive-modified without generating a bus transaction
• When a processor makes a read request, it must detect if it has the only cached copy – the interconnect must include an additional signal that is asserted by each cache if it has a valid copy of the block

Design Issues
• When caches evict blocks, they do not inform other caches – it is possible to have a block in shared state even though it is an exclusive-clean copy
• Cache-to-cache sharing: SRAM vs. DRAM latencies, contention in remote caches, protocol complexities (memory has to wait; which cache responds?) – can be especially useful in distributed memory systems
• The protocol can be improved by adding a fifth state (owner – MOESI) – the owner services reads (instead of memory)

Update Protocol (Dragon)
• 4-state write-back update protocol, first used in the Dragon multiprocessor (1984)
• Write-back update is not the same as write-through – on a write, only caches are updated, not memory
• Goal: writes may usually not be on the critical path, but subsequent reads may be

4 States
• No invalid state
• Modified and Exclusive-clean as before: used when there is a sole cached copy
• Shared-clean: potentially multiple caches have this block, and main memory may or may not be up-to-date
• Shared-modified: potentially multiple caches have this block, main memory is not up-to-date, and this cache must update memory – only one copy of a block can be in Sm state
• In reality, one state would have sufficed – the extra states serve to reduce traffic

Design Issues
• If the update is also sent to main memory, the Sm state can be eliminated
• If all caches are informed when a block is evicted, the block can be moved from shared to M or E – this can help save future bus transactions
• Having an extra wire to determine exclusivity seems like a worthy trade-off in update systems

State Transitions

State transitions per 1000 data memory references for Ocean (NP – Not Present):

  From \ To    NP       I        E       S        M
  NP           0        0        1.25    0.96     1.68
  I            0.64     0        0       1.87     0.002
  E            0.20     0        14.0    0.02     1.00
  S            0.42     2.5      0       134.7    2.24
  M            2.63     0.002    0       2.3      843.6

Bus actions for each state transition:

  From \ To    NP       I        E             S        M
  NP           --       --       BusRd         BusRd    BusRdX
  I            --       --       BusRd         BusRd    BusRdX
  E            --       --       --            --       --
  S            --       --       Not possible  --       BusUpgr
  M            BusWB    BusWB    Not possible  BusWB    --

Snooping – Basic Implementation
• Assume a single level of cache and atomic bus transactions
• It is simpler to implement a processor-side cache controller that monitors requests from the processor and a bus-side cache controller that services the bus
• Both controllers are constantly trying to read tags
  - tags can be duplicated (moderate area overhead)
  - unlike data, tags are rarely updated
  - tag updates stall the other controller

Reporting Snoop Results
• In a multiprocessor, memory has to wait for the snoop result before it chooses to respond – this requires three wired-OR signals: (i) a cache has a copy, (ii) a cache has a modified copy, (iii) the snoop has not yet completed
• Ensuring timely snoops: the time to respond could be fixed or variable
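The three wired-OR snoop-result signals just described can be modeled as simple ORs over per-cache responses. The sketch below is illustrative only; the function and signal names are assumptions, not from the slides.

```python
# Hypothetical model of the three wired-OR snoop-result signals: each cache
# drives its own bit, and the bus observes the OR across all caches.

def snoop_result(responses):
    """responses: one (has_copy, has_modified_copy, snoop_pending) tuple per cache."""
    shared  = any(r[0] for r in responses)   # some cache holds a copy
    dirty   = any(r[1] for r in responses)   # some cache holds a modified copy
    pending = any(r[2] for r in responses)   # some snoop has not completed
    # Memory waits while any snoop is pending, and responds only if no cache
    # has a modified copy (otherwise that cache must supply the data).
    memory_responds = (not pending) and (not dirty)
    return shared, dirty, pending, memory_responds
```

This captures why memory must wait: until the pending wire drops, it cannot tell whether a dirty copy exists elsewhere.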

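The MSI transitions discussed earlier can be sketched as a small state machine that reports which bus transaction, if any, each event generates. This is a minimal illustrative sketch, not the slides' own code; class and method names are hypothetical.

```python
# Hypothetical sketch of the 3-state MSI write-back invalidation protocol:
# I (invalid), S (shared), M (modified/exclusive).

INVALID, SHARED, MODIFIED = "I", "S", "M"

class MSILine:
    """One cache line's coherence state; methods return the bus request issued."""

    def __init__(self):
        self.state = INVALID

    def processor_read(self):
        """Local read: a miss places a BusRd on the bus; S and M hits are silent."""
        if self.state == INVALID:
            self.state = SHARED
            return "BusRd"
        return None

    def processor_write(self):
        """Local write: must first acquire the block in exclusive state."""
        if self.state == INVALID:
            self.state = MODIFIED
            return "BusRdX"      # exclusive read; invalidates all other copies
        if self.state == SHARED:
            self.state = MODIFIED
            return "BusUpgr"     # upgrade request only -- no data transfer
        return None              # already modified: silent

    def snoop(self, bus_op):
        """React to another processor's bus transaction for this block."""
        if bus_op == "BusRd" and self.state == MODIFIED:
            self.state = SHARED  # demoted to shared; flush the dirty data
            return "Flush"
        if bus_op in ("BusRdX", "BusUpgr") and self.state != INVALID:
            flush = "Flush" if self.state == MODIFIED else None
            self.state = INVALID
            return flush
        return None
```

Note how a shared-to-modified transition issues only an upgrade request with no data transfer, matching the optimization noted in the MSI design-issues slide.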

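For contrast with the invalidation protocols, a write hit under the Dragon update protocol can be sketched as follows. This is a rough sketch under stated assumptions (the exclusivity wire tells the writer whether sharers remain); it is not the full Dragon specification.

```python
# Hypothetical sketch of a write hit in the Dragon update protocol.
# On a write to a shared block, the writer broadcasts the new value (BusUpd)
# and sharers update in place; memory is NOT written (write-back update).

E, SC, SM, M = "E", "Sc", "Sm", "M"  # Exclusive-clean, Shared-clean, Shared-modified, Modified

def dragon_write(writer_state, sharers_exist):
    """Return (new writer state, bus transaction) for a local write hit."""
    if writer_state in (E, M):
        return M, None               # sole cached copy: silent transition
    # Sc or Sm: broadcast the update so other sharers stay coherent
    if sharers_exist:
        return SM, "BusUpd"          # writer becomes the single Sm owner
    return M, "BusUpd"               # exclusivity wire shows no sharers left
```

The last branch is why the slides call the extra exclusivity wire a worthy trade-off in update systems: it lets a writer drop into M and avoid broadcasting on every subsequent write.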