RIT EECC 756 - Scalable Cache Coherent Systems


EECC756 - Shaaban, lec # 14, Spring 2002, 5-9-2002

Scalable Cache Coherent Systems

• Scalable distributed shared memory machines. Assumptions:
  – Processor-Cache-Memory nodes connected by a scalable network.
  – Distributed shared physical address space.
  – Communication assist must interpret network transactions, forming a shared address space.
• For a system with a shared physical address space:
  – A cache miss must be satisfied transparently from local or remote memory, depending on the address.
  – By its normal operation, a cache replicates data locally, resulting in a potential cache coherence problem between local and remote copies of data.
  – A coherency solution must be in place for correct operation.
• Standard snoopy protocols studied earlier do not apply, for lack of a bus or a broadcast medium to snoop on.
• For this type of system to be scalable, in addition to latency and bandwidth scalability, the cache coherence protocol or solution used must also scale.

Functionality Expected In A Cache Coherent System

• Provide a set of states, a state transition diagram, and actions representing the cache coherence protocol used.
• Manage the coherence protocol:
  (0) Determine when to invoke the coherence protocol.
  (a) Find the source of information about the state of the cache line in other caches.
      • Whether there is a need to communicate with other cached copies.
  (b) Find out the location or locations of other copies, if any.
  (c) Communicate with those copies (invalidate/update).
• (0) is done the same way on all cache coherent systems:
  – State of the local cache line is maintained in the cache.
  – Protocol is invoked if an “access fault” occurs on the line.
• Different approaches are distinguished by (a) to (c).

Bus-Based Coherence

• All of (a), (b), (c) are done through broadcast on the bus:
  – Faulting processor sends out a “search”.
  – Others respond to the search probe and take necessary action.
• This approach could be used in a scalable network too:
  – Broadcast to all processors, and let them respond.
  – Conceptually simple, but broadcast doesn’t scale with the number of processors p:
    • Bus bandwidth doesn’t scale.
    • On a scalable network (e.g., MINs), every fault may lead to at least p network transactions.

Scalable Cache Coherence

• A scalable cache coherence approach may have similar cache line states and state transition diagrams as in bus-based coherence protocols.
• However, additional mechanisms other than broadcasting must be devised to manage the coherence protocol.
• Possible approaches:
  – Approach #1: Hierarchical Snooping.
  – Approach #2: Directory-based cache coherence.
  – Approach #3: A combination of the above two approaches.

Approach #1: Hierarchical Snooping

• Extend the snooping approach with a hierarchy of broadcast media:
  – Tree of buses or rings (KSR-1).
  – Processors are in the bus- or ring-based multiprocessors at the leaves.
  – Parents and children are connected by two-way snoopy interfaces:
    • Snoop both buses and propagate relevant transactions.
  – Main memory may be centralized at the root or distributed among the leaves.
• Issues (a) - (c) are handled similarly to a bus, but without full broadcast:
  – Faulting processor sends out a “search” bus transaction on its bus.
  – The transaction propagates up and down the hierarchy based on snoop results.
• Problems:
  – High latency: multiple levels, and a snoop/lookup at every level.
  – Bandwidth bottleneck at the root.
• This approach has, for the most part, been abandoned.

Hierarchical Snoopy Cache Coherence

• Simplest way: a hierarchy of buses (or rings), with snoopy coherence at each level.
• Consider buses. Two possibilities:
  (a) All main memory at the global (B2) bus.
  (b) Main memory distributed among the clusters.

[Figure: (a) clusters of processors (P) with L1 caches on a B1 bus, each cluster with an L2 cache, joined by the global B2 bus to Main Memory (Mp); (b) the same clusters, each with its own Memory on B1, joined by B2.]

Bus Hierarchies with Centralized Memory

• B1 follows the standard snoopy protocol.
• Need a monitor per B1 bus:
  – Decides what transactions to pass back and forth between buses.
  – Acts as a filter to reduce bandwidth needs.
• Use an L2 cache:
  – Much larger than the L1 caches (set associative). Must maintain inclusion.
  – Has a dirty-but-stale bit per line.
  – The L2 cache can be DRAM based, since fewer references get to it.

[Figure: organization (a), with each cluster's L2 cache between its B1 bus and the global B2 bus, and Main Memory (Mp) on B2.]

Bus Hierarchies with Centralized Memory: Advantages and Disadvantages

• Advantages:
  – Simple extension of the bus-based scheme.
  – Misses to main memory require a single traversal to the root of the hierarchy.
  – Placement of shared data is not an issue.
• Disadvantages:
  – Misses to local data (e.g., stack) also traverse the hierarchy.
  – Higher traffic and latency.
  – Memory at the global bus must be highly interleaved for bandwidth.

Bus Hierarchies with Distributed Memory

• Main memory is distributed among clusters.
• A cluster is a full-fledged bus-based machine, memory and all.
• Automatic scaling of memory (each cluster brings some with it).
• Good placement can reduce global bus traffic and latency.
• But latency to far-away memory is larger.

[Figure: organization (b), with each cluster holding its own Memory on B1 and an L2 cache connecting it to the global B2 bus.]

• A directory is composed of a number of directory entries.
• Every memory block has an associated directory entry:
  – Keeps track of the nodes or processors that have cached copies of the memory block and their states.
  – On a miss, find the directory entry, look it up, and communicate only with the nodes that have copies, if necessary.
  – In scalable networks, communication with the directory and the nodes that have copies is through network transactions.
• Many alternatives exist for organizing directory information.
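
The directory lookup described above can be illustrated with a minimal sketch. It assumes a simple invalidation-based protocol in which each entry holds a set of sharers and one dirty bit; the names `Directory`, `DirectoryEntry`, `read_miss`, and `write_miss` are hypothetical and do not come from the lecture, which notes that many directory organizations exist. The point is only that a miss contacts the nodes listed in the entry rather than all p processors.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Per-block record (assumed layout): who holds a copy, and is it dirty."""
    sharers: set = field(default_factory=set)
    dirty: bool = False  # True means the single sharer holds a modified copy


class Directory:
    """Hypothetical directory: one entry per memory block, created on demand."""

    def __init__(self):
        self.entries = {}  # block address -> DirectoryEntry

    def _entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def read_miss(self, block, requester):
        """Return the set of nodes that must be contacted for a read miss."""
        entry = self._entry(block)
        # Clean data can come from memory; only a dirty owner must be contacted.
        contacted = set(entry.sharers) if entry.dirty else set()
        contacted.discard(requester)
        entry.dirty = False  # assumption: the owner downgrades and keeps a copy
        entry.sharers.add(requester)
        return contacted

    def write_miss(self, block, requester):
        """Return the set of nodes whose copies must be invalidated."""
        entry = self._entry(block)
        contacted = entry.sharers - {requester}
        entry.sharers = {requester}
        entry.dirty = True
        return contacted


directory = Directory()
directory.read_miss(0x40, requester=1)
directory.read_miss(0x40, requester=2)
invalidated = directory.write_miss(0x40, requester=3)
print(sorted(invalidated))  # [1, 2]
```

In this sketch the write by node 3 triggers two invalidations regardless of how many nodes the machine has, whereas the broadcast scheme described earlier would involve every processor.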

