U of U CS 7810 - Directory Protocol Implementations

Lecture 3: Directory Protocol Implementations
• Topics: coherence vs. msg-passing, corner cases in directory protocols

Future Scalable Designs
• Intel’s Single-chip Cloud Computer (SCC): an example prototype
• No support for hardware cache coherence
• Programmer can write shared-memory apps by marking pages as uncacheable or L1-cacheable, but forcing memory flushes to propagate results
• Primarily intended for message-passing apps
• Each core runs a version of Linux
• Barrelfish-like OSes will likely soon be mainstream

Scalable Cache Coherence
• Will future many-core chips forgo hardware cache coherence in favor of message-passing or sw-managed cache coherence?
• It’s the classic programmer-effort vs. hw-effort trade-off … traditionally, hardware has won (e.g. ILP extraction)
• Two questions worth answering: will motivated programmers prefer message-passing? Is scalable hw cache coherence do-able?

Message Passing
• Message passing can be faster and more energy-efficient
• Only required data is communicated: good for energy and reduces network contention
• Data can be sent before it is required (push semantics; cache coherence is pull semantics and frequently requires indirection to get data)
• Downsides: more software stack layers and more memory hierarchy layers must be traversed, and more programming effort

Scalable Directory Coherence
• Note that the protocol itself need not be changed
• If an application randomly accesses data with zero locality: long latencies for data communication; this is also true for message-passing apps
• If there is locality and page coloring is employed, the directory and data-sharers will often be in close proximity
• Does hardware overhead increase? See examples in last class… the overhead is ~2-10% and sharing can be tracked at coarse granularity… hierarchy can also be employed, with snooping-based coherence among a group of nodes

SGI Origin 2000
• Flat memory-based directory protocol
• Uses a bit vector directory representation
• Two processors per node – combining multiple processors in a node reduces cost
[Figure: node diagram; labels: P, L2, CA, M/D, P, L2, Interconnect]

Directory Structure
• The system supports either a 16-bit or 64-bit directory (fixed cost); for small systems, the directory works as a full bit vector representation
• Seven states, of which 3 are stable
• For larger systems, a coarse vector is employed – each bit represents p/64 nodes
• State is maintained for each node, not each processor – the communication assist broadcasts requests to both processors

Handling Reads
• When the home receives a read request, it looks up memory (speculative read) and directory in parallel
• Actions taken for each directory state:
    shared or unowned: memory copy is clean, data is returned to requestor, state is changed to excl if there are no other sharers
    busy: a NACK is sent to the requestor
    exclusive: home is not the owner, request is fwded to owner, owner sends data to requestor and home

Inner Details of Handling the Read
• The block is in exclusive state – memory may or may not have a clean copy – it is speculatively read anyway
• The directory state is set to busy-exclusive and the presence vector is updated
• In addition to fwding the request to the owner, the memory copy is speculatively forwarded to the requestor
    Case 1: excl-dirty: owner sends block to requestor and home, the speculatively sent data is over-written
    Case 2: excl-clean: owner sends an ack (without data) to requestor and home, requestor waits for this ack before it moves on with speculatively sent data

Inner Details II
• Why did we send the block speculatively to the requestor if it does not save traffic or latency?
    the R10K cache controller is programmed to not respond with data if it has a block in excl-clean state
    when an excl-clean block is replaced from the cache, the directory need not be updated – hence, directory cannot rely on the owner to provide data and speculatively provides data on its own

Handling Write Requests
• The home node must invalidate all sharers and all invalidations must be acked (to the requestor); the requestor is informed of the number of invalidates to expect
• Actions taken for each state:
    shared: invalidates are sent, state is changed to excl, data and num-sharers are sent to requestor, the requestor cannot continue until it receives all acks
    (Note: the directory does not maintain busy state; subsequent requests will be fwded to new owner and they must be buffered until the previous write has completed)

Handling Writes II
• Actions taken for each state:
    unowned: if the request was an upgrade and not a read-exclusive, is there a problem?
    exclusive: is there a problem if the request was an upgrade? In case of a read-exclusive: directory is set to busy, speculative reply is sent to requestor, invalidate is sent to owner, owner sends data to requestor (if dirty), and a “transfer of ownership” message (no data) to home to change out of busy
    busy: the request is NACKed and the requestor must try again

Handling Write-Back
• When a dirty block is replaced, a writeback is generated and the home sends back an ack
• Can the directory state be shared when a writeback is received by the directory?
• Actions taken for each directory state:
    exclusive: change directory state to unowned and send an ack
    busy: a request and the writeback have crossed paths: the writeback changes directory state to shared or excl (depending on the busy state), memory is updated, and home sends data to requestor, the intervention request is dropped

Writeback Cases
[Figure: P1, P2, D3 (E: P1); messages: Wback, Ack]
• This is the “normal” case
• D3 sends back an Ack

Writeback Cases
[Figure: P1, P2, D3 (E: P1, busy); messages: Wback, Fwd, Rd or Wr]
• If someone else has the block in exclusive, D3 moves to busy
• If Wback is received, D3 serves the requester
• If we didn’t use busy state when transitioning from E:P1 to E:P2, D3 may not have known who to service (since ownership may have been passed on to P3 and P4…) (although, this problem can be solved by NACKing the Wback and having P1 buffer its “strange” intervention requests)

Writeback Cases
[Figure: P1, P2, D3 (E: P1, busy); messages: Transfer ownership]
• If
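
The coarse-vector idea from the Directory Structure slide (each bit stands for p/64 nodes once the system outgrows the full bit vector) can be sketched as below. This is only an illustration under assumptions: the function names, the contiguous grouping of nodes, and rounding the group size up are hypothetical choices, not details taken from the Origin 2000 hardware.

```python
from math import ceil

def group_size(num_nodes, vector_bits=64):
    # Nodes covered by one directory bit; 1 means a full bit vector.
    # Rounding up is an assumption made for this sketch.
    return max(1, ceil(num_nodes / vector_bits))

def set_sharer(vector, node, num_nodes, vector_bits=64):
    # Mark the group that contains `node` as holding a copy.
    return vector | (1 << (node // group_size(num_nodes, vector_bits)))

def invalidation_targets(vector, num_nodes, vector_bits=64):
    # Every node in a marked group must be invalidated, because the
    # coarse vector cannot tell which node in the group is the sharer.
    g = group_size(num_nodes, vector_bits)
    targets = []
    for bit in range(vector_bits):
        if (vector >> bit) & 1:
            targets.extend(range(bit * g, min((bit + 1) * g, num_nodes)))
    return targets
```

With 128 nodes each bit covers two nodes, so recording node 5 as a sharer later causes invalidations to nodes 4 and 5; with 64 nodes or fewer the mapping is exact.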

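The per-state read handling at the home node can be sketched as a small state machine. This is a simplified model, not the Origin 2000 implementation: it collapses the busy states into a single "busy", the class and message names are hypothetical, and updating the presence vector is assumed to mean adding the requestor.

```python
from dataclasses import dataclass, field

@dataclass
class DirEntry:
    state: str = "unowned"   # "unowned", "shared", "exclusive" or "busy"
    sharers: set = field(default_factory=set)
    owner: object = None     # node id of the exclusive owner, if any

def handle_read(entry, requestor):
    """Return the (message, destination) pairs the home sends for a read."""
    if entry.state == "busy":
        return [("NACK", requestor)]            # requestor must retry
    if entry.state == "unowned":
        # No other sharers: the requestor gets the block in excl state.
        entry.state, entry.owner = "exclusive", requestor
        entry.sharers = {requestor}
        return [("data", requestor)]
    if entry.state == "shared":
        entry.sharers.add(requestor)            # memory copy is clean
        return [("data", requestor)]
    # exclusive: home is not the owner, so forward the request and also
    # send the (possibly stale) memory copy speculatively.
    old_owner = entry.owner
    entry.state = "busy"
    entry.sharers.add(requestor)                # assumed presence update
    return [("fwd_read", old_owner), ("speculative_data", requestor)]
```

The sketch stops at the point where the directory goes busy; the owner's reply that moves it back to a stable state is not modeled.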

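The write-back actions, including the case where a request and the writeback cross paths, can be sketched in the same style. The state names "busy-shared" and "busy-exclusive" are assumptions used here to express "depending on the busy state", and the function and message names are hypothetical. Only the actions listed on the Handling Write-Back slide are modeled.

```python
def handle_writeback(state, writer, pending_requestor=None):
    """Return (new_state, messages) for a writeback arriving at the home.

    `pending_requestor` is the node whose request moved the directory
    into a busy state before the writeback arrived.
    """
    if state == "exclusive":
        # Normal case: memory is updated and the writer gets an ack.
        return "unowned", [("ack", writer)]
    if state in ("busy-shared", "busy-exclusive"):
        # The writeback crossed paths with a forwarded request: the home
        # now has the data and serves the pending requestor itself.
        new_state = "shared" if state == "busy-shared" else "exclusive"
        return new_state, [("data", pending_requestor)]
    raise ValueError(f"writeback not expected in state {state!r}")
```

For example, handle_writeback("busy-exclusive", writer=1, pending_requestor=2) leaves the directory in "exclusive" and sends the data to node 2; the intervention that was forwarded to node 1 is simply dropped there.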