Rutgers University CS 417 - Distributed Shared Memory

Distributed Systems
28. Distributed Shared Memory
Paul Krzyzanowski
12/16/2011 – © 2011 Paul Krzyzanowski

Motivation
• SMP systems
  – Run parts of a program in parallel
  – Share a single address space
  – Use threads for parallelism
  – Use synchronization primitives to prevent race conditions
• Can we achieve this with multicomputers?
  – All communication and synchronization must be done with messages

Distributed Shared Memory (DSM)
Goal: allow networked computers to share a region of virtual memory.
How do we do this?

Take advantage of the MMU
• The page table entry for a page is valid if the page is held (cached) locally
• An attempt to access a non-local page leads to a page fault
• Page fault handler
  – Invokes the DSM protocol to handle the fault
  – Brings the page in from the remote node
• Operations are transparent to the programmer
  – DSM looks like any other virtual memory system

Simplest design
• Each page of the virtual address space exists on only one machine at a time – no caching
• On a page fault:
  – Consult a central server (the directory) to find which machine currently holds the page
  – Request the page from the current owner:
    • The current owner invalidates its PTE and sends the page contents
    • The recipient allocates a frame, reads the page, and sets its PTE
    • The recipient informs the directory of the new location
  (This fault-handling sequence is sketched in code after the directory example below.)

Problem
• The directory becomes a bottleneck
  – All page query requests must go to this server
• Solution: distributed directory
  – Distribute the directory among all processors
  – Each node is responsible for a portion of the address space
  – Find the responsible system with: node[ page_num mod num_processors ]

Distributed directory (example with four nodes)
Each node holds the directory entries for the pages whose number mod 4 equals its own ID:

  P0:  page 0000 → P3,  0004 → P1,  0008 → P1,  000C → P2,  …
  P1:  page 0001 → P3,  0005 → P1,  0009 → P0,  000D → P2,  …
  P2:  page 0002 → P3,  0006 → P1,  000A → P0,  000E → --,  …
  P3:  page 0003 → P3,  0007 → P1,  000B → P2,  000F → --,  …
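The single-owner protocol and the distributed directory above can be sketched in a few lines of Python. This is an illustrative sketch only: the names (Node, directory_node_for, network.ask, and the message tuples) are invented for this example and are not from the lecture or any particular DSM implementation; a real system would run this logic inside the page-fault handler, below the MMU.

    # Illustrative sketch (not from the lecture): single-owner DSM, no caching,
    # with a distributed directory. "network.ask" stands in for a blocking RPC.
    NUM_NODES = 4

    def directory_node_for(page_num):
        # Each node manages a slice of the directory: node[page_num mod num_processors]
        return page_num % NUM_NODES

    class Node:
        def __init__(self, node_id, network):
            self.id = node_id
            self.network = network    # hypothetical messaging layer
            self.directory = {}       # page -> current owner, for pages this node manages
            self.frames = {}          # page -> contents held locally (valid PTE)

        def on_page_fault(self, page_num):
            # 1. Ask the responsible directory node who currently owns the page.
            dir_node = directory_node_for(page_num)
            owner = self.network.ask(dir_node, ("lookup", page_num))
            # 2. Request the page; the owner invalidates its PTE and ships the contents.
            contents = self.network.ask(owner, ("give_page", page_num))
            # 3. Allocate a frame, install the page, mark the local PTE valid.
            self.frames[page_num] = contents
            # 4. Tell the directory about the new location.
            self.network.ask(dir_node, ("update", page_num, self.id))

        def handle(self, msg):
            # Requests arriving from other nodes.
            if msg[0] == "lookup":
                return self.directory[msg[1]]
            if msg[0] == "give_page":
                return self.frames.pop(msg[1])   # invalidate the local PTE, return contents
            if msg[0] == "update":
                self.directory[msg[1]] = msg[2]

The mod-based mapping means any node can compute where a page's directory entry lives without asking anyone, which is what removes the central-server bottleneck.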
Design considerations: granularity
• Memory blocks are typically a multiple of a node's page size
• Large pages are good
  – The cost of migration is amortized over many localized accesses
• BUT
  – Increased chance that multiple objects reside in one page
    • Thrashing
    • False sharing

Design considerations: replication
What if we allow copies of shared pages on multiple nodes?
• Replication (caching) reduces the average cost of read operations
  – Simultaneous reads can be executed locally across hosts
• Write operations become more expensive
  – Cached copies need to be invalidated or updated
• Worthwhile if the read/write ratio is high

Replication: multiple readers, single writer
• One host can be granted a read-write copy, or
• Multiple hosts can be granted read-only copies
• Read operation:
  – If the page is not local
    • Acquire a read-only copy of the page
    • Set access rights to read-only on any writable copy on other nodes
• Write operation:
  – If the page is not local or there is no write permission
    • Revoke write permission from the other writable copy (if one exists)
    • Get a copy of the page from the owner (if needed)
    • Invalidate all copies of the page at other nodes

Full replication
• Extend the model: multiple hosts have read/write access
  – Needs a multiple-readers, multiple-writers protocol
  – Access to shared data must be controlled to maintain consistency

Dealing with replication
• Keep track of copies of the page
  – A directory with a single node per page is not enough
  – Keep track of the copyset: the set of all systems that requested copies
• On getting a request for a copy of a page:
  – The directory adds the requestor to the copyset
  – The page owner sends the page contents to the requestor
• On getting a request to invalidate a page:
  – The directory issues invalidation messages to all nodes in the copyset and waits for acknowledgements
  (A sketch of this copyset bookkeeping follows below.)

How do you propagate changes?
• Send the entire page
  – Easiest, but may be a lot of data
• Send differences
  – The local system must save the original and compute the differences (sketched below)

Home-based algorithms
• Home-based
  – A node (usually the first writer) is chosen to be the home of the page
  – On a write, a non-home node sends its changes to the home node
    • Other cached copies are invalidated
  – On a read, a non-home node gets the changes (or the whole page) from the home node
• Non-home-based
  – A node always contacts the directory to find the current owner (latest copy) and obtains the page from there
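Here is a minimal sketch, in the same invented Python style as before, of the directory-side bookkeeping for the multiple-readers / single-writer scheme with a copyset. The entry layout and the message names are assumptions made for illustration, not the lecture's or any real system's interface.

    # Illustrative sketch: copyset tracking for multiple-readers / single-writer.
    class DirectoryEntry:
        def __init__(self, owner):
            self.owner = owner        # node holding the master (possibly writable) copy
            self.copyset = set()      # every node that has requested a copy

    class Directory:
        def __init__(self, network):
            self.network = network    # hypothetical messaging layer, as before
            self.entries = {}         # page -> DirectoryEntry

        def read_request(self, page, requester):
            e = self.entries[page]
            # Downgrade any writable copy so every copy is read-only, then share.
            self.network.ask(e.owner, ("set_read_only", page))
            contents = self.network.ask(e.owner, ("send_copy", page))
            e.copyset.add(requester)
            return contents

        def write_request(self, page, requester):
            e = self.entries[page]
            # Invalidate every cached copy and wait for the acknowledgements.
            for node in e.copyset - {requester}:
                self.network.ask(node, ("invalidate", page))
            contents = self.network.ask(e.owner, ("send_and_invalidate", page))
            e.copyset = {requester}   # only the new writer holds the page now
            e.owner = requester
            return contents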
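For the "send differences" option, one way to make "save the original and compute differences" concrete (an assumption for this sketch, not spelled out on the slide) is to keep a copy, or twin, of the page before writing to it and later ship only the byte ranges that changed. PAGE_SIZE and the run format below are made up for the example.

    # Illustrative sketch: propagate changes as byte-range diffs instead of whole pages.
    PAGE_SIZE = 4096   # assumed page size for the example

    def make_twin(page):
        # Save the original contents before the first local write.
        return bytes(page)

    def compute_diff(twin, page):
        # Return a list of (offset, changed_bytes) runs covering the modified ranges.
        diffs = []
        i = 0
        while i < PAGE_SIZE:
            if page[i] != twin[i]:
                start = i
                while i < PAGE_SIZE and page[i] != twin[i]:
                    i += 1
                diffs.append((start, bytes(page[start:i])))
            else:
                i += 1
        return diffs

    def apply_diff(page, diffs):
        # The receiving node patches its copy (a bytearray) with just the changed runs.
        for offset, data in diffs:
            page[offset:offset + len(data)] = data

Sending diffs trades network volume for the memory and CPU needed to keep and compare the twin, which is the trade-off the two bullets above describe.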
Consistency model
• Definition of when modifications to data may be seen at a given processor
• Defines how memory will appear to a programmer
• Places restrictions on what values can be returned by a read of a memory location
• Must be well understood
  – Determines how a programmer reasons about the correctness of a program
  – Determines what hardware and compiler optimizations may take place

Sequential semantics
• Provided by most (uniprocessor) programming languages/systems
• Program order:
  "The result of any execution is the same as if the operations of all processors were executed in some sequential order and the operations of each individual processor appear in this sequence in the order specified by the program."
  ― Leslie Lamport
• Requirements:
  – All memory operations must execute one at a time
  – All operations of a single processor appear to execute in program order
  – Interleaving among processors is OK
  (These requirements are illustrated with a small sketch at the end.)

[Figure: processors P0 through P4 all reading and writing a single shared memory]

Achieving sequential semantics
• The illusion is efficiently supported on uniprocessor systems
  – Execute operations in program order when they are to the same location or when one controls the execution of another
  – Otherwise, the compiler or hardware can reorder
• Compiler: register allocation, code motion, loop …
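To make the sequential-consistency requirements above concrete, here is a toy checker (an illustration of mine, not from the lecture): it enumerates every interleaving of two tiny programs that preserves each processor's program order and collects the outcomes a sequentially consistent memory is allowed to return. The two programs P0 and P1 are invented for the example.

    # Illustrative sketch: which results are allowed under sequential consistency?
    # x and y start at 0; each program writes one variable and reads the other.
    P0 = [("w", "x", 1), ("r", "y", "P0_reads_y")]
    P1 = [("w", "y", 1), ("r", "x", "P1_reads_x")]

    def interleavings(a, b):
        # All merges of a and b that keep each program's own order (program order).
        if not a:
            yield list(b)
            return
        if not b:
            yield list(a)
            return
        for rest in interleavings(a[1:], b):
            yield [a[0]] + rest
        for rest in interleavings(a, b[1:]):
            yield [b[0]] + rest

    def run(schedule):
        # Execute one global order against a single memory, one operation at a time.
        mem, out = {"x": 0, "y": 0}, {}
        for op, var, arg in schedule:
            if op == "w":
                mem[var] = arg
            else:
                out[arg] = mem[var]
        return (out["P0_reads_y"], out["P1_reads_x"])

    allowed = {run(s) for s in interleavings(P0, P1)}
    print(allowed)   # {(0, 1), (1, 0), (1, 1)} -- (0, 0) never appears

Every allowed result corresponds to some single sequential order of the four operations; (0, 0) would require each read to precede the other processor's write while still following its own processor's write, which no interleaving permits.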

