Berkeley COMPSCI 258 - Split-C for the New Millennium

Slide titles: Split-C for the New Millennium; Introduction; VI Architecture; Active Messages; AM-VIA Components; AM-VIA Integration; AM-VIA Operations; Design Tradeoffs; Reflections; Split-C; Implementing Split-C; Split-C over AMVIA; Split-C over Reliable VIA; Split-C over Unreliable VIA; AM(v2) Architecture

Split-C for the New Millennium
Andrew Begel, Phil Buonadonna, David Gay
{abegel,philipb,dgay}@cs.berkeley.edu

Introduction
• Berkeley’s new Millennium cluster
  – 16 2-way Intel 400 MHz PII SMPs
  – Myrinet NICs
• Virtual Interface Architecture (VIA) user-level network
• Active Messages
• Split-C

Project Goals
• Implement Active Messages over VIA
• Implement and measure Split-C over VIA

VI Architecture
[Diagram labels: VI Consumer; Virtual Address Space; RM, RM, RM; VI; Send Q and Recv Q, each holding Descriptors; Status, Status; Send Doorbell; Receive Doorbell; Network Interface Controller]

Active Messages
• Paradigm for message-based communication
  – Concept: Overlap communication/computation
• Implementation
  – Two-phase request/reply pairs
  – Endpoints: a process's connection to a virtual network
  – Bundles: Collection of process endpoints
• Operations
  – AM_Map(), AM_Request(), AM_Reply(), AM_Poll()
  – Credit-based flow-control scheme

AM-VIA Components
• VI Queue (VIQ)
  – Logical channel for AM message type
  – VI & independent Send/Receive Queues
  – Independent request credit scheme (counter n)
• MAP Object
  – Container for 3 VIQs
    • Short, Medium, Long
  – Single Registered Memory Region
• Bundle: Pair of VI Completion Queues
  – Send/Receive
[Diagram labels: VI; Dxs(2*k), Dxs(2*k+1); Data(2*k), Data(2*k+1); Send; Recv; n < k; MAP Object]

AM-VIA Integration
• Endpoints: Collection of MAP objects
  – Virtual network emulated by point-to-point connections
[Diagram labels: Proc A, Proc B, Proc C]

AM-VIA Operations
• Map
  – Allocates VI and registered memory resources and establishes connections.
• Send operations
  – Copies data into a free send buffer and posts a descriptor.
• Receive operations
  – Short/Long messages: copies data and invokes handler
  – Medium: invokes handler w/ pointer to data buffer
• Polling
  – Request/Reply marshalling
    • Empties completion queue into Request/Reply FIFO queues
    • Processes a single Request and/or Reply on each iteration
  – Recycles send descriptors

[Chart: One-Way Message Timing. x-axis: Message Size (bytes), 1 to 10000; y-axis: Time (usec), 0 to 300; legend text as extracted: AMVIA2AMVIA]

[Chart: Streaming Performance. x-axis: Message Size (bytes), 1 to 10000; y-axis: Bandwidth (Mbits/sec), 0 to 450; legend text as extracted: AM2VIA2AMVIA]

[Chart: AMVIA LogP uBenchmarks. x-axis: Burst Size (Msgs), 0 to 1000; y-axis: Time (usec), 0.00 to 60.00; series: Δ=0 to Δ=50 in steps of 5]

[Chart: AM LogP uBenchmarks. x-axis: Burst Size (Msgs), 0 to 1000; y-axis: Time (usec), 0 to 25; series: D=0, D=5, D=10, D=15]

Design Tradeoffs
• Logical Channels for Short/Medium/Long messages
  – Balances resources (VIs, buffering) and reliability
  – Fine-grained credit scheme
  – Requires advance knowledge of reply size.
  – Requires request-reply marshalling upon receipt
• Data Copying
  – Simplest, most robust means of buffer management
  – Zero copy on medium receives requires k+1 buffering.
• Completion Queue/Bundle
  – Straightforward implementation of bundle
  – May overflow on high communication volume
  – Prevents endpoint migration

Reflections
• AMVIA Implementation
  – Robust; works for a wide variety of AM applications
  – Performance suffers due to subtle architectural differences
• VI Architecture shortcomings
  – Lack of support for mapping a VI to a user context
  – VI Naming complicates IPC on the same host
• Active Message shortcomings
  – Memory Ownership semantics prevent true zero-copy for medium messages
• Both benefit from some direct hardware support
  – VIA: Hardware doorbell management
  – AM: Distinction of request/reply messages

Split-C
• C-based shared address space, parallel language
• Distributed memory, explicit global pointers
• Split-phase global read/writes:
    l := r    r :- l    r := l    sync()    store_sync()
[Diagram labels: process, address; Process 0, Process 1; 1, 0xdeadbeef; an ASCII-art cow]

Implementing Split-C
• Split-C implemented as a modified gcc compiler
• Split-phase reads, writes translated to library calls
  – Just need to implement a library
• Essential library calls:
  – get, put, store (for char, int, ..., plus bulk variants)
  – sync, store_sync
• Four implementations:
  – Split-C over AMVIA
  – Split-C over reliable VIA
  – Split-C over unreliable VIA
  – Split-C over shared memory + AMVIA

Split-C over AMVIA
• Establish connection between every pair of processes
• Simple requests/replies to implement get, put, store, e.g.:
    p0: get(loc, <0x1, 0xbeef>)
        request "get"(1, loc, 0xbeef) to p1
        p0 continues program execution
    p1: receive request "get"(…)
        reply "getr"(loc, a-cow) to p0
    p0: receive reply "getr"(…)
        store cow at loc
[Diagram: Process 0, Process 1, and Process 2 joined by AM connections; the fetched value is drawn as an ASCII-art cow]

Split-C over Reliable VIA
• Goal: Reduce send and receive overhead for Split-C operations
• Method 1: Specialise AMVIA for Split-C library
  – support only short, medium messages
  – remove all dynamic dispatch (AM calls, handler dispatch)
  – reduce message size
• Method 2: Allow reply-free requests (for stores)
  – reply to every nth store request, rather than
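
The get exchange on the Split-C over AMVIA slides (p0 sends a "get" request and keeps running, p1 answers with "getr", p0 stores the value at loc) can be sketched as a single-process simulation. This is only an illustration under stated assumptions: the names sc_get, sc_poll, sc_sync and the msg layout are hypothetical, the in-memory inboxes stand in for AM connections, and none of it is the authors' library or the AMVIA API.

```c
#include <assert.h>
#include <stddef.h>

#define NPROCS 2
#define QCAP   16

/* Hypothetical message kinds mirroring the "get" and "getr" names on the slide. */
typedef enum { MSG_GET, MSG_GETR } msg_kind;

typedef struct {
    msg_kind kind;
    int      src;     /* simulated process that sent the message */
    int     *loc;     /* requester's local destination */
    int     *remote;  /* address on the owning process (MSG_GET only) */
    int      value;   /* fetched value (MSG_GETR only) */
} msg;

/* One FIFO inbox per simulated process; an assumption standing in for the network. */
typedef struct { msg q[QCAP]; size_t head, tail; } inbox;

static inbox inboxes[NPROCS];
static int   outstanding[NPROCS];  /* gets issued but not yet completed */

static void send_msg(int dst, msg m) {
    inbox *b = &inboxes[dst];
    assert(b->tail - b->head < QCAP);  /* the sketch does not handle overflow */
    b->q[b->tail++ % QCAP] = m;
}

/* Split-phase get: post the request and return without waiting. */
static void sc_get(int self, int *loc, int owner, int *remote) {
    msg m = { MSG_GET, self, loc, remote, 0 };
    outstanding[self]++;
    send_msg(owner, m);
}

/* Handle at most one pending message for `self`; returns 1 if one was handled. */
static int sc_poll(int self) {
    inbox *b = &inboxes[self];
    if (b->head == b->tail) return 0;
    msg m = b->q[b->head++ % QCAP];
    if (m.kind == MSG_GET) {
        /* Owner side: read the value and reply to the requester. */
        msg r = { MSG_GETR, self, m.loc, NULL, *m.remote };
        send_msg(m.src, r);
    } else {
        /* Requester side: store the value at loc and retire the get. */
        *m.loc = m.value;
        outstanding[self]--;
    }
    return 1;
}

/* Wait until all of `self`'s gets have completed. In this single-threaded
   simulation the loop also drives the other process's polling. */
static void sc_sync(int self) {
    while (outstanding[self] > 0)
        for (int p = 0; p < NPROCS; p++)
            sc_poll(p);
}

/* Returns 1 if loc is untouched before sync and holds the value afterwards. */
int demo_get(void) {
    int cow = 42;  /* value owned by simulated process 1 */
    int loc = 0;   /* destination on simulated process 0 */
    sc_get(0, &loc, 1, &cow);
    int before = loc;  /* p0 could do unrelated work here */
    sc_sync(0);
    return before == 0 && loc == 42;
}
```

Calling demo_get() from a main function exercises the whole exchange. The outstanding counter is what lets the caller overlap computation with communication: nothing blocks until sc_sync is reached.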

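The per-VIQ request credit scheme from the AM-VIA Components slide (counter n, with the diagram label n < k) can be sketched as below. The reading that a request may be posted only while n < k, and that an arriving reply returns the credit, is an assumption drawn from those labels; the type and function names are hypothetical and are not part of AMVIA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical credit state for one VIQ. */
typedef struct {
    unsigned k;  /* maximum number of outstanding requests */
    unsigned n;  /* requests sent whose replies have not yet arrived */
} viq_credits;

static void viq_init(viq_credits *c, unsigned k) {
    assert(k > 0);  /* with k == 0 no request could ever be sent */
    c->k = k;
    c->n = 0;
}

/* Claim a credit before posting a request; fails when n < k does not hold. */
static bool viq_try_request(viq_credits *c) {
    if (c->n >= c->k) return false;
    c->n++;
    return true;
}

/* Return a credit when the matching reply has been received. */
static void viq_on_reply(viq_credits *c) {
    if (c->n > 0) c->n--;
}

/* Send `count` requests, counting how often the sender had to wait.
   Calling viq_on_reply here is a stand-in for polling until a reply arrives. */
static unsigned viq_send_burst(viq_credits *c, unsigned count) {
    unsigned stalls = 0;
    for (unsigned i = 0; i < count; i++) {
        while (!viq_try_request(c)) {
            stalls++;
            viq_on_reply(c);
        }
    }
    return stalls;
}
```

With k = 2, a burst of five requests stalls three times in this model, since every request after the second must wait for a credit to come back.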
