UMD CMSC 714 - The Quadrics Network

The Quadrics Network
High Performance Clustering Technology
Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg
Los Alamos National Laboratory

What is the Quadrics Network?
It’s a distributed virtual memory system, featuring:
• A single global virtual address space
• Fault tolerance

Presentation Outline
• Quadrics network anatomy
• Implementation of global virtual memory
• Libraries
• Experimental results

Virtual Memory
• Every process has its own virtual address space
• On each memory access:
  – The processor looks up the address in the process’s page table
  – The page table converts the virtual address to a physical address
  – The memory access is performed with the physical address

Global Virtual Memory
• The virtual memory space spans all processors
• A process can transparently access a page located in remote memory

The Quadrics Network
• “Elan” network interface: a PCI card in each node
• “Elite” communication switches
• Multiple communication libraries
  – allow custom protocols
  – library trade-off: performance vs. ease of use

Elan Network Interface

Elan’s microcode processor
• Handles memory requests
• 4 threads:
  – inputter
  – DMA engine
  – processor scheduling
  – command processing
• 2-stage pipeline per thread => up to 8 outstanding memory requests

Elan components
Thread processor:
• Implements the messaging libraries
• 32-bit RISC with extra specialized instructions
MMU:
• Converts a 32-bit virtual address into
  – a 28-bit SDRAM physical address, or
  – a 48-bit PCI address
• 16-entry TLB

Elan components (continued)
Routing table:
• Maps virtual process # --> tags that determine the route
64 MB SDRAM
8 KB cache for the SDRAM
Link logic:
• 2 virtual channels
• 128-entry FIFO buffer

Quaternary Fat-Tree Network
• 4-ary n-tree (n = 3 in the figure): each switch has 4 links down and, below the top level, 4 links up
• Composed of “Elite” crossbar switches

Elite switch
• 8 bidirectional links, with 2 virtual channels in each direction
• 400 MB/s bandwidth per link in each direction
• 35 ns latency
• CRC error detection on each link
• 2 priority levels

Routing
• The sending Elan puts the route’s tag sequence in the packet header
• Each Elite switch removes the first tag and uses it to route to the next switch
• At the data link level:
  – Elite partitions the packet
    into “flits”
  – flits are sent independently
  – after the last flit in a packet, the receiver sends an ACK

Elan virtual memory
• The MMU converts virtual --> physical addresses
  – can translate between architectures
• Physical data can be in Elan SDRAM or in the node’s local memory
• The location of physical data is not normally visible to users

Virtual address translation

Virtual Memory Extension
• Extension to virtual memory: virtual operations
  – Cooperating processes can transfer data between address spaces
  – Protection is still maintained

Context
• The virtual process ID is replaced with a context
  – context + virtual address identify a page
• Multiple processes (on multiple machines) can share the same context
  – allows for distributed shared memory

Fault tolerance
• Fault tolerance steps:
  – A packet consists of route info + transactions
  – The last transaction carries the “ACK Now” flag
  – A packet is not successful until the receiver sends an ACK
  – The link is reused only after the ACK is received
• After a fixed number of errors, a new route is negotiated

Programming Libraries
• Allow the programmer to write intelligent protocols
• Elan3lib
  – Low-level, high efficiency
  – Lets the user program the Elan and move data manually between Elan memory and local memory (without the operating system knowing)
• Elanlib
  – Higher-level, lower efficiency
  – Allows MPI-like message passing

Experimental Methodology
Setup:
• 16 dual-processor 733 MHz Pentium III nodes
  – 1 GB RAM
  – 64-bit, 66 MHz PCI slot for the Elan card
• Quaternary fat-tree network of dimension 2 (4-ary 2-tree)
• Linux 2.4.0-test7 operating system
Benchmarks:
• Elan3lib benchmark to show best-case performance
• Elanlib benchmark to simulate MPI-2

Ping test - Bandwidth
Bandwidth ranges from 307 MB/s (MPI) to 335 MB/s (Elan3lib)

Ping test - Latency
For messages up to 64 bytes, latency ranges from 2.4 µs (Elan3lib) to 5.0 µs (MPI)

Scalability - Hot spot vulnerability
Virtually no bandwidth decrease when 8 processors access the same address

Authors’ conclusions
• The analysis demonstrates that “the network and its libraries deliver excellent performance to users”
• Future work:
  – analyzing scalability with larger numbers of nodes
  – testing actual
    scientific applications
  – testing more elaborate communication

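The “Global Virtual Memory” slide says a process can reach a page in remote memory transparently. The Python sketch below illustrates that idea under stated assumptions: a 4 KB page size, a flat dict as the page table, and a hypothetical `PageEntry.node` field recording which machine holds each page. It is a toy model of the concept, not how the Elan implements translation.

```python
from typing import NamedTuple

PAGE_SIZE = 4096  # assumed page size; the slides do not give one


class PageEntry(NamedTuple):
    node: int   # machine that physically holds the page (hypothetical field)
    frame: int  # physical frame number on that machine


def translate(page_table: dict[int, PageEntry], vaddr: int) -> tuple[int, int]:
    """Map a global virtual address to (node, physical address).

    A missing page-table entry raises KeyError, standing in for a page fault.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table[vpn]
    return entry.node, entry.frame * PAGE_SIZE + offset


def load(memories: dict[int, dict[int, int]],
         page_table: dict[int, PageEntry], vaddr: int) -> int:
    """Read one word; the caller never sees whether it was local or remote."""
    node, paddr = translate(page_table, vaddr)
    return memories[node][paddr]


# Hypothetical layout: virtual page 0 is on node 0, virtual page 1 on node 3.
table = {0: PageEntry(node=0, frame=5), 1: PageEntry(node=3, frame=2)}
mem = {0: {5 * PAGE_SIZE + 8: 11}, 3: {2 * PAGE_SIZE + 8: 42}}
print(load(mem, table, PAGE_SIZE + 8))  # page 1 -> node 3 -> 42
```

The caller of `load` passes only a virtual address; which node serves it is decided entirely by the page table, which is the transparency the slide describes.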

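The “Elan components” slide lists an MMU that turns a 32-bit virtual address into either a 28-bit SDRAM address or a 48-bit PCI address, backed by a 16-entry TLB. This toy model keeps those widths and the TLB size; the 8 KB page size, the `(space, page)` page-table format, LRU replacement, and the `ElanMMUModel` name are all assumptions made for illustration, not details of the real hardware.

```python
from collections import OrderedDict

TLB_ENTRIES = 16            # from the slide
VADDR_BITS = 32             # from the slide
SDRAM_BITS, PCI_BITS = 28, 48
PAGE_SHIFT = 13             # assumed 8 KB pages; not given on the slide


class ElanMMUModel:
    """Toy Elan MMU: 32-bit virtual address -> 28-bit SDRAM or 48-bit PCI address.

    page_table maps a virtual page number to ("sdram" | "pci", physical page).
    The table format and LRU replacement are illustrative assumptions.
    """

    def __init__(self, page_table):
        self.page_table = page_table
        self.tlb = OrderedDict()   # vpn -> (space, physical page), in LRU order
        self.misses = 0

    def translate(self, vaddr):
        if not 0 <= vaddr < 1 << VADDR_BITS:
            raise ValueError("virtual address must fit in 32 bits")
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)              # hit: mark most recently used
        else:
            self.misses += 1
            self.tlb[vpn] = self.page_table[vpn]   # miss: fetch from page table
            if len(self.tlb) > TLB_ENTRIES:
                self.tlb.popitem(last=False)       # evict least recently used
        space, ppage = self.tlb[vpn]
        paddr = (ppage << PAGE_SHIFT) | offset
        width = SDRAM_BITS if space == "sdram" else PCI_BITS
        if paddr >= 1 << width:
            raise ValueError(f"{space} address does not fit in {width} bits")
        return space, paddr


mmu = ElanMMUModel({0: ("sdram", 3), 1: ("pci", 1 << 30)})
print(mmu.translate(0x10))           # ('sdram', 24592)
print(mmu.translate((1 << 13) + 4))  # ('pci', 8796093022212)
```

With only 16 entries, a working set larger than 16 pages keeps evicting entries, which is why the sketch counts misses.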

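For the “Quaternary Fat-Tree Network” slide, the standard k-ary n-tree counts are k^n processing nodes and n·k^(n-1) switches, so the n = 3 tree in the figure has 64 nodes and 48 switches, and the 16-node test cluster (a 4-ary 2-tree) has 8 switches. The sketch below computes those counts and the number of switches on a minimal up-then-down route, assuming nodes are numbered so that their leading base-4 digits identify the subtree they sit in (an illustrative labeling, not necessarily Quadrics' own).

```python
K = 4  # quaternary: each Elite switch uses 4 links down and (below the top) 4 up


def tree_size(n: int, k: int = K) -> tuple[int, int]:
    """Return (processing nodes, switches) in a k-ary n-tree."""
    return k ** n, n * k ** (n - 1)


def digits(node: int, n: int, k: int = K) -> list[int]:
    """Base-k digits of a node id, most significant first."""
    out = []
    for _ in range(n):
        node, d = divmod(node, k)
        out.append(d)
    return out[::-1]


def switches_on_path(src: int, dst: int, n: int, k: int = K) -> int:
    """Switches crossed by a minimal up-then-down route between two nodes."""
    if not (0 <= src < k ** n and 0 <= dst < k ** n):
        raise ValueError("node id out of range for this tree")
    if src == dst:
        return 0
    a, b = digits(src, n, k), digits(dst, n, k)
    common = 0
    while a[common] == b[common]:
        common += 1
    climb = n - common      # level of the nearest common ancestor switch
    return 2 * climb - 1    # up `climb` switches, then down `climb - 1`


print(tree_size(3))                 # (64, 48)
print(switches_on_path(0, 15, 2))   # 3
```

Two nodes on the same leaf switch cross only 1 switch, while nodes in different top-level subtrees of the 16-node cluster cross 3.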

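The “Routing” slide describes source routing: the sending Elan writes a tag sequence into the header and each Elite switch strips the first tag to choose its output port. A minimal sketch follows; the routing-table contents, port numbers, switch names, and the `wiring` map are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    tags: deque      # remaining route tags, one output port per switch
    payload: bytes


def make_packet(routing_table, dest_vp, payload):
    """Sending Elan: look up the tag sequence for a virtual process number."""
    return Packet(deque(routing_table[dest_vp]), payload)


def forward(packet, first_switch, wiring):
    """Each switch strips the first tag and sends the packet out of that port."""
    here, path = first_switch, [first_switch]
    while packet.tags:
        port = packet.tags.popleft()     # the switch consumes one tag
        here = wiring[(here, port)]      # follow the link on that output port
        path.append(here)
    return path


# Hypothetical two-switch route to virtual process 7.
routing_table = {7: [5, 2]}
wiring = {("S0", 5): "S1", ("S1", 2): "node7"}
pkt = make_packet(routing_table, 7, b"hello")
print(forward(pkt, "S0", wiring))  # ['S0', 'S1', 'node7']
```

Because the route travels in the header, the switches need no routing tables of their own; each one only reads and removes its tag.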

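The data-link and “Fault tolerance” slides combine into one retry loop: a packet goes out as flits, it counts as delivered only once the receiver ACKs after the last flit, and after a fixed number of errors the sender moves to a new route. In this sketch the 16-byte flit size, the limit of 3 errors, and the `link_ok` callback (standing in for the receiver's CRC check and ACK) are all assumptions.

```python
from collections.abc import Callable

FLIT_BYTES = 16   # assumed flit size; the slides do not give one
MAX_ERRORS = 3    # assumed value for the "fixed number of errors"


def to_flits(packet: bytes, size: int = FLIT_BYTES) -> list[bytes]:
    """Split a packet into flits; the last one may be shorter."""
    return [packet[i:i + size] for i in range(0, len(packet), size)]


def send(packet: bytes, routes: list[str],
         link_ok: Callable[[str, list[bytes]], bool]) -> tuple[str, int]:
    """Retry until the receiver ACKs; move to a new route after MAX_ERRORS.

    link_ok(route, flits) stands in for the receiver checking CRCs and
    returning an ACK after the last flit. Returns (route used, attempts).
    """
    flits = to_flits(packet)
    attempts = 0
    for route in routes:
        for _ in range(MAX_ERRORS):
            attempts += 1
            # The link stays reserved for this packet until the ACK comes back.
            if link_ok(route, flits):
                return route, attempts
        # Too many errors on this route: negotiate the next one.
    raise RuntimeError("no route delivered the packet")


# Hypothetical links: route "A" always fails, route "B" works.
print(send(b"x" * 40, ["A", "B"], lambda route, flits: route == "B"))  # ('B', 4)
```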

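The ping-test numbers can be read through the common latency-plus-bandwidth (Hockney) model, T(s) = L + s/B, which gives the half-bandwidth message size n_1/2 = L·B (effective bandwidth s/T equals B/2 exactly when s = L·B). Pairing the small-message latency with the peak bandwidth this way is an approximation, so the outputs are model estimates rather than measured results; MB is taken as 10^6 bytes, so 1 MB/s is 1 byte/µs.

```python
def transfer_time_us(size_bytes: float, latency_us: float, bw_mb_s: float) -> float:
    """Hockney model: T(s) = latency + s / bandwidth (1 MB/s = 1 byte/us)."""
    return latency_us + size_bytes / bw_mb_s


def half_bandwidth_size(latency_us: float, bw_mb_s: float) -> float:
    """Message size n_1/2 at which effective bandwidth reaches half of peak."""
    return latency_us * bw_mb_s


# Numbers taken from the ping-test slides (MB taken as 10**6 bytes).
for name, lat, bw in [("Elan3lib", 2.4, 335.0), ("MPI", 5.0, 307.0)]:
    print(name, round(half_bandwidth_size(lat, bw)), "bytes")
```

Under this model, n_1/2 is about 804 bytes for Elan3lib and 1535 bytes for MPI, so MPI's higher per-message overhead pushes the point of efficient transfers to roughly twice the message size.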