UMD CMSC 714 - Shared Memory Architecture


Shared Memory Architecture
Michael Schatz
CMSC 714
Thursday, October 4, 2007

Taxonomy of Computer Architectures

MIMD
  Multiprocessors (single address space, shared memory)
    UMA (central memory)
      PVP (SGI/Cray T90)
      SMP (Intel SHV, SUN E10000, DEC 8400, SGI Power Challenge, IBM R60, etc.)
    NUMA (distributed memory)
      COMA (KSR-1, DDM)
      CC-NUMA (SGI Origin2000, Origin3000, Cray T3E, HP Exemplar, Sequent NUMA-Q, Data General)
      NCC-NUMA (Cray T3D, IBM SP3)
  Multicomputers (multiple address spaces)
    NORMA (no-remote memory access)
      Cluster (IBM SP2, DEC TruCluster, Microsoft Wolfpack, "Beowulf", etc.): loosely coupled, multiple OS
      "MPP" (Intel TFLOPS, TM-5): tightly coupled & single OS

MIMD      Multiple Instructions Multiple Data
PVP       Parallel Vector Processor
UMA       Uniform Memory Access
SMP       Symmetric Multi-Processor
NUMA      Non-Uniform Memory Access
COMA      Cache Only Memory Architecture
NORMA     No-Remote Memory Access
CC-NUMA   Cache-Coherent NUMA
MPP       Massively Parallel Processor
NCC-NUMA  Non-Cache Coherent NUMA

(c) SGI 2001

ccNUMA
• Global address space, but non-uniform memory access times
• Any processor can address memory from any node
• Local data much faster to access
• Fetching from remote memory is slow, need cache
• Have to maintain consistent state: cache coherency
1. Broadcast (Snoopy) Coherency:
   • Broadcast all addresses, monitor for relevant updates
   • Simple & fast implementation, but high bandwidth usage
2. Directory Coherency:
   • Requests for memory tracked by directory
   • Complicated & higher latency, but better scalability

The SGI Origin: A ccNUMA highly scalable server
J. Laudon and D. Lenoski
Proceedings of the 1997 International Symposium on Computer Architecture, May 1997

Background
• SGI Power Challenge systems (~1993)
   • 36x R10000 Processor
   • cache-coherent, global address space
• Design Goals for Origin (~1996)
   1. Scale beyond 36 processors
   2. Retain cache-coherent, global address space
   3. Low entry and incremental cost

Origin Overview

System Overview of the SGI Origin 200/2000 Product Line
James Laudon and Daniel Lenoski
Silicon Graphics, Inc.
2011 North Shoreline Boulevard
Mountain View, California

Abstract

The SGI Origin 200/2000 is a cache-coherent non-uniform memory access (ccNUMA) multiprocessor designed and manufactured by Silicon Graphics, Inc. The Origin system was designed from the ground up as a multiprocessor capable of scaling to both small and large processor counts without any cost, bandwidth, or latency cliffs. The Origin system consists of up to 512 nodes interconnected by a highly scalable Craylink network. Each node consists of one or two R10000 processors and up to 4 GB of coherent memory. Each node also connects to the scalable XIO IO subsystem. This paper discusses the motivation for building the Origin 200/2000 and describes its architecture and implementation.

1 Background

Silicon Graphics has offered multiple generations of symmetric multiprocessor (SMP) systems based on MIPS microprocessors. From the 8 processor Power Series to the 36 processor Challenge and Power Challenge systems, the cache-coherent, globally addressable memory architecture of these SMP systems has provided a convenient programming environment for large parallel applications while at the same time providing an excellent system performance for both parallel and throughput-based workloads.

The follow-on system to the Power Challenge needed to meet three important goals. First, it needed to scale beyond the 36 processor limit of the Power Challenge and provide an infrastructure that supports higher performance per processor. Second, the new system had to retain the cache-coherent globally addressable memory model of the Power Challenge. This model is critical for achieving high performance on loop-level parallelized code and for supporting the existing Power Challenge customers. Finally, lower entry level and incremental system costs than a high-performance SMP were desired, with costs approaching that of a cluster of workstations.

Simply building a larger and faster snoopy bus-based SMP system could not meet all three of these goals. The shared-memory goal might be achievable, but it would surely compromise performance for larger processor counts and costs for smaller configurations. Therefore a very different architecture was chosen for use in the next generation Origin system. Origin employs distributed shared memory (DSM), with cache coherence maintained via a directory-based protocol.

A DSM system has the potential for meeting all three design goals. The directory-based coherence removes the broadcast bottleneck that prevents scalability of the snoopy bus-based coherence. The globally addressable memory model is retained, although memory access times are no longer uniform. To address this non-uniformity, Origin was designed to minimize the latency difference between remote and local memory and to include hardware and software support to insure that most memory references are local. Finally, a low initial and incremental cost can be provided if the natural modularity of a DSM system is exploited at a relatively fine grain by the product design.

In Section 2 of this paper, the scalable shared-memory multiprocessing (S2MP) architecture of the Origin is presented. Section 3 details the implementation of the Origin 200/2000. Finally, Section 4 concludes the paper.

2 The Origin S2MP architecture

A block diagram of the Origin architecture is shown in Figure 1. The basic building block of the Origin system is the dual-processor node. In addition to the processors, a

Figure 1: Origin block diagram. [Diagram labels: Scalable Interconnect Network; Node 0, Node 1, ..., Node 511; within a node: Proc A, Proc B, Mem & Dir, Hub Chip, IO Xbar, IO Ctrls.]

Unit of computation is a "node".

Node Architecture
• Basic Desktop Module
   • 4 node boards, 2 router boards, 12 XIO boards
   • larger systems built by combining modules
• Node Board:
   • 2x R10000 Processors
   • 64 MB to 4 GB Memory
   • 4 MB Cache / processor
   • Central Hub connects processors, IO, memory, network

5. At the bottom of the board are two R10000 processors with their secondary caches. The R10000 is a four-way out-of-order superscalar processor [14]. Current Origin systems run the processor at 195 MHz and contain 4 MB secondary caches. Each processor and its secondary cache is
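
The trade-off between broadcast (snoopy) and directory coherence described in the slides can be illustrated with a toy traffic count. This is only a sketch under simplifying assumptions, not the Origin protocol: the class name CoherenceModel, the function run_demo, and the message-counting rules are all hypothetical. It assumes each snoopy miss makes every other node check the broadcast address, while a directory miss costs one request to the line's home node plus one message per node the directory knows must be downgraded or invalidated.

```python
class CoherenceModel:
    """Toy model of a shared-memory machine; counts coherence traffic only.

    Assumption: this is an illustrative invalidation scheme, not the
    protocol used by any real machine.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.holders = {}      # cache line -> set of node ids holding a copy
        self.exclusive = {}    # cache line -> True if the single holder may write
        self.directory_msgs = 0
        self.snoopy_msgs = 0

    def _miss(self, targeted):
        # Directory: one request to the home node, plus one message per node
        # that the directory says must be downgraded or invalidated.
        self.directory_msgs += 1 + targeted
        # Snoopy: the address is broadcast, so every other node checks it.
        self.snoopy_msgs += self.num_nodes - 1

    def read(self, node, line):
        holders = self.holders.setdefault(line, set())
        if node in holders:
            return  # cache hit: no coherence traffic
        # If another node holds the line exclusively, it must be downgraded.
        self._miss(targeted=1 if self.exclusive.get(line, False) else 0)
        self.exclusive[line] = False
        holders.add(node)

    def write(self, node, line):
        holders = self.holders.setdefault(line, set())
        if holders == {node} and self.exclusive.get(line, False):
            return  # already the exclusive owner: no coherence traffic
        # Every other copy must be invalidated before the write proceeds.
        self._miss(targeted=len(holders - {node}))
        self.holders[line] = {node}
        self.exclusive[line] = True


def run_demo(num_nodes=64):
    """Replay a short access pattern and return (directory_msgs, snoopy_msgs)."""
    m = CoherenceModel(num_nodes)
    m.write(0, line=0)   # node 0 takes the line exclusively
    m.read(1, line=0)    # node 1 reads: owner is downgraded
    m.read(2, line=0)    # node 2 reads: line is already shared
    m.write(0, line=0)   # node 0 writes: nodes 1 and 2 are invalidated
    m.write(0, line=0)   # hit: node 0 is already the exclusive owner
    return m.directory_msgs, m.snoopy_msgs


if __name__ == "__main__":
    print(run_demo(64))
```

In this model the directory count depends only on how many nodes actually share a line, while the snoopy count grows with the number of nodes in the machine, which is the scalability argument the slides make. The model ignores latency, where the slides note the directory approach is worse.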

