RIT EECC 756 - Scalable Distributed Memory Machines



Scalable Distributed Memory Machines
MPPs Scalability Issues
One Extreme: Limited Scaling of a Bus
Another Extreme: Scaling of Workstations in a LAN?
Bandwidth Scalability
Dancehall MP Organization
Generic Distributed Memory Organization
Key System Scaling Property
Network Latency Scaling
Network Latency Scaling Example
Cost Scaling
Cost Effective?
Parallel Machine Network Examples
Physical Scaling
Chip-level integration Example: nCUBE/2 Machine Organization
Chip-level integration Example: Vector Intelligent RAM 2 (V-IRAM-2)
Chip-level integration Example: Alpha 21364
Chip-level integration Example: A Possible Alpha 21364 System
Chip-level integration Example: IBM Power 4 CMP
Chip-level integration Example: IBM Power 4
Chip-level integration Example: IBM Power 4 MCM
Board-level integration Example: CM-5 Machine Organization
System Level Integration Example:
Realizing Programming Models: Realized by Protocols
Challenges in Realizing Prog. Models in Large-Scale Machines
Network Transaction Processing
Spectrum of Designs
No CA Net Transactions Interpretation: Physical DMA
nCUBE/2 Network Interface
DMA In Conventional LAN Network Interfaces
User-Level Ports
User-Level Network Example: CM-5
User-Level Handlers
iWARP
Dedicated Message Processing Without Specialized Hardware Design
Levels of Network Transaction
Example: Intel Paragon
Message Processor Events
Message Processor Assessment

Scalable Distributed Memory Machines
Goal: Parallel machines that can be scaled to hundreds or thousands of processors.
• Design Choices:
  – Custom-designed or commodity nodes?
  – Network scalability.
  – Capability of node-to-network interface (critical).
  – Supporting programming models?
• What does hardware scalability mean?
  – Avoids inherent design limits on resources.
  – Bandwidth increases with machine size P.
  – Latency should not increase with machine size P.
  – Cost should increase slowly with P.
EECC756 - Shaaban  #1 lec # 13 Spring2002 5-2-2002

MPPs Scalability Issues
• Problems:
  – Memory-access latency.
  – Interprocess communication complexity or synchronization overhead.
  – Multi-cache inconsistency.
  – Message-passing and message processing overheads.
• Possible Solutions:
  – Fast dedicated, proprietary and scalable, networks and protocols.
  – Low-latency fast synchronization techniques, possibly hardware-assisted.
  – Hardware-assisted message processing in communication assists (node-to-network interfaces).
  – Weaker memory consistency models.
  – Scalable directory-based cache coherence protocols.
  – Shared virtual memory.
  – Improved software portability; standard parallel and distributed operating system support.
  – Software latency-hiding techniques.

One Extreme: Limited Scaling of a Bus
• Bus: Each level of the system design is grounded in the scaling limits at the layers below and assumptions of close coupling between components.

  Characteristic              Bus
  Physical Length             ~ 1 ft
  Number of Connections       fixed
  Maximum Bandwidth           fixed
  Interface to Comm. medium   memory inf
  Global Order                arbitration
  Protection                  Virt -> physical
  Trust                       total
  OS                          single
  comm. abstraction           HW

Poor Scalability

Another Extreme: Scaling of Workstations in a LAN?
• No clear limit to physical scaling, no global order, consensus difficult to achieve.

  Characteristic              Bus                LAN
  Physical Length             ~ 1 ft             KM
  Number of Connections       fixed              many
  Maximum Bandwidth           fixed              ???
  Interface to Comm. medium   memory inf         peripheral
  Global Order                arbitration        ???
  Protection                  Virt -> physical   OS
  Trust                       total              none
  OS                          single             independent
  comm. abstraction           HW                 SW

Bandwidth Scalability
• Depends largely on network characteristics:
  – Channel bandwidth.
  – Static: Topology: Node degree, Bisection width etc.
  – Multistage: Switch size and connection pattern properties.
  – Node-to-network interface capabilities.
[Figure: nodes with processor (P) and memory (M) attached to switches (S). Typical switches: Bus, Multiplexers, Crossbar]

Dancehall MP Organization
• Network bandwidth?
• Bandwidth demand?
  – Independent processes?
  – Communicating processes?
• Latency?
[Figure: dancehall organization. Labels: P, $, M, Switch, Scalable network]
Extremely high demands on network in terms of bandwidth, latency even for independent processes.

Generic Distributed Memory Organization
• Network bandwidth?
• Bandwidth demand?
  – Independent processes?
  – Communicating processes?
• Latency? O(log2P) increase?
• Cost scalability of system?
[Figure: node (P, $, M, CA) attached through switches to a scalable network. Annotations:]
  Scalable network: Multi-stage interconnection network (MIN)? Custom-designed?
  Node: O(10) Bus-based SMP. Custom-designed CPU? Node/System integration level? How far? Cray-on-a-Chip? SMP-on-a-Chip?
  OS Supported? Network protocols?
  Communication Assist: Extent of functionality? Message transaction DMA? Global virtual Shared address space?

Key System Scaling Property
• Large number of independent communication paths between nodes.
  => Allow a large number of concurrent transactions using different channels.
• Transactions are initiated independently.
• No global arbitration.
• Effect of a transaction only visible to the nodes involved
  – Effects propagated through additional transactions.

Network Latency Scaling
• T(n) = Overhead + Channel Time + Routing Delay
• Scaling of overhead?
• Channel Time(n) = n/B   (B: BW at bottleneck)
• RoutingDelay(h,n)

Network Latency Scaling Example
• Max distance: log2 n
• Number of switches: n log n
• overhead = 1 us, BW = 64 MB/s, 200 ns per hop
• Using pipelined or cut-through routing:
  • T64(128) = 1.0 us + 2.0 us + 6 hops * 0.2 us/hop = 4.2 us
  • T1024(128) = 1.0 us + 2.0 us + 10 hops * 0.2 us/hop = 5.0 us
• Store and
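
The two cut-through figures in the Network Latency Scaling Example follow from T(n) = Overhead + Channel Time + Routing Delay. The Python sketch below reproduces that arithmetic. The function and parameter names are illustrative and not from the slides. It assumes 1 MB = 10^6 bytes (which yields the quoted 2.0 us channel time for a 128-byte message) and takes the hop count to be log2 of the node count, as the "Max distance" bullet suggests.

```python
import math


def cut_through_latency_us(nodes, msg_bytes,
                           overhead_us=1.0,
                           bw_mb_per_s=64.0,
                           hop_delay_us=0.2):
    """Latency in microseconds under pipelined (cut-through) routing.

    Assumptions (not stated explicitly in the slides):
      * 1 MB = 10**6 bytes, so bytes / (MB/s) is directly in microseconds.
      * The message travels the maximum distance, log2(nodes) hops.
    """
    hops = math.ceil(math.log2(nodes))          # max distance in hops
    channel_time_us = msg_bytes / bw_mb_per_s   # n / B
    routing_delay_us = hops * hop_delay_us      # per-hop delay paid once per hop
    return overhead_us + channel_time_us + routing_delay_us


if __name__ == "__main__":
    for p in (64, 1024):
        print(f"T{p}(128) = {cut_through_latency_us(p, 128):.1f} us")
```

With these inputs, growing the machine from 64 to 1024 nodes adds only four hops, so the routing term rises from 1.2 us to 2.0 us while the overhead and channel time stay fixed.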

