Berkeley COMPSCI C267 - Distributed Memory Computers

CS267 Lecture 2, 2/11/2004

CS 267: Distributed Memory Computers
Kathy Yelick
http://www.cs.berkeley.edu/~yelick/cs267

Recap of Last Lectures
• Shared memory multiprocessors
  • Caches in individual processors must be kept coherent: multiple cached copies of the same location must be kept equal.
  • Requires clever hardware (see CS258).
  • Distant memory is much more expensive to access.
• Shared memory programming
  • Starting and stopping threads.
  • Synchronization with barriers and locks.
• Distributed memory programming
  • MPI (Message Passing Interface)
  • Lecture Wednesday by Bill Saphir from LBNL

Outline
• Distributed memory architectures
• Topologies
• Cost models
• Trends in high-end machines
• Clusters today
• Top500 data

Historical Perspective
• Early machines were:
  • Collections of microprocessors.
  • Communication performed using bidirectional queues between nearest neighbors.
• Messages were forwarded by processors on the path ("store and forward" networking).
• There was a strong emphasis on topology in algorithms, in order to minimize the number of hops.

Network Analogy
• To have a large number of transfers occurring at once, you need a large number of distinct wires.
• Networks are like streets:
  • Link = street.
  • Switch = intersection.
  • Distance (hops) = number of blocks traveled.
  • Routing algorithm = travel plan.
• Properties:
  • Latency: how long it takes to get between nodes in the network.
  • Bandwidth: how much data can be moved per unit time.
  • Bandwidth is limited by the number of wires and the rate at which each wire can accept data.

Characteristics of a Network
• Topology (how things are connected):
  • Crossbar, ring, 2-D mesh and 2-D torus, hypercube, omega network.
• Routing algorithm:
  • Example: all east-west, then all north-south (avoids deadlock).
• Switching strategy:
  • Circuit switching: the full path is reserved for the entire message, like the telephone network.
  • Packet switching: the message is
broken into separately-routed packets, like the post office.
• Flow control (what happens if there is congestion):
  • Stall, store data temporarily in buffers, re-route data to other nodes, tell the source node to temporarily halt, discard data, etc.

Properties of a Network: Latency
• Diameter: the maximum, over all pairs of nodes, of the length of the shortest path between a pair of nodes.
• Latency: the delay between send and receive times.
  • Latency tends to vary widely across architectures.
  • Vendors often report hardware latencies (wire time).
  • Application programmers care about software latencies (user program to user program).
• Observations:
  • Hardware and software latencies often differ by 1-2 orders of magnitude.
  • Maximum hardware latency varies with diameter, but the variation in software latency is usually negligible.
• Latency is important for programs with many small messages.

Properties of a Network: Bandwidth
• A network is partitioned into two or more disjoint sub-graphs if some nodes cannot reach others.
• The bandwidth of a link = w * 1/t, where
  • w is the number of wires, and
  • t is the time per bit.
• Bandwidth is typically measured in gigabytes per second (GB/s), i.e., 8 * 2^30 bits per second.
• Effective bandwidth is usually lower than the physical link bandwidth due to packet overhead: each packet carries a routing and control header, an error code, and a trailer in addition to the data payload.
• Links may be unidirectional (carrying data in one direction) or bidirectional (in both directions).
• Bandwidth is important for applications with mostly large messages.

Properties of a Network: Bisection Bandwidth
• Bisection bandwidth: the bandwidth across the smallest cut that divides the network into two equal halves.
• Equivalently, the bandwidth across the "narrowest" part of the network.
• (Figure: a cut that does not split the network into equal halves is not a bisection cut. For a linear array, bisection bandwidth = link bandwidth; for a 2D mesh, bisection bandwidth = sqrt(n) * link bandwidth.)
• Bisection bandwidth is important for algorithms in which all processors need to communicate with all others.

Network Topology
• In the past, there was considerable research in network topology and
in mapping algorithms to topology.
• Key cost to be minimized: the number of "hops" between nodes (e.g., with "store and forward" routing).
• Modern networks hide the hop cost (e.g., "wormhole routing"), so topology is no longer a major factor in algorithm performance.
  • Example: on an IBM SP system, hardware latency varies from 0.5 usec to 1.5 usec, but user-level message-passing latency is roughly 36 usec.
• Some background in network topology is still needed:
  • Algorithms may have a communication topology.
  • Topology affects bisection bandwidth.

Linear and Ring Topologies
• Linear array:
  • Diameter = n - 1; average distance ~ n/3.
  • Bisection bandwidth = 1 (in units of link bandwidth).
• Torus or ring:
  • Diameter = n/2; average distance ~ n/4.
  • Bisection bandwidth = 2.
• Natural for algorithms that work with 1D arrays.

Meshes and Tori
• Two-dimensional mesh:
  • Diameter = 2 * (sqrt(n) - 1).
  • Bisection bandwidth = sqrt(n).
• Two-dimensional torus:
  • Diameter = sqrt(n).
  • Bisection bandwidth = 2 * sqrt(n).
• Generalizes to higher dimensions (the Cray T3D used a 3D torus).
• Natural for algorithms that work with 2D and/or 3D arrays.

Hypercubes
• Number of nodes n = 2^d for dimension d.
• Diameter = d.
• Bisection bandwidth = n/2.
• (Figure: 0d, 1d, 2d, 3d, and 4d hypercubes.)
• Popular in early machines (Intel iPSC, nCUBE).
  • Lots of clever algorithms; see the 1996 online CS267 notes.
• Gray-code addressing:
  • Each node is connected to d others whose addresses differ in exactly 1 bit.
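The Gray-code addressing idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function names are ours. It shows the two facts the slide relies on: a node's d neighbors are found by flipping single address bits, and consecutive binary-reflected Gray codes differ in exactly one bit.

```python
def gray(i):
    """Binary-reflected Gray code: consecutive integers map to
    codes that differ in exactly one bit."""
    return i ^ (i >> 1)

def hypercube_neighbors(node, d):
    """In a d-dimensional hypercube a node has d neighbors, reached
    by flipping each of the d bits of its address."""
    return [node ^ (1 << k) for k in range(d)]

# Numbering nodes in Gray-code order yields a ring embedded in the
# hypercube: each step of the ring crosses exactly one hypercube link.
ring = [gray(i) for i in range(8)]  # a ring in a 3d (8-node) hypercube
```

This embedding is one reason hypercubes support "lots of clever algorithms": any 1D-array (ring) algorithm runs on a hypercube with every communication step using a single link.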
Trees
• Diameter = log n.
• Bisection bandwidth = 1.
• Easy layout as a planar graph.
• Many tree algorithms (e.g., summation).
• Fat trees avoid the bisection bandwidth problem:
  • More (or wider) links near the top.
  • Example: Thinking Machines CM-5.

Butterflies
• Diameter = log n.
• Bisection bandwidth = n.
• Cost: lots of wires.
• Used in the BBN Butterfly.
• Natural for the FFT.
• (Figure: a butterfly switch, and a multistage butterfly network built from such switches.)

Topologies in Real Machines
  Topology            Machine
  ------------------  --------------------------------------
  Butterfly           BBN Butterfly (really old)
  Fat tree (approx.)  IBM SP
  Arbitrary           Myricom (Millennium)
  Hypercube           SGI Origin
  Fat tree            Quadrics (in HP Alpha server clusters)
  2D Mesh             Intel Paragon (old)
  4D Hypercube*       Cray X1
  Fat
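The diameter and bisection-bandwidth formulas from the topology slides above can be collected into one sketch. This is our own summary function, not from the lecture, and it uses the formulas exactly as stated on the slides; n is assumed to be a perfect square for the mesh/torus and a power of two for the hypercube, tree, and butterfly.

```python
import math

def topology_metrics(topology, n):
    """Return (diameter, bisection bandwidth) for n nodes, using the
    formulas stated on the slides; bisection bandwidth is in units of
    link bandwidth."""
    s = math.isqrt(n)        # sqrt(n), assuming n is a perfect square
    d = n.bit_length() - 1   # log2(n), assuming n is a power of two
    formulas = {
        "linear":    (n - 1,       1),
        "ring":      (n // 2,      2),
        "mesh2d":    (2 * (s - 1), s),
        "torus2d":   (s,           2 * s),
        "hypercube": (d,           n // 2),
        "tree":      (d,           1),
        "butterfly": (d,           n),
    }
    return formulas[topology]

# A 16-node hypercube has diameter 4 and bisection bandwidth 8.
print(topology_metrics("hypercube", 16))
```

Comparing entries makes the slides' trade-off concrete: the butterfly buys bisection bandwidth n at the cost of many wires, while the tree has the smallest possible bisection bandwidth (1) despite its log n diameter.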

