Berkeley COMPSCI C267 - Lecture 5: More about Distributed Memory Computers and Programming

This preview shows page 1-2-14-15-29-30 out of 30 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience

CS 267 Applications of Parallel Computers
Lecture 5: More about Distributed Memory Computers and Programming
James Demmel, Spring 1999
http://www.cs.berkeley.edu/~demmel/cs267_Spr99

Recap of Last Lecture
°Shared memory processors
•If there are caches, the hardware must keep them coherent, i.e. keep multiple cached copies of the same location equal
•Requires clever hardware (see CS258)
•Distant memory is much more expensive to access
°Shared memory programming
•Solaris threads
•Starting and stopping threads
•Synchronization with barriers and locks
•Sharks and Fish example

Outline
°Distributed memory architectures
•Topologies
•Cost models
°Distributed memory programming
•Send and receive
•Collective communication
°Sharks and Fish
•Gravity

History and Terminology

Historical Perspective
°Early machines were:
•collections of microprocessors
•with bi-directional queues between neighbors
°Messages were forwarded by processors on the path
°Strong emphasis on topology in algorithms

Network Analogy
°To have a large number of transfers occurring at once, you need a large number of distinct wires
°Networks are like streets:
•link = street
•switch = intersection
•distance (hops) = number of blocks traveled
•routing algorithm = travel plan
°Properties:
•latency: how long it takes to get somewhere in the network
•bandwidth: how much data can be moved per unit time
-limited by the number of wires
-and by the rate at which each wire can accept data

Components of a Network
°Networks are characterized by:
°Topology - how things are connected
•two types of nodes: hosts and switches
°Routing algorithm - which paths are used
•e.g., route all east-west, then all north-south (avoids deadlock)
°Switching strategy
•circuit switching: the full path is reserved for the entire message, like the telephone
•packet switching: the message is broken into separately routed packets, like the post office
°Flow control - what happens if there is congestion
•if two or more messages attempt to use the same channel
•they may stall, move to buffers, be rerouted, be discarded, etc.

Properties of a Network
°Diameter: the maximum over all node pairs of the shortest path between them
°A network is partitioned if some nodes cannot reach others
°The bandwidth of a link is w * 1/t, where
•w is the number of wires
•t is the time per bit
°Effective bandwidth is lower due to packet overhead: each packet carries a routing and control header, the data payload, an error code, and a trailer
°Bisection bandwidth
•sum of the bandwidths of the minimum set of channels which, if removed, would partition the network

Topologies
°Originally there was much research on mapping algorithms to topologies
°The cost to be minimized was the number of "hops" = communication steps along individual wires
°Modern networks use similar topologies but hide the hop cost, so algorithm design is easier
•changing the interconnection network no longer changes the algorithm
°Since some algorithms have "natural topologies", topologies are still worth knowing

Linear and Ring Topologies
°Linear array
•diameter is n-1, average distance ~2n/3
•bisection bandwidth is 1
°Torus or Ring
•diameter is n/2, average distance is n/3
•bisection bandwidth is 2
°Used in algorithms with 1D arrays

Meshes and Tori
°2D mesh and 2D torus
•Diameter: 2 * √n
•Bisection bandwidth: √n
°Often used as the network in machines
°Generalizes to higher dimensions (the Cray T3D used a 3D torus)
°Natural for algorithms with 2D and 3D arrays

Hypercubes
°Number of nodes n = 2^d for dimension d
•Diameter: d
•Bisection bandwidth: n/2
(figure: 0d, 1d, 2d, 3d, and 4d hypercubes with binary node labels)
°Popular in early machines (Intel iPSC, NCUBE)
•Lots of clever algorithms
•See the 1996 notes
°Gray-code addressing
•each node is connected to d others whose addresses differ in 1 bit

Trees
°Diameter: log n
°Bisection bandwidth: 1
°Easy to lay out as a planar graph
°Many tree algorithms (e.g., summation)
°Fat trees avoid the bisection bandwidth problem
•more (or wider) links near the top
•example: Thinking Machines CM-5

Butterflies
(figure: butterfly building block)
°Diameter: log n
°Bisection bandwidth: n
°Cost: lots of wires
°Used in the BBN Butterfly
°Natural for FFT

Evolution of Distributed Memory Multiprocessors
°Direct queue connections replaced by DMA (direct memory access)
•processor packs or copies messages
•initiates the transfer, then goes on computing
°Message passing libraries provide a store-and-forward abstraction
•can send/receive between any pair of nodes, not just along one wire
•time proportional to distance, since each processor along the path must participate
°Wormhole routing in hardware
•special message processors forward messages without interrupting the main processors along the path
•message sends are pipelined
•don't wait for the complete message before forwarding

Performance Models

PRAM
°Parallel Random Access Memory
°All memory accesses are free
•Theoretical - "too good to be true"
°OK for understanding whether an algorithm has enough parallelism at all
°Slightly more realistic:
•Concurrent Read Exclusive Write (CREW) PRAM
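The diameter formulas on the topology slides can be sanity-checked with a small pure-Python sketch. The function names here are illustrative, not from the lecture; note that the exact diameter of a √n x √n mesh is 2(√n - 1), which the slides round to 2√n.

```python
from collections import deque

def bfs_ecc(adj, src):
    """Eccentricity of src: the longest shortest path from src."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def diameter(adj):
    """Maximum eccentricity over all nodes."""
    return max(bfs_ecc(adj, s) for s in adj)

def linear_array(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def ring(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def mesh2d(k):
    """k x k 2D mesh; nodes are (row, col) pairs."""
    adj = {}
    for r in range(k):
        for c in range(k):
            adj[(r, c)] = [(r + dr, c + dc)
                           for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= r + dr < k and 0 <= c + dc < k]
    return adj

print(diameter(linear_array(8)))  # 7 = n - 1
print(diameter(ring(8)))          # 4 = n / 2
print(diameter(mesh2d(4)))        # 6 = 2 * (sqrt(n) - 1) for n = 16
```

The same BFS scaffolding extends to tori, hypercubes, or any other adjacency structure you want to measure.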

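The Gray-code addressing bullet on the hypercube slide can be made concrete: the reflected Gray code visits all 2^d node addresses so that consecutive labels differ in exactly one bit, which is how a ring (and, by extension, meshes) can be embedded in a hypercube. A minimal sketch with illustrative function names:

```python
def gray(d):
    """Reflected Gray code: 2**d labels, consecutive ones differ in one bit."""
    return [i ^ (i >> 1) for i in range(1 << d)]

def hypercube_neighbors(node, d):
    """In a d-cube, each node's d neighbors differ in exactly one address bit."""
    return [node ^ (1 << b) for b in range(d)]

d = 3
seq = gray(d)
# Consecutive Gray codes (with wraparound) differ in one bit,
# so following seq traverses a Hamiltonian ring in the hypercube.
for a, b in zip(seq, seq[1:] + seq[:1]):
    assert bin(a ^ b).count("1") == 1

print(seq)                            # [0, 1, 3, 2, 6, 7, 5, 4]
print(hypercube_neighbors(0b000, d))  # [1, 2, 4]
```

Each hop along the Gray-code sequence is therefore a single physical link, which is why "lots of clever algorithms" map cleanly onto hypercubes.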

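The preview cuts off at the "Latency and Bandwidth" slide, but the cost model typically introduced at this point is the latency/bandwidth (alpha-beta) model, time(n) = alpha + n * beta. The sketch below uses made-up alpha and beta values purely for illustration; it shows why small messages are latency-bound and large ones are bandwidth-bound.

```python
def msg_time(n_bytes, alpha, beta):
    """Alpha-beta model: fixed startup latency alpha (seconds)
    plus per-byte transfer cost beta (seconds/byte)."""
    return alpha + n_bytes * beta

alpha = 10e-6     # assumed 10 microsecond startup latency (illustrative)
beta = 1 / 100e6  # assumed 100 MB/s link, i.e. 10 ns per byte (illustrative)

print(msg_time(8, alpha, beta))      # ~10.08 microseconds: dominated by alpha
print(msg_time(10**6, alpha, beta))  # ~10.01 milliseconds: dominated by n * beta
```

The practical consequence, developed in the rest of the lecture, is that fewer large messages beat many small ones whenever alpha is large relative to beta.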