Unformatted text preview:

Multiprocessor Interconnection Networks Networks How do we move data between processors Design Options Topology Todd C Mowry CS 740 November 3 2000 Routing Physical implementation Topics Network design issues Network Topology 2 CS 740 F 00 Evaluation Criteria Buses Latency P P P Bisection Bandwidth Contention and hot spot behavior Bus Partitionability Simple and cost effective for small scale multiprocessors Cost and scalability Not scalable limited bandwidth electrical complications Fault tolerance 3 CS 740 F 00 4 Page 1 CS 740 F 00 Crossbars Rings Crossbar P Each port has link to every other port Cheap Cost is O N P P Low latency and high throughput Point to point wires and pipelining can be used to make them very fast P M M M M Cost grows as O N 2 so not very scalable High overall bandwidth Difficult to arbitrate and to get all data lines into and out of a centralized crossbar P Ring Examples KSR machine Hector CS 740 F 00 P P 6 P CS 740 F 00 Trees Cheap Cost is O N P High latency O N Used in small scale MPs e g C mmp and as building block for other networks e g Omega 5 P Hypercubes H Tree Latency is O logN Easy to layout as planar graphs e g H Trees 0 D 1 D 2 D 3 D 4 D For random permutations root can become bottleneck Also called binary n cubes of nodes N 2 n To avoid root being bottleneck notion of Fat Trees used in CM5 Latency is O logN Out degree of PE is O logN Minimizes hops good bisection BW but tough to layout in 3 space Popular in early message passing computers e g intel iPSC NCUBE Used as direct network emphasizes locality Fat Tree 7 CS 740 F 00 8 Page 2 CS 740 F 00 Multistage Logarithmic Networks Omega Network Om ega Net w or k 0 00 Key Idea have multiple layers of switches between destinations Cost is O NlogN latency is O logN throughput is O N 0 01 0 10 0 11 0 10 0 11 1 00 Generally indirect networks Many variations exist Omega Butterfly Benes 1 00 1 01 1 01 1 10 1 11 1 10 1 11 All stages are same so can use recirculating network Single path from source to destination Can add extra stages and pathways to minimize collisions and increase fault tolerance Can support combining Used in IBM RP3 Used in many machines BBN Butterfly IBM RP3 9 0 00 0 01 CS 740 F 00 10 Butterfly Network CS 740 F 00 k ary n cubes But t er f ly Net w or k 0 00 0 00 0 01 0 01 0 10 0 11 0 10 0 11 1 00 1 00 1 01 1 01 4 ary 3 cube 1 10 1 11 1 10 1 11 s plit on MS B s plit on LS B Equivalent to Omega network Easy to see routing of messages Generalization of hypercubes k nodes in a string Total of nodes N k n k 2 reduces of channels at bisection thus allowing for wider channels but more hops Also very similar to hypercubes direct vs indirect though Clearly see that bisection of network is N 2 channels Can use higher degree switches to reduce depth 11 CS 740 F 00 12 Page 3 CS 740 F 00 Real World 2D mesh Advantages of Low Dimensional Nets What can be built in VLSI is often wire limited LDNs are easier to layout more uniform wiring density easier to embed in 2 D or 3 D space mostly local connections e g grids Compared with HDNs e g hypercubes LDNs have shorter wires reduces hop latency fewer wires increases bandwidth given constant bisection width increased channel width is the major reason why LDNs win 1824 node Paragon 16 x 114 array LDNs have better hot spot throughput more pins per node than HDNs 13 CS 740 F 00 14 Embeddings in two dimensions CS 740 F 00 Routing Recall routing algorithm determines which of the possible paths are used as routes how the route is determined R N x N C which at each switch maps the destination node nd to the next channel on the route Issues Routing mechanism arithmetic source based port select table driven general computation Properties of the routes Deadlock free 6x3x2 Embed multiple logical dimension in one physical dimension using long wires 15 CS 740 F 00 16 Page 4 CS 740 F 00 Routing Mechanism Routing Mechanism cont need to select output port for each input packet P3 Reduce relative address of each dimension in order Dimension order routing in k ary d cubes e cube routing in n cube P0 message header carries series of port selects used and stripped en route CRC Packet Format CS 2 Myrinet MIT Artic Table driven message header carried index for next port at next switch o R i table also gives index for following hop o I R i ATM HPPI CS 740 F 00 18 Properties of Routing Algorithms CS 740 F 00 Deadlock Freedom How can it arise Deterministic necessary conditions shared resource incrementally allocated non preemptible think of a channel as a shared resource that is acquired incrementally source buffer then dest buffer channels along a route route determined by source dest not intermediate state i e traffic Adaptive route influenced by traffic along the way Minimal only selects shortest paths How do you avoid it Deadlock free constrain how channel resources are allocated ex dimension order no traffic pattern can lead to a situation where no packets mover forward 19 P1 Source based in a few cycles 17 P2 How do you prove that a routing algorithm is deadlock free CS 740 F 00 20 Page 5 CS 740 F 00 Proof Technique Examples Why is the obvious routing on X deadlock free Resources are logically associated with channels Messages introduce dependences between resources as they move forward Need to articulate possible dependences between channels Show that there are no cycles in Channel Dependence Graph butterfly tree fat tree Any assumptions about routing mechanism amount of buffering What about wormhole routing on a ring find a numbering of channel resources such that every legal route follows a monotonic sequence no traffic pattern can lead to deadlock 2 1 Network need not be acyclic only channel dependence graph 0 3 7 4 5 21 CS 740 F 00 22 Page 6 6 CS 740 F 00


View Full Document

CMU CS 15740 - Multiprocessor Interconnection Networks

Documents in this Course
leecture

leecture

17 pages

Lecture

Lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

9 pages

Lecture

Lecture

13 pages

lecture

lecture

25 pages

lect17

lect17

7 pages

Lecture

Lecture

65 pages

Lecture

Lecture

28 pages

lect07

lect07

24 pages

lect07

lect07

12 pages

lect03

lect03

3 pages

lecture

lecture

11 pages

lecture

lecture

20 pages

lecture

lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

10 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

18 pages

lecture

lecture

63 pages

lecture

lecture

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

18 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

lecture

lecture

34 pages

lecture

lecture

47 pages

lecture

lecture

7 pages

Lecture

Lecture

18 pages

Lecture

Lecture

7 pages

Lecture

Lecture

21 pages

Lecture

Lecture

10 pages

Lecture

Lecture

39 pages

Lecture

Lecture

11 pages

lect04

lect04

40 pages

Load more
Loading Unlocking...
Login

Join to view Multiprocessor Interconnection Networks and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Multiprocessor Interconnection Networks and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?