UNO CSCI 8150 - Multiprocessors and Multicomputers


CSCI 8150 Advanced Computer Architecture
Hwang, Chapter 7: Multiprocessors and Multicomputers
7.4 Message Passing Mechanisms

Contents:
- Message Passing in Multicomputers
- Message Formats
- Store and Forward Routing
- Flits and Wormhole Routing
- Store and Forward vs. Wormhole
- Asynchronous Pipelining
- Wormhole Node Handshaking
- Asynchronous Pipeline Speeds
- Latency
- Virtual Channels
- Virtual Channel Example
- Deadlock
- Buffer Deadlock in a Store and Forward Network
- Channel Deadlock with Wormhole Routing
- Flow Control
- Packet Collision Resolution
- Virtual Cut-Through Routing
- Blocking
- Discarding
- Detour
- Collision Resolution Techniques
- Routing
- Deterministic Routing Using Dimension Ordering
- E-cube Routing on a Hypercube
- E-cube Routing Example
- E-Cube Routing Example (Detail)
- X-Y Routing on a 2-D Mesh
- X-Y Routing Example
- Dimension Ordering Characteristics
- Adaptive Routing
- Adaptive Use of Virtual Channels to Avoid Deadlock
- Communication Patterns
- Efficiency Parameters
- Example 5-Destination Multicast
- Multicast & Broadcast Patterns
- Hypercube Multicast/Broadcast
- Broadcast & Multicast on Hypercube
- Virtual Networks
- Network Partitioning

Message Passing in Multicomputers
- Multicomputers have no shared memory; each “computer” consists of a single processor, cache, private memory, and I/O devices.
- Some “network” must be provided to allow the multiple computers to communicate.
- The communication between computers in a multicomputer is called “message passing.”

Message Formats
- Messages may be fixed or variable length.
- Messages are comprised of one or more packets.
- Packets are the basic units containing a destination address (e.g., a processor number) for routing purposes.
- Different packets may arrive at the destination asynchronously, so they are sequence numbered to allow reassembly.
- Flits (flow control digits) are used in wormhole routing; they are discussed below.

Store and Forward Routing
- Packets are the basic unit in the store and forward scheme.
- An intermediate node must receive a complete packet before it can be forwarded to the next node or the final destination, and only then if the output channel is free and the next node has buffer space available for the packet.
- The latency in store and forward networks is directly related to the number of intermediate nodes through which the packet must pass.

Flits and Wormhole Routing
- Wormhole routing divides a packet into smaller fixed-size pieces called flits (flow control digits).
- The first flit in the packet must contain (at least) the destination address, so a flit must be at least log2 N bits in an N-processor multicomputer.
- Each flit is transmitted as a separate entity, but all flits belonging to a single packet must be transmitted in sequence, one immediately after the other, in a pipeline through intermediate routers.
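As a concrete illustration of the packet and flit formats just described, here is a minimal C sketch of one way a packet might be carved into flits. The 16-bit flit width, the 4-bit/12-bit header split, and the names make_header and packetize are assumptions made for this example, not taken from the slides or from any particular machine; the one property carried over is that the first (header) flit holds the destination address, which needs at least log2 N bits for N nodes.

    #include <stdint.h>
    #include <stdio.h>

    typedef uint16_t flit_t;   /* one 16-bit flow control digit (assumed size) */

    /* Header flit layout (hypothetical): 4-bit destination (enough for N = 16
     * nodes, since log2(16) = 4) and a 12-bit packet sequence number. */
    static flit_t make_header(unsigned dest, unsigned seq)
    {
        return (flit_t)(((dest & 0xFu) << 12) | (seq & 0xFFFu));
    }

    /* Split a payload into a header flit followed by 16-bit data flits.
     * Returns the number of flits produced. */
    static size_t packetize(const uint8_t *data, size_t len,
                            unsigned dest, unsigned seq,
                            flit_t *out, size_t max_flits)
    {
        size_t n = 0;
        out[n++] = make_header(dest, seq);
        for (size_t i = 0; i < len && n < max_flits; i += 2) {
            uint16_t hi = data[i];
            uint16_t lo = (i + 1 < len) ? data[i + 1] : 0;   /* pad last flit */
            out[n++] = (flit_t)((hi << 8) | lo);
        }
        return n;
    }

    int main(void)
    {
        const uint8_t payload[] = "hello";                 /* 5-byte payload   */
        flit_t flits[8];
        size_t n = packetize(payload, 5, 9, 1, flits, 8);  /* to node 9, seq 1 */

        for (size_t i = 0; i < n; i++)
            printf("flit %zu: 0x%04x\n", i, flits[i]);
        return 0;
    }

In wormhole routing only the header flit carries routing information; the data flits simply follow it through whatever path the header establishes.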
Store and Forward vs. Wormhole (figure)

Asynchronous Pipelining
- Each intermediate node in a wormhole network, as well as the source and the destination, has a buffer capable of storing a flit.
- Adjacent nodes communicate requests and acknowledgements using a one-bit ready/request (R/A) line.
- When a receiver is ready, it pulls the R/A line low.
- When the sender is ready, it raises the R/A line high and transmits the next flit; the line is left high.
- After the receiver deals with the flit (perhaps sending it on to another node), it lowers the R/A line to indicate it is ready to accept another flit.
- The cycle repeats for the transmission of the remaining flits (a code sketch of this handshake appears below, following the Latency slide).

Wormhole Node Handshaking (figure)

Asynchronous Pipeline Speeds
- An asynchronous pipeline can be very efficient and can use a clock speed higher than that of a synchronous pipeline.
- The pipeline can stall if buffers or successive channels in the path are not available during certain cycles.
- A packet could be “buffered, blocked, dragged, detoured” (knocked around, in general) if the pipeline stalls.

Latency
- Assume:
  - D = number of intermediate nodes (routers) between the source and the destination
  - L = packet length (in bits)
  - F = flit length (in bits)
  - W = channel bandwidth (in bits/sec)
- Ignoring network startup time, propagation delays, and resource delays:
  - store and forward latency = (L/W) × (D + 1)
  - wormhole latency = L/W + (F/W) × D
- F is usually much smaller than L, so D has no significant effect on latency in wormhole systems.
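To make the two latency formulas concrete, here is a small worked example in C. The sample values (a 2 Kbit packet, 32-bit flits, a 20 Mbit/s channel, D = 7 intermediate routers) are assumptions chosen for illustration, not numbers from the slides; the point is that D multiplies the full packet time L/W in the store-and-forward case but only the much smaller flit time F/W in the wormhole case.

    #include <stdio.h>

    int main(void)
    {
        /* Assumed sample values (not from the slides). */
        double L = 2048.0;   /* packet length, bits              */
        double F = 32.0;     /* flit length, bits                */
        double W = 20e6;     /* channel bandwidth, bits/second   */
        double D = 7.0;      /* intermediate routers on the path */

        /* Latency formulas from the slide above. */
        double t_sf = (L / W) * (D + 1.0);    /* store-and-forward */
        double t_wh = (L / W) + (F / W) * D;  /* wormhole          */

        printf("store-and-forward latency: %.1f us\n", t_sf * 1e6);
        printf("wormhole latency:          %.1f us\n", t_wh * 1e6);
        return 0;
    }

With these values the store-and-forward latency works out to 819.2 microseconds, versus 113.6 microseconds for wormhole routing.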
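The one-bit R/A handshake described under Asynchronous Pipelining above can also be sketched in code. The following single-threaded C simulation is a hypothetical software model (the names ra_line, send_flit, and receive_flit are invented for the sketch); a real router implements this handshake in hardware, with one R/A line per channel or virtual channel.

    #include <stdio.h>

    /* Line LOW  = receiver is ready to accept a flit.
     * Line HIGH = a flit is on the channel; the receiver is still handling it. */
    enum { LOW = 0, HIGH = 1 };

    static int ra_line = LOW;       /* receiver starts out ready               */
    static int channel_flit = -1;   /* value currently on the physical channel */

    /* Sender side: may transmit only when the receiver has pulled the line LOW. */
    static void send_flit(int flit)
    {
        while (ra_line != LOW)
            ;                       /* pipeline stalls here until receiver ready */
        channel_flit = flit;        /* drive the flit onto the channel           */
        ra_line = HIGH;             /* raise R/A: "flit is valid, take it"       */
        printf("sender:   flit %d sent, R/A raised HIGH\n", flit);
    }

    /* Receiver side: consume the flit, then lower R/A to ask for the next one. */
    static void receive_flit(void)
    {
        if (ra_line == HIGH) {
            printf("receiver: flit %d accepted (forwarded downstream)\n",
                   channel_flit);
            ra_line = LOW;          /* lower R/A: "buffer free, send the next"   */
        }
    }

    int main(void)
    {
        /* Pipeline a 4-flit packet across one channel, one flit per handshake. */
        for (int f = 0; f < 4; f++) {
            send_flit(f);
            receive_flit();
        }
        return 0;
    }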
Virtual Channels
- The channels between nodes in a wormhole-routed multicomputer are shared by many possible source and destination pairs.
- A “virtual channel” is a pair of flit buffers (in nodes) connected by a shared physical channel.
- The physical channel is “time shared” by all of the virtual channels.
- Other resources (including the R/A line) must be replicated for each of the virtual channels.

Virtual Channel Example (figure)

Deadlock
- Deadlock can occur if it is impossible for any message to move (without discarding one).
- Buffer deadlock occurs when all buffers are full in a store and forward network. This leads to a circular wait condition, with each node waiting for space to receive the next message.
- Channel deadlock is similar, but results when all channels around a circular path in a wormhole-routed network are busy (recall that each “node” has a single buffer used for both input and output).

Buffer Deadlock in a Store and Forward Network (figure)

Channel Deadlock with Wormhole Routing (figure)

Flow Control
- If multiple packets or flits demand the same resources at a given node, there must be some policy indicating how the conflict is to be resolved.
- These policies then determine what mechanisms can be used to deal with congestion and deadlock.

Packet Collision Resolution
- Consider the case of two flits both wanting to use the same channel or the same receive buffer at the same time.
- How is the “collision” resolved? Who gets the resource? What happens to the other flit?

Virtual Cut-Through Routing
- Solution: temporarily store one of the packets in a different buffer.
- Positive:
  - No messages are lost.
  - Should perform as well as wormhole routing when there are no conflicts.
- Negative:
  - A potentially large buffer is required (with potentially large delays).
  - Not suitable for routers.
  - Cycles must be avoided.

Blocking
- Solution: prevent one of the messages from advancing while the other uses the buffer/channel.
- Positive:
  - Messages are not lost.
- Negative:
  - The node sending the blocked packet is idled.

Discarding
- Solution: drop one of the messages in contention for the buffer/channel.
- Positive:
  - Simple to implement.
- Negative:
  - Loses messages, resulting in a severe waste of resources.

Detour
- Solution: send the conflicting message somewhere (anywhere) else.
- Positive:
  - Simple to implement.
- Negative:
  - May waste more channel resources than necessary.
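The four policies above differ mainly in what happens to the packet or flit that loses the arbitration. The sketch below contrasts them in C; the enum names, the single spare buffer, the single alternate route, and the fall-back-to-blocking behaviour are simplifying assumptions made for illustration rather than definitions from the slides.

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { VIRTUAL_CUT_THROUGH, BLOCKING, DISCARDING, DETOUR } policy_t;

    /* Decide what happens to the losing flit when two flits want the same
     * output channel.  Returns a short description of the action taken. */
    static const char *resolve(policy_t p, bool spare_buffer_free,
                               bool alternate_route_free)
    {
        switch (p) {
        case VIRTUAL_CUT_THROUGH:
            return spare_buffer_free ? "losing packet parked in a local buffer"
                                     : "no buffer space: fall back to blocking";
        case BLOCKING:
            return "losing flit held in place; upstream pipeline stalls";
        case DISCARDING:
            return "losing packet dropped; source must retransmit";
        case DETOUR:
            return alternate_route_free ? "losing packet misrouted on another channel"
                                        : "no free detour: fall back to blocking";
        }
        return "unknown policy";
    }

    int main(void)
    {
        policy_t all[] = { VIRTUAL_CUT_THROUGH, BLOCKING, DISCARDING, DETOUR };
        for (int i = 0; i < 4; i++)
            printf("policy %d: %s\n", i, resolve(all[i], true, true));
        return 0;
    }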

