PARALLEL ALGORITHM DEVELOPMENT FOR A SPECIFIC MACHINE INMOS T800 TRANSPUTER BY WILLIAM STAUB 1 This paper will desribe from start to finish the parallel algorithm program development for a specific transputer Inmos T800 network It is essential to understand how the hardware manages parallel tasks and how information is exchanged between the transputers before writing a parallel program A transputer figure1 page3 is a circuit containing a processor some memory to store programs and data and several ports for exchanging or transferring information with other transputers or with the outside world By designing these circuits so that they could be connected together with the same simplicity with which transistors can be in a computer the transputer was born One of the most important factor was the introduction of a high level language occam MAY83 whose features were directly supported by this transputer s hardware and that made the transputer a building block for parallel computers However a Locical System C compiler was developed for wider know usage A prominent factor to utilizing this circuitry was the ease with which transputers could be connected to each other with as little as a few electrical wires The four bidirectional input output I O ports of the transputer are designed to interface directly with the ports of other transputers his feature allows for several transputers to fit on a small footprint with very little extra logic circuits making it possible to easily fit four transputers with some memory on a PC daughter board ISA bus 2 Figure 1 Transputer block diagram and examples of interconnection networks Figure 1 shows the basic block diagram of a transputer along with some interconnection schemes a linear chain a offering the simplest of connections a ring b and a mesh c The processor does not have dedicated registers but a stack of registers which allows for an implicit selection of the registers The net result is a smaller instruction format which leads to a denser program code The transputer adopts the RISC philosophy and supports a small set of instructions executed in a few cycles each Multitasking is supported in microcode within the processor The actions necessary for the transputer to swap from one task to another are executed at the hardware level freeing the system programmer of this task and resulting in fast swap operations This is an important way to improve the performance of a processor by allowing it to start running another program if the one that it is currently executing cannot continue for awhile Multitasking is the first form of parallelism supported by the 3 transputer and is accomplished by having the processor maintain a list of tasks that must be executed Each task executes for a small amount of time called a quantum before it is stopped and swapped for another task if one awaits For the transputer a quantum of time is typically on the order of two milliseconds 2 10 3 seconds 1 Tasks are executed in a round robin fashion When a task eventually terminates it is removed from the list The transputer maintains the active tasks chained in a linked list and two of its internal registers are used to point to the front and rear of the list The actual list is stored in memory and the registers contain the memory address of the cells defining the tasks At any time new tasks may be created and added to this list A transputer task may be in either one of two states Active This state refers to a task that is being executed or in the list of tasks waiting to be executed Inactive This state refers to a task that is not in the list of active tasks because either one of three conditions is preventing it from continuing execution The task is waiting for an input from one of the I O ports the task is waiting to output to one of the I O ports or the task has been asked to stay idle until a specified time in the future 4 As far as RAM the transputer can access a linear address space of 4 Gbytes Of these 4 Gbytes 4 Kbytes are built in the T805 transputer circuit and correspond to the lower part of the memory address space Since the registers are 32 bits in length the transputer accesses 4 bytes at a time when accessing memory The combination of the architecture of the serial I O ports and the way the transputer manages them contribute to making the transputer a unique circuit especially well tailored for multiprocessing I O ports are memory mapped This means that programming a port and passing it the address and length of a message is done by writing these numbers to memory locations that are mapped to the registers of the port The four bi directional serial comprise two wires When two transputers are connected together they exchange information through one of their links The processes or tasks running on each of the transputers are then free to communicate by exchanging data or messages over the link The term link is used to refer to the physical connection between two transputers and the term channel to is used to describe the software connection between the two processes The transfer of data over a serial link is synchronized and unbuffered Synchronization refers to the fact that if Process P1 running on Transputer T1 needs to exchange send or receive information with Process P2 on Transputer T2 then it must wait until P2 is ready to participate in the exchange The processes are said to be synchronized since the sender cannot get rid of its data until the receiver is ready to accept them and conversely Unbuffered communication means that no temporary storage is needed to store incoming or outgoing messages The transfer takes place directly between 5 the memory of the sender and the memory of the receiver The I O port is a Direct Memory Access Controller DMAC which is given the memory address of the first byte of the incoming or outgoing message and the number of bytes contained in that message Once initialized the link is autonomous and will wait for its partner to signal its readiness to participate in the communication During the communication the processes that initiated the transfer are blocked Each process is placed at the rear of the list of inactive tasks Because the processor and the links operate independently the processor is free to run another process when one is blocked by communication As a result the channels require no message queues or message buffers Data are transferred one byte at a time each byte generating an acknowledge from the receiving
View Full Document
Unlocking...