Gregory hl. Papadopoulos Laboratory for Computer Science Masachusetts Institute of Technology Abstract Dataflow architectures tolerate long unpredictable com- munication delays and support generation and coordi- nation of parallel activities directly in hardware, rather than assumiug that program mapping will cause these issues to disappear. However, the proposed mecha- nisms are complex and introduce new mapping com- plications. This paper presents a greatly simplified ag preach to dataflow execution, called the ezplicil ioken store (ETS) architecture, and its current realization in Monsoon. The essence of dynamic dataflow execution is captured by a simple transition on state bits associ- ated with storage local to a processor. Low-level storage management is performed by the compiler in assigning nodes to slots in an acliunfion frame, rather than dy- namically in hardware. The processor is simple, highly pipelined, and quite general. It may be viewed as a generalization of a fairly primitive von Neumann archi- tecture. Although the addressing capability is restric- tive, there is exactly one instruction executed for each action on the dataflow graph. Thus, the machine ori- ented ETS model provides new understanding of the merits and the real cost of direct execution of dataflow graphs. 1 Introduction The Explicit Token Store (ETS) architecture is an un- usually simple model of dynamic dataflow esecution that is realized in Monsoon, a large-scale dataflow multiprocessor[27]. A Monsoon processor prototype is operational at the MIT Laboratory for Computer Science, running large programs compiled from the dataflow language Id. A full-scale multiprocessor sys- tem is under development[6] in conjunction with Mo- torola Inc. Formulation of the ETS architecture be- gan in 1966 as an outgrowth of the MIT Tagged-Token Dataflow Architecture (TTDA) and was based on a family of design goals - some evolutionary, some rev- olutionary, and some reactionary. The fundamental properties of the TTDA that we wanted to retain included a large synchronization namespace, inexpensive synchronization, and tolerance to memory and commuuication latency(91. These prop erties do not improve peak performance, but they dic- tate how much of it is actually delivered on com- plex applications[5]. To obtain high performance on CH2887-8/90/0000/0082$01 .OO Q 1990 IEEE Monsoon: an Explicit Token-Store Architecture David E. Culler Computer Science Division University of California, Berkeley a parallel machine lacking these features, a program must be partitioned into a small number of processes that operate almost entirely on local data and rarely interact[10,21]. H owever, if highly parallel machines are to be used in solving problems more complex than what is addressed on current supercomputers, it is likely they will have to be programmed in a very high level language with little explicit programmer management of parallelism[7,8], and the behavior of these complex applications may be very dynamic. Together, these observations suggest that we cannot expect to achieve a perfect mapping for many applications that we will want to execute on a highly parallel machine, so we are studying the design of processors that tolerate an imperfect mapping and still obtain high performance. Tagged-token dataflow machines achieve this tol- erance through sophisticated matching hardware, which dynamically schedules operations with available operands[2,29,29]. When a token arrives at a processor, the tag it carries is checked against the tags present in the token-store. If a matching token is found, it is estracted and the corresponding instruction is enabled for execution; otherwise, the incoming token is added to the store. This allows for a simple non-blocking procea- sor pipeline that can overlap instructions from closely related or completely unrelated computations. It also provides a graceful means of integrating asynchronous, perhaps misordered, memory responses into the nor- mal flow of execution. However, the matching oper- ation involves considerable complexity on the critical path of instruction scheduling[lS]. Although progress has been made in matching hardware[20,29], our goal was to achieve the benefits of matching with a funda- mentally simpler mechanism. The more subtle problem with the matching paradigm is that failure to find a match implicifly allo- cates resources within the token store. Thus, in map ping a portion of the computation to a processor, an unspecified commitment is placed on the token store of that processor and, if this resource becomes overcom- mitted, the program may deadlock[l]. If the match is to be performed rapidly, WC cannot assume this resource is so plentiful that it can be wasted. The worst+ase token storage requirement can be determined on a per- code-block basis with sophisticated compiler analysis, but the “bottom line” is that this complex mechanism fails to simplify resource management. Thus, engineer- ing and management concerns led us to consider how to make the token-store ezplicil in the dataflow model. 398The result is a simple architecture that directly exe- cutes dataflow graphs, yet can be understood as a vari- ation on a (fairly primitive) von Neumann machine. It is simple enough to build and serves to clarify many aspects of dataflow execution. The sequel describes the ETS architecture and its re- alization in Monsoon. Section 2 outlines the basic exe- cution model. Section 3 describes the hlonsoon imple- mentation. Sections 4 and 5 provide preliminary mea- surements of programs on this machine and reflections on the evolution of dataAow architectures. 2 ETS Architecture The central idea in the ETS model is that storage for tokens is dynamically allocated in sizable blocks, with detailed
View Full Document