Berkeley COMPSCI 258 - Code Generation for Process Network Models onto Parallel Architectures

Code Generation for Process Network Models onto Parallel Architectures
Man-kit Leung, Isaac Liu, and Jia Zou
Center for Hybrid and Embedded Software Systems, EECS
University of California, Berkeley
Berkeley, CA 94720, USA
{mankit, liuisaac, jiazou}@eecs.berkeley.edu

Abstract

With multi-core and many-core architectures becoming the current focus of research and development, and with a vast variety of architectures and programming models emerging from research, the design space for applications is becoming enormous. The number of cores, the memory hierarchy, the interconnect, and even the programming model and language used are all design choices that must be optimized for an application to fully benefit from parallel architectures. We propose a code generation framework targeting rapid design space exploration and prototyping. From the high-level design, code for specific architectures and mappings can be generated and used for comparison. We choose Kahn Process Networks [11] as our current specification language because of its inherent parallelism and expressiveness. Our code generator takes advantage of the Message Passing Interface (MPI) [6] as the API for implementing message passing across platforms. We show the scalability of the generated MPI code and the ability to extend our framework to allow for tuning and optimization.

1 Introduction

The shift from single-core sequential code to multi/many-core parallel code has not been as intuitive as one would have hoped. Simply running a program on a parallel architecture will not necessarily yield a performance increase; in some cases, it might even degrade performance due to the overhead created by the parallel architecture. Thus, efforts such as the Berkeley Parallel Computing Laboratory (ParLab) [1] have gathered scholars with different areas of expertise to help with the transition to parallel computing. From the underlying architecture to the parallelization of applications, all levels of abstraction are being rethought and redeveloped. While the efforts and results of researchers and academia have been promising, the wide range of methods and solutions delivered creates an enormous design space for end users.

Programming in parallel is itself already a daunting task. Much effort is required to ensure correctness and prove the program to be deadlock-free, not to mention additional tuning and optimization for performance. However, even before any programming can be done, the underlying architecture must be decided. The number of cores, the memory hierarchy, and the interconnection network used are all application-specific parameters that need to be optimized. Choosing the right mix often requires extensive research and time, which leads to slower product development cycles. To allow more rapid prototyping and development, we build upon the design methodology of "Correct-by-Construction" proposed by Dijkstra [4]. Dijkstra states that if a series of mathematically correct transformations is applied to a mathematically correct model, then the resulting model is also mathematically correct. In the same way, designers first construct higher-level models to ensure and prove the correctness of their design, and then transform the higher-level model into an actual implementation. That transformation is code generation.

We propose a code generation framework that generates parallel code targeting different platforms from higher-level specifications, which allows for quick development and prototyping of parallel applications. This framework allows users to parametrize several design choices, such as the number of cores, the target library, and the partitioning of the application, to quickly generate executable parallel code for comparison and tuning. We further extend this framework to insert profiling and feedback code into the generated program. This allows users to obtain execution traces and statistics, which can be fed back to the code generator for further tuning and optimization, producing better code. We implemented this code generation framework on top of the Ptolemy II project, a heterogeneous modeling and simulation environment designed to allow users to explore high-level models of computation [3]. Currently, an MPI code generation engine has been implemented and is able to generate MPI code from Process Network models. Our results show low overhead compared to the current pthreads implementation used for Process Network models.
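To make the target of this translation concrete, the sketch below shows how a single Kahn process network channel between two processes can be realized with blocking MPI point-to-point calls. This is a minimal illustration in plain C/MPI, not the output of our generator; the source/sink roles, the channel tag, and the token count are hypothetical.

    /* Minimal sketch: one KPN channel mapped onto MPI point-to-point calls.
     * Rank 0 acts as a hypothetical source process, rank 1 as a sink.
     * A Kahn process blocks on reads, which MPI_Recv models directly. */
    #include <mpi.h>
    #include <stdio.h>

    #define CHANNEL_TAG 0     /* hypothetical tag identifying the channel */
    #define N_TOKENS    16

    int main(int argc, char *argv[]) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                    /* source process */
            for (int i = 0; i < N_TOKENS; i++) {
                int token = i;
                MPI_Send(&token, 1, MPI_INT, 1, CHANNEL_TAG, MPI_COMM_WORLD);
            }
        } else if (rank == 1) {             /* sink process */
            for (int i = 0; i < N_TOKENS; i++) {
                int token;
                MPI_Recv(&token, 1, MPI_INT, 0, CHANNEL_TAG, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("received %d\n", token);
            }
        }

        MPI_Finalize();
        return 0;
    }

Roughly speaking, each partition of the model plays the role of one of these ranks, and each cross-partition connection that of a tagged message stream; the partitioning itself is one of the parameters the framework exposes.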
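The profiling and feedback code mentioned above is not shown in this paper preview. As an indication of the kind of instrumentation that could be inserted around generated communication calls, the following hypothetical helper accumulates per-rank send time and token counts using MPI_Wtime (the names profiled_send, comm_seconds, and tokens_sent are ours, for illustration only):

    /* Hypothetical instrumentation sketch: accumulate communication time
     * and token counts per rank so they can be reported back to the code
     * generator for repartitioning and tuning. */
    #include <mpi.h>

    static double comm_seconds = 0.0;  /* accumulated time spent in sends */
    static long   tokens_sent  = 0;    /* number of tokens sent by this rank */

    static void profiled_send(int *token, int dest, int tag)
    {
        double t0 = MPI_Wtime();
        MPI_Send(token, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
        comm_seconds += MPI_Wtime() - t0;
        tokens_sent  += 1;
    }
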
The following sections describe our work. First, we place our work in context relative to other research in the same area. Then we give some background on the languages and frameworks we used. Next, we explain our code generation framework, including the work flow of our code generator. We then describe the implementation details of the generator, and finally conclude with test results and conclusions.

2 Related work

Prior work in [19] generates code for Active Messages (AM), a lower-level mechanism that can be used to implement data parallelism or message passing efficiently. Because AM is a communication primitive, the functionality it supports is very limited compared to that of MPI, which is built on top of AM; one way to look at it is that AM's functionality is a subset of MPI's. Also, in this work by Warner, the generated scheduler follows Synchronous Dataflow (SDF) semantics. SDF is a special case of Process Networks in which the firings of all actors can be scheduled at compile time. Being a special case of Process Networks, SDF is also less expressive: some models that can be expressed as Process Networks cannot be expressed in SDF, but not the other way around.

Another work [15] also performs code generation for multiprocessor platforms. Like the previous work, it focuses on systems modeled with the SDF model of computation. This work also requires a set of explicit send and receive actors, which means that every time a different partition is chosen, these actors need to be inserted. Our work does not have this restriction: changing the partition does not require us to manually change the model itself. Rather, the communication between processors is indicated by port attributes.

[17] is also of particular relevance to us. In this work, the authors again focus on the SDF model of computation. SDF provides edge and node weights in a very delicate way, whereby an acyclic precedence graph can be constructed from the model, and the node and

