On an efficient NoC multicasting scheme in support of multiple applications running on irregular sub-networks

Sections: Introduction; Related work; Preliminaries; Architecture and power models; Assumptions and definitions; Irregular sub-network oriented multicast routing; Motivation example and irregular sub-network oriented multicasting strategy; Hardware-based multicast routing algorithm for irregular sub-networks; Hardware cost; Performance evaluation; Experiment settings; Real applications; Random benchmarks with uniform traffic; Conclusion; Acknowledgement; References

Xiaohang Wang a,b, Mei Yang b, Yingtao Jiang b, Peng Liu a,⇑

a Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, PR China
b Department of Electrical and Computer Engineering, University of Nevada, Las Vegas 89154, USA

Article info

Article history: Available online 17 August 2010

Keywords: Network-on-chips (NoCs); Chip multiprocessor (CMP); Multicast; Routing

Abstract

When a number of applications run simultaneously on a many-core chip multiprocessor (CMP) connected through a network-on-chip (NoC), a significant amount of on-chip traffic is one-to-many (multicast) in nature. Moreover, when multiple applications are mapped onto an NoC architecture with applicable traffic isolation constraints, the sub-networks that these applications are mapped onto tend to be irregular. In the literature, multicasting for irregular topologies is supported through either multiple unicasting or broadcasting, which unfortunately results in overly high power consumption and/or long network latency. To address this problem, a simple yet efficient hardware-based multicasting scheme is proposed in this paper. First, an irregular sub-network oriented multicast strategy is proposed. Following this strategy, an irregular sub-network oriented multicast routing algorithm can be designed based on any multicast routing algorithm for regular meshes. One such algorithm, namely Alternative Recursive Partitioning Multicasting (AL + RPM), is proposed based on RPM, which was originally designed for regular mesh topologies. The basic idea of AL + RPM is to find the output directions following the basic RPM algorithm and then decide, based on the shape of the sub-network, whether to replicate the packets to the original output directions or to the alternative (AL) output directions. The experimental results show that the proposed AL + RPM multicast algorithm consumes, on average, 14% and 20% less power than bLBDR (a broadcasting-based routing algorithm) and the multiple unicast scheme, respectively. In addition, AL + RPM has much lower network latency than these two approaches. Incorporating AL + RPM into a baseline router to support multicasting incurs a fairly modest area overhead of less than 5.5%.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Advances in technology continue to drive the increase of transistor integration capacity. It is estimated that by 2015, there will be 100 billion transistors integrated on a 300 mm² die [1]. To exploit this large number of transistors while accounting for the pressing power consumption of ever bigger chips, the design paradigm is migrating to many-core architectures [1,2]. Network-on-chip (NoC) [3] has been proposed as the mainstream on-chip network architecture to efficiently interconnect the large number of (16 or more) processing cores integrated on a many-core system.
Some recent, high-profile examples include Intel's Teraflop [4] and Tilera [5] chips, which feature many-core chip multiprocessor (CMP) architectures with 2D mesh topologies [13] for on-chip interconnect.

With the development of diverse applications and programming models on CMPs, one-to-many and one-to-all communication are becoming more common. For example, in CMPs with cache coherent shared memory systems, the cache coherence protocols exhibit one-to-many communication characteristics to keep the ordering of different requests or to invalidate shared data on different cache nodes [6]. In [7], it has been observed that 5–10% of the network traffic is one-to-many in nature across workloads ranging from scientific to commercial, in communication traces of different cache coherence protocols and operand networks. Therefore, efficient support of one-to-many communication in CMPs, particularly hardware multicast support, will benefit a wide range of applications by boosting network performance while reducing power consumption. Unfortunately, to date, only a very limited number of on-chip router designs actually support multicasting [6–8].

In addition, the following issues make multicast support even more complicated. The first issue is topology irregularity. The large number of cores on a CMP offers high parallelism in computation. To better utilize these computation resources, virtualization of the chip becomes a necessity [9], where resources can be distributed among different virtual machines [3]. Applying virtualization [8] at the NoC level allows a single NoC-based CMP to be shared by multiple applications, with each application mapped to a different sub-network of the chip [10], either statically [11] or dynamically [12]. Fig. 1 shows an example with three applications arriving at 1 ms, 2 ms, and 3 ms. The three applications are allocated to three sub-networks, which may not have regular shapes (e.g., 2D mesh, torus). On the other hand, virtualization requires traffic isolation [8]; that is, communication between nodes in a virtualized region is limited to that sub-network only. Together, the irregular sub-network shapes and the traffic isolation requirement rule out routing algorithms designed for regular 2D meshes, such as XY routing and odd–even routing [13].

The second issue is unpredictability of the

0141-9331/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.micpro.2010.08.003
⇑ Corresponding author at: Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, PR China.
E-mail addresses: [email protected] (X. Wang), [email protected] (M. Yang), [email protected] (Y. Jiang), [email protected] (P. Liu).
Microprocessors and Microsystems 35 (2011) 119–129
journal homepage: www.elsevier.com/locate/micpro
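
The introduction's point that irregular sub-networks with traffic isolation rule out XY routing can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the L-shaped region, the coordinate convention, and the helper names `xy_route`, `stays_inside`, and `isolated_route` are hypothetical, and the breadth-first search is only a stand-in for "some route that respects isolation". It is not the AL + RPM algorithm proposed in the paper.

```python
from collections import deque

def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a 2D mesh: move along X first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def stays_inside(path, region):
    """Traffic isolation check: every node on the path must belong to the region."""
    return all(node in region for node in path)

def isolated_route(src, dst, region):
    """Shortest path that never leaves the region (plain BFS, illustrative only)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in region and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # destination not reachable inside the region

# Hypothetical L-shaped sub-network carved out of a 3x3 mesh.
region = {(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)}

xy_path = xy_route((0, 0), (2, 2))
safe_path = isolated_route((0, 0), (2, 2), region)

print(xy_path, stays_inside(xy_path, region))
print(safe_path, stays_inside(safe_path, region))
```

In this example the XY path passes through (1, 0), (2, 0) and (2, 1), which belong to other sub-networks, so it violates isolation even though the source and destination are both inside the region. The region-restricted path has the same hop count here but must take the Y dimension first, which a fixed dimension order cannot express.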

