UNLV ECG 702 - Microprocessors and Microsystems - D1673243

School name University of Nevada, Las Vegas

Course ECG 702 - Interconnection Networks for Parallel Processing Applications

Pages 11

On an efficient NoC multicasting scheme in support of multiple applications running on irregular sub-networks

Introduction
Related work
Preliminaries
Architecture and power models
Assumptions and definitions
Irregular sub-network oriented multicast routing
Motivation example and irregular sub-network oriented multicasting strategy
Hardware-based multicast routing algorithm for irregular sub-networks
Hardware cost
Performance evaluation
Experiment settings
Real applications
Random benchmarks with uniform traffic
Conclusion
Acknowledgement
References

Xiaohang Wang (a, b), Mei Yang (b), Yingtao Jiang (b), Peng Liu (a, *)

a Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, PR China
b Department of Electrical and Computer Engineering, University of Nevada, Las Vegas 89154, USA

Article history: Available online 17 August 2010

Keywords: Network-on-chips (NoCs); Chip multiprocessor (CMP); Multicast; Routing

Abstract

When a number of applications run simultaneously on a many-core chip multiprocessor (CMP) connected through a network-on-chip (NoC), a significant amount of the on-chip traffic is one-to-many (multicast) in nature. In fact, when multiple applications are mapped onto an NoC architecture with traffic isolation constraints applied, the sub-networks onto which these applications are mapped tend to be irregular. In the literature, multicasting on irregular topologies is supported through either multiple unicasting or broadcasting, which unfortunately results in excessive power consumption and/or long network latency. To address this problem, a simple yet efficient hardware-based multicasting scheme is proposed in this paper. First, an irregular oriented multicast strategy is proposed.
Following this strategy, an irregular oriented multicast routing algorithm can be designed on top of any multicast routing algorithm for regular meshes. One such algorithm, Alternative Recursive Partitioning Multicasting (AL + RPM), is proposed based on RPM, which was originally designed for the regular mesh topology. The basic idea of AL + RPM is to find the output directions with the basic RPM algorithm and then, based on the shape of the sub-network, decide whether to replicate the packets to the original output directions or to the alternative (AL) output directions. The experimental results show that the proposed AL + RPM multicast algorithm consumes, on average, 14% and 20% less power than bLBDR (a broadcasting-based routing algorithm) and the multiple unicast scheme, respectively. In addition, AL + RPM has much lower network latency than these two approaches. The area overhead of incorporating AL + RPM into a baseline router to support multicasting is modest, less than 5.5%.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Advances in technology continue to drive increases in transistor integration capacity. It is estimated that by 2015, 100 billion transistors will be integrated on a 300 mm² die [1]. The design paradigm is migrating to many-core architectures [1,2]. This shift aims to exploit this large number of transistors while also addressing the pressing problem of high power consumption in ever bigger chips. Network-on-chip (NoC) [3] has been proposed as the mainstream on-chip network architecture to efficiently interconnect the large number (16 or more) of processing cores integrated on a many-core system.
Some recent, high-profile examples include Intel's Teraflop [4] and Tilera [5] chips, which feature many-core chip multiprocessor (CMP) architectures with 2D mesh topologies [13] for the on-chip interconnect.

With the development of diverse applications and programming models on CMPs, one-to-many and one-to-all communication are becoming more common. For example, in CMPs with cache-coherent shared memory systems, the cache coherence protocols exhibit one-to-many communication. They use it to maintain the ordering of different requests or to invalidate shared data on different cache nodes [6]. In [7], it was observed that 5–10% of the network traffic is one-to-many in nature. This held in communication traces of different cache coherence protocols and operand networks, across workloads ranging from scientific to commercial. Therefore, efficient support of one-to-many communication in CMPs, particularly hardware multicast support, will benefit a wide range of applications by boosting network performance while reducing power consumption. Unfortunately, to date only a very limited number of on-chip router designs actually support multicasting [6–8].

* Corresponding author at: Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, PR China.

Microprocessors and Microsystems 35 (2011) 119–129
Journal homepage: www.elsevier.com/locate/micpro
0141-9331/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.micpro.2010.08.003

In addition, the following issues make multicast support even more complicated. The first issue is topology irregularity. The large number of cores on a CMP unquestionably offers high computational parallelism. To better utilize these abundant computation resources, virtualization of the chip becomes a necessity [9]. Under virtualization, resources can be distributed among different virtual machines [3]. Applying virtualization [8] at the NoC level allows a single NoC-based CMP to be shared by multiple applications. Each application is mapped to a different sub-network of the chip [10], either statically [11] or dynamically [12]. Fig. 1 shows an example with three applications arriving at 1 ms, 2 ms, and 3 ms. The three applications are allocated to three sub-networks whose shapes may not be regular (e.g., 2D mesh or torus). Furthermore, virtualization requires traffic isolation [8]; that is, communication between nodes in a virtualized region is limited to that sub-network only. Together, the irregular sub-networks and the traffic isolation requirement rule out routing algorithms designed for regular 2D meshes, such as XY routing and odd–even routing [13]. The second issue is unpredictability of the
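The traffic-isolation argument can be made concrete with a short Python sketch. It assumes a small, hypothetical L-shaped sub-network carved out of a 4×4 mesh, with nodes given as (x, y) router coordinates. The shape, the source, and the destinations are illustrative assumptions, not taken from the paper.

The sketch does two things:

- It shows that plain XY routing leaves the sub-network for some source/destination pairs.
- It counts link traversals for three ways of delivering one packet to several destinations inside the sub-network: multiple unicasts, a broadcast to every node, and a tree-based multicast that replicates the packet at branch points.

Routes are shortest paths found by breadth-first search restricted to the sub-network. This is a generic stand-in, not RPM or AL + RPM. Link traversals are also only a rough proxy for the power figures reported in the abstract.

```python
from collections import deque

# Hypothetical irregular sub-network (an assumption for illustration):
# an L-shape carved out of a 4x4 mesh. The bottom two rows span x = 0..3,
# while the top two rows only span x = 0..1.
SUBNET = ({(x, y) for x in range(4) for y in range(2)}
          | {(x, y) for x in range(2) for y in range(2, 4)})


def xy_route(src, dst):
    """Dimension-ordered XY route on a full mesh: correct X first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def stays_inside(path, subnet):
    """Traffic isolation: every router on the path must belong to the sub-network."""
    return all(node in subnet for node in path)


def neighbors(node, subnet):
    """Mesh neighbors of node that lie inside the sub-network."""
    x, y = node
    for cand in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if cand in subnet:
            yield cand


def bfs_parents(src, subnet):
    """Shortest-path tree rooted at src, using only links inside the sub-network."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node, subnet):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


def path_links(dst, parent):
    """Directed links on the tree path from the root to dst."""
    if dst not in parent:
        raise ValueError(f"{dst} is unreachable inside the sub-network")
    links = set()
    node = dst
    while parent[node] is not None:
        links.add((parent[node], node))
        node = parent[node]
    return links


def link_traversals(src, dests, subnet):
    """Return (multiple unicast, tree multicast, broadcast) link-traversal counts."""
    parent = bfs_parents(src, subnet)
    per_dest = [path_links(d, parent) for d in dests]
    unicast = sum(len(links) for links in per_dest)  # one copy per destination per hop
    multicast = len(set().union(*per_dest))          # a shared link carries one copy
    broadcast = len(parent) - 1                      # each other node receives one copy
    return unicast, multicast, broadcast


if __name__ == "__main__":
    # XY leaves the L-shape via (2, 3) on the way from (0, 3) to (3, 0) ...
    print(stays_inside(xy_route((0, 3), (3, 0)), SUBNET))  # False
    # ... but the reverse direction happens to stay inside.
    print(stays_inside(xy_route((3, 0), (0, 3)), SUBNET))  # True
    print(link_traversals((0, 3), [(3, 1), (3, 0), (2, 0)], SUBNET))  # (16, 7, 11)
```

In this toy case the tree multicast needs 7 link traversals, versus 16 for multiple unicasts and 11 for a broadcast to the whole sub-network. This illustrates, qualitatively, why true multicast support is attractive compared with either fallback. The tree is whatever BFS produces with the chosen neighbor order; a different tie-break could yield a tree of a different size. Whether XY routing stays inside the sub-network also depends on the particular source/destination pair. An isolation-respecting router therefore cannot rely on XY routing alone.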
