SMU CSE 8383 - Power-Performance Considerations of Parallel Computing on Chip Multiprocessors


Power-Performance Considerations of Parallel Computing on Chip Multiprocessors

JIAN LI and JOSÉ F. MARTÍNEZ
Cornell University

This paper looks at the power-performance implications of running parallel applications on chip multiprocessors (CMPs). First, we develop an analytical model that, for the first time, puts together parallel efficiency, granularity of parallelism, and voltage/frequency scaling, to establish a formal connection with the power consumption and performance of a parallel code running on a CMP. We then conduct detailed simulations of parallel applications running on a detailed power-performance CMP model to confirm the analytical results and provide further insights. Both analytical and experimental models show that parallel computing can bring significant power savings and still meet a given performance target by choosing granularity and voltage/frequency levels judiciously. The particular choice, however, is dependent on the application's parallel efficiency curve and the process technology utilized, which our model captures. Likewise, analytical model and experiments show the effect of a limited power budget on the application's scalability curve. In particular, we show that a limited power budget can cause a rapid performance degradation beyond a number of cores, even in the case of applications with excellent scalability properties. On the other hand, our experiments show that, when a limited power budget is in place, power-thrifty memory-bound applications may actually enjoy better scalability than more compute-intensive codes, even if the latter would exhibit higher scalability in a power-unconstrained scenario.

Categories and Subject Descriptors: C.1.4 [Processor Architectures]: Parallel Architectures
General Terms: Power, Performance, Parallel Computation, Chip Multiprocessors, Theory
Additional Key Words and Phrases: Voltage/frequency scaling, granularity, parallel efficiency

Authors' address: Jian Li and José F. Martínez, Computer Systems Laboratory, Cornell University, Ithaca, NY 14853; email: {li,martinez}csl.cornell.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected]
© 2005 ACM 1544-3566/05/1200-0397 $5.00
ACM Transactions on Architecture and Code Optimization, Vol. 2, No. 4, December 2005, Pages 397–422.

1. INTRODUCTION

Low-power computing has long been an important design objective for mobile, battery-operated devices. More recently, however, power consumption in high-performance microprocessors has drawn considerable attention from industry and researchers as well. Traditionally, power dissipation in CMOS technology has been significantly lower than other technologies, such as TTL or ECL. However, at current speeds and feature sizes, CMOS power consumption has increased dramatically. This makes microprocessor cooling increasingly difficult and expensive [Borkar 1999; Gunther et al. 2001].
As a result, over the last few years, power has become a first-priority concern to microprocessor designers/manufacturers [Agerwala and Chatterjee 2005; Weiser 2004].

In light of this mounting problem, industry and researchers are eyeing chip multiprocessor architectures (CMPs), which can attain higher performance by running multiple threads in parallel. By integrating multiple cores on a chip, designers hope to deliver performance growth while depending less on raw circuit speed and, thus, power [Agerwala and Chatterjee 2005].

Earlier VLSI works have discussed the trade-offs that sequential vs. parallel circuits present in silicon area and power consumption [Chandrakasan et al. 1992; Parhi 1999]. There is also rich literature on power/thermal-aware simultaneous multithreading (SMT) and CMP designs (or similar architecture configurations), most of which focuses on multiprogrammed workloads [Donald and Martonosi 2004; Ghiasi and Grunwald 2004; Kumar et al. 2003; Li et al. 2004; Sasanka et al. 2004; Seng et al. 2000]. So far, however, very little work has been done on the power-performance issues involving parallel applications executing on multiprocessors in general, and on multicore chips in particular.

In a parallel run, processors synchronize and exchange data as they cooperate toward a common goal. Synchronization and communication constitute overheads that typically grow in importance as we increase the number of processors. This generally results in decreased parallel efficiency (speedup over number of processors used).
(On the other hand, at low processor counts, the beneficial effect of the increased aggregate caching capacity over a single processor may yield a net increase in parallel efficiency, i.e., superlinear speedup.) As a result of the application's changing parallel efficiency, it is not obvious which voltage and frequency levels should be applied, in combination with the appropriate number of processors, to optimize a certain power-performance trade-off and/or meet a particular constraint. Overall, the general connective role that the application's parallel efficiency plays across processors in a parallel execution is not present in a multiprogrammed context, where processors operate largely independently of each other.

In this paper, we investigate the power-performance issues of running parallel applications on a CMP. First, we develop an analytical model to study the effect of the number of processors used, the parallel efficiency, and the voltage/frequency scaling applied, on the performance and power consumption delivered by a CMP. Specifically, we look at (1) optimizing power consumption given a performance target, (2) optimizing performance given a certain power budget, and (3) optimizing energy-delay product. Then, to confirm the insights developed from the analytical model and provide further insights, we conduct detailed
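
To make the interplay between parallel efficiency and voltage/frequency scaling described in the introduction concrete, the following Python sketch may help. It is not the paper's analytical model. It assumes a simplified textbook view in which per-core dynamic power scales as V^2 * f, supply voltage scales linearly with frequency, and static power and any minimum-voltage limit are ignored. The function names are hypothetical and chosen only for illustration.

```python
def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Parallel efficiency: speedup divided by the number of cores used."""
    speedup = t_serial / t_parallel
    return speedup / n_cores


def relative_dynamic_power(n_cores, efficiency):
    """Total dynamic power of n_cores that together match the performance of
    one core at nominal voltage/frequency, relative to that single core.

    Illustrative assumptions (not taken from the paper):
      - per-core dynamic power is proportional to V^2 * f
      - V scales linearly with f, so per-core power scales as f^3
      - static (leakage) power and voltage floors are ignored
    """
    if n_cores * efficiency < 1.0:
        # Matching the target would require running above nominal frequency.
        raise ValueError("n_cores * efficiency must be at least 1")
    # Equal performance requires n_cores * efficiency * f_scaled = f_nominal.
    freq_ratio = 1.0 / (n_cores * efficiency)
    return n_cores * freq_ratio ** 3


if __name__ == "__main__":
    eff = parallel_efficiency(t_serial=100.0, t_parallel=25.0, n_cores=8)
    print(f"efficiency on 8 cores: {eff}")
    for n, e in [(1, 1.0), (4, 1.0), (4, 0.5)]:
        print(f"{n} cores, efficiency {e}: "
              f"relative power {relative_dynamic_power(n, e)}")
```

Under these assumptions, four cores with perfect efficiency meet the single-core performance target at a small fraction of the single-core dynamic power, while halving the efficiency erodes most of that saving. This mirrors, in a very rough way, why the choice of core count and voltage/frequency level depends on the application's parallel efficiency curve.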

