Thread Motion: Fine-Grained Power Management for Multi-Core Systems

Krishna K. Rangan†‡   Gu-Yeon Wei†   David Brooks†

†Harvard University
33 Oxford St., Cambridge, MA 02138
{kkrangan, guyeon, dbrooks}@eecs.harvard.edu

‡Intel Massachusetts
77 Reed Road, Hudson, MA 01749
{krishna.rangan}@intel.com

ABSTRACT

Dynamic voltage and frequency scaling (DVFS) is a commonly-used power-management scheme that dynamically adjusts power and performance to the time-varying needs of running programs. Unfortunately, conventional DVFS, relying on off-chip regulators, faces limitations in terms of temporal granularity and high costs when considered for future multi-core systems. To overcome these challenges, this paper presents thread motion (TM), a fine-grained power-management scheme for chip multiprocessors (CMPs). Instead of incurring the high cost of changing the voltage and frequency of different cores, TM enables rapid movement of threads to adapt the time-varying computing needs of running applications to a mixture of cores with fixed but different power/performance levels. Results show that for the same power budget, two voltage/frequency levels are sufficient to provide performance gains commensurate to idealized scenarios using per-core voltage control. Thread motion extends workload-based power management into the nanosecond realm and, for a given power budget, provides up to 20% better performance than coarse-grained DVFS.

Categories and Subject Descriptors

C.1.4 [Processor Architectures]: Parallel Architectures—Distributed architectures

General Terms

Performance, Design

1. INTRODUCTION

Power dissipation continues to be a primary design constraint in the multi-core chip era.
Increasing power consumption not only results in increasing energy costs, but also results in high die temperatures that affect chip reliability, performance, and packaging cost. From the performance standpoint, current and future multi-core systems will have to carefully constrain application performance to stay within power envelopes. For example, power constraints result in reduced per-core throughput when multiple cores are active in current Intel processors [2]. Fortunately, multi-core systems host applications that exhibit runtime variability in their performance requirements, which can be exploited to optimize throughput while staying within the system-power envelope.

Dynamic voltage and frequency scaling (DVFS) schemes seek to exploit runtime variability in application behavior to achieve maximum energy savings with minimal performance degradation. However, traditional DVFS scaling, which is initiated by the operating system (OS), has two primary drawbacks. First, OS scheduler sampling intervals are on the millisecond time scale, while computational requirements can vary on the nanosecond time scale due to events such as cache misses. Hence, OS-driven DVFS is too slow to respond to such fine variations in program behavior. Second, multi-core systems execute multiple applications with potentially very different computational needs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISCA'09, June 20–24, 2009, Austin, Texas, USA
Copyright 2009 ACM 978-1-60558-526-0/09/06 ...$5.00.
Even though the performance advantages of per-core DVFS in multi-core systems have been suggested [11, 15], providing per-core, independent voltage control in chips with more than two cores can be expensive [15]. Moreover, when DVFS is applied across multiple cores, determining a single optimal DVFS setting that simultaneously satisfies all cores will be extremely difficult; some applications will suffer performance loss or power overheads. This problem worsens as the number of cores and running applications increase in future systems.

Clearly, a fast-acting, yet cost-effective mechanism to obtain the benefits of per-core DVFS on systems with a large number of cores is desirable. Trends in current multi-core systems suggest: (1) Even though per-core, independent voltage control is currently impractical, future systems with a multitude of cores can be expected to have a small number of independent voltage and frequency domains [1, 3]. As such, cores that differ in power-performance capabilities will exist. (2) Future high-throughput systems are likely to pack together a large number of simple cores [23, 25, 27] hosting many more applications. Unfortunately, these trends further exacerbate the problems of using conventional DVFS. To address these limitations, we propose a fast, fine-grained power-management approach that we call thread motion (TM).

Thread motion is a power-management technique that enables applications to migrate between cores in a multi-core system with simple, homogeneous cores but heterogeneous power-performance capabilities. For example, envision a homogeneous multi-core system where cores differ only in terms of their operating frequency and voltage. Such power-performance heterogeneity offers a way to accommodate a wide range of power envelope levels without limiting the performance of all of the cores together. Instead, it offers a mixture of performance capabilities with a small number of static voltage/frequency (VF) domains.
As applications run on these cores, TM enables applications to migrate to cores with higher or lower VF settings depending on a program's time-varying compute intensity. If one application could benefit from higher VF while another is stalled on a cache miss, a swap of these two applications between cores of different power capabilities may provide overall improvements in power-performance efficiency. Compared to slow transition times of conventional regulator-based DVFS schemes, thread motion can be applied at much finer time intervals and applied more often. Another potential benefit of rapidly moving applications between cores is

[Figure 1: (a) Illustration of thread motion in a multi-core system. (b) Exploiting fine-grained application variability in two running threads. (c) Duty cycling between 2 VF levels to match application IPC.]
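The swap described above can be illustrated with a small simulation sketch. This is a hypothetical model, not the paper's hardware mechanism: the class names, fields, and the IPC-based trigger condition are all assumptions chosen to mirror the Figure 1(b) scenario, where an application stalled on a cache miss trades places with a compute-bound application on the slower core.

```python
# Hypothetical sketch of a thread-motion swap decision. All names and the
# trigger condition are illustrative assumptions, not the paper's design.

from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    stalled: bool  # e.g., currently waiting on a long-latency cache miss
    ipc: float     # recent instructions-per-cycle estimate

def thread_motion_step(assignment):
    """assignment maps a fixed-VF core ('high'/'low') to the thread on it.
    Swap the two threads when the high-VF core's thread is stalled while
    the low-VF core's thread could better exploit the faster core."""
    hi, lo = assignment["high"], assignment["low"]
    if hi.stalled and not lo.stalled and lo.ipc > hi.ipc:
        assignment["high"], assignment["low"] = lo, hi
    return assignment

# App A stalls on a cache miss on the high-VF core while App B runs at
# high IPC on the low-VF core: the two threads trade cores.
cores = {"high": Thread("A", stalled=True, ipc=0.2),
         "low":  Thread("B", stalled=False, ipc=1.5)}
thread_motion_step(cores)
print(cores["high"].name, cores["low"].name)  # -> B A
```

In the actual system the decision would be made in hardware at nanosecond granularity, far below what an OS scheduler (or this sketch) could achieve; the sketch only captures the swap condition, not the migration cost or timing.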

