Thread Motion: Fine-Grained Power Management for Multi-Core Systems

Krishna K. Rangan†‡   Gu-Yeon Wei†   David Brooks†

†Harvard University
33 Oxford St., Cambridge, MA 02138
{kkrangan, guyeon, dbrooks}@eecs.harvard.edu

‡Intel Massachusetts
77 Reed Road, Hudson, MA 01749
{krishna.rangan}@intel.com

ABSTRACT

Dynamic voltage and frequency scaling (DVFS) is a commonly-used power-management scheme that dynamically adjusts power and performance to the time-varying needs of running programs. Unfortunately, conventional DVFS, relying on off-chip regulators, faces limitations in terms of temporal granularity and high costs when considered for future multi-core systems. To overcome these challenges, this paper presents thread motion (TM), a fine-grained power-management scheme for chip multiprocessors (CMPs). Instead of incurring the high cost of changing the voltage and frequency of different cores, TM enables rapid movement of threads to adapt the time-varying computing needs of running applications to a mixture of cores with fixed but different power/performance levels. Results show that for the same power budget, two voltage/frequency levels are sufficient to provide performance gains commensurate to idealized scenarios using per-core voltage control. Thread motion extends workload-based power management into the nanosecond realm and, for a given power budget, provides up to 20% better performance than coarse-grained DVFS.

Categories and Subject Descriptors

C.1.4 [Processor Architectures]: Parallel Architectures—Distributed architectures

General Terms

Performance, Design

1. INTRODUCTION

Power dissipation continues to be a primary design constraint in the multi-core chip era.
Increasing power consumption not only results in increasing energy costs, but also results in high die temperatures that affect chip reliability, performance, and packaging cost. From the performance standpoint, current and future multi-core systems will have to carefully constrain application performance to stay within power envelopes. For example, power constraints result in reduced per-core throughput when multiple cores are active in current Intel processors [2]. Fortunately, multi-core systems host applications that exhibit runtime variability in their performance requirements, which can be exploited to optimize throughput while staying within the system-power envelope.

Dynamic voltage and frequency scaling (DVFS) schemes seek to exploit runtime variability in application behavior to achieve maximum energy savings with minimal performance degradation. However, traditional DVFS scaling, which is initiated by the operating system (OS), has two primary drawbacks. First, OS scheduler sampling intervals are on the millisecond time scale, while computational requirements can vary on the nanosecond time scale due to events such as cache misses. Hence, OS-driven DVFS is too slow to respond to such fine variations in program behavior. Second, multi-core systems execute multiple applications with potentially very different computational needs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISCA'09, June 20–24, 2009, Austin, Texas, USA
Copyright 2009 ACM 978-1-60558-526-0/09/06 ...$5.00.
Even though the performance advantages of per-core DVFS in multi-core systems have been suggested [11, 15], providing per-core, independent voltage control in chips with more than two cores can be expensive [15]. Moreover, when DVFS is applied across multiple cores, determining a single optimal DVFS setting that simultaneously satisfies all cores will be extremely difficult; some applications will suffer performance loss or power overheads. This problem worsens as the number of cores and running applications increase in future systems.

Clearly, a fast-acting, yet cost-effective mechanism to obtain the benefits of per-core DVFS on systems with a large number of cores is desirable. Trends in current multi-core systems suggest: (1) Even though per-core, independent voltage control is currently impractical, future systems with a multitude of cores can be expected to have a small number of independent voltage and frequency domains [1, 3]. As such, cores that differ in power-performance capabilities will exist. (2) Future high-throughput systems are likely to pack together a large number of simple cores [23, 25, 27] hosting many more applications. Unfortunately, these trends further exacerbate the problems of using conventional DVFS. To address these limitations, we propose a fast, fine-grained power-management approach that we call thread motion (TM).

Thread motion is a power-management technique that enables applications to migrate between cores in a multi-core system with simple, homogeneous cores but heterogeneous power-performance capabilities. For example, envision a homogeneous multi-core system where cores differ only in terms of their operating frequency and voltage. Such power-performance heterogeneity offers a way to accommodate a wide range of power envelope levels without limiting the performance of all of the cores together. Instead, it offers a mixture of performance capabilities with a small number of static voltage/frequency (VF) domains.
As applications run on these cores, TM enables applications to migrate to cores with higher or lower VF settings depending on a program's time-varying compute intensity. If one application could benefit from higher VF while another is stalled on a cache miss, a swap of these two applications between cores of different power capabilities may provide overall improvements in power-performance efficiency. Compared to slow transition times of conventional regulator-based DVFS schemes, thread motion can be applied at much finer time intervals and applied more often. Another potential benefit of rapidly moving applications between cores is

[Figure 1: (a) Illustration of thread motion in a multi-core system. (b) Exploiting fine-grained application variability in two running threads. (c) Duty cycling between 2 VF levels to match application IPC.]
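The swap described above can be illustrated with a small simulation sketch. This is a hypothetical model, not the paper's hardware mechanism: the class names, fields, and the IPC-based trigger condition are all assumptions chosen to mirror the Figure 1(b) scenario, where an application stalled on a cache miss trades places with a compute-bound application on the slower core.

```python
# Hypothetical sketch of a thread-motion swap decision. All names and the
# trigger condition are illustrative assumptions, not the paper's design.

from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    stalled: bool  # e.g., currently waiting on a long-latency cache miss
    ipc: float     # recent instructions-per-cycle estimate

def thread_motion_step(assignment):
    """assignment maps a fixed-VF core ('high'/'low') to the thread on it.
    Swap the two threads when the high-VF core's thread is stalled while
    the low-VF core's thread could better exploit the faster core."""
    hi, lo = assignment["high"], assignment["low"]
    if hi.stalled and not lo.stalled and lo.ipc > hi.ipc:
        assignment["high"], assignment["low"] = lo, hi
    return assignment

# App A stalls on a cache miss on the high-VF core while App B runs at
# high IPC on the low-VF core: the two threads trade cores.
cores = {"high": Thread("A", stalled=True, ipc=0.2),
         "low":  Thread("B", stalled=False, ipc=1.5)}
thread_motion_step(cores)
print(cores["high"].name, cores["low"].name)  # -> B A
```

In the actual system the decision would be made in hardware at nanosecond granularity, far below what an OS scheduler (or this sketch) could achieve; the sketch only captures the swap condition, not the migration cost or timing.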

