Chico CSCI 693 - Single-ISA Heterogeneous Multi-Core Architectures

In Proceedings of the 31st International Symposium on Computer Architecture, June, 2004

Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

Rakesh Kumar†, Dean M. Tullsen†, Parthasarathy Ranganathan‡, Norman P. Jouppi‡, Keith I. Farkas‡

†Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0114
‡HP Labs, 1501 Page Mill Road, Palo Alto, CA 94304

Abstract

A single-ISA heterogeneous multi-core architecture is a chip multiprocessor composed of cores of varying size, performance, and complexity. This paper demonstrates that this architecture can provide significantly higher performance in the same area than a conventional chip multiprocessor. It does so by matching the various jobs of a diverse workload to the various cores. This type of architecture covers a spectrum of workloads particularly well, providing high single-thread performance when thread parallelism is low, and high throughput when thread parallelism is high.

This paper examines two such architectures in detail, demonstrating dynamic core assignment policies that provide significant performance gains over naive assignment, and even outperform the best static assignment. It examines policies for heterogeneous architectures both with and without multithreading cores. One heterogeneous architecture we examine outperforms the comparable-area homogeneous architecture by up to 63%, and our best core assignment strategy achieves up to 31% speedup over a naive policy.

1 Introduction

The design of a microprocessor that meets the needs of today's multi-programmed compute environment must balance the competing objectives of high throughput and good single-thread performance. To date, these objectives have been addressed by adding features to monolithic superscalar processors to increase throughput at the cost of increased complexity and design time. One such feature is simultaneous multithreading [24, 23] (SMT).
An alternative approach has been to build chip multiprocessors [8, 11] (CMPs) comprising multiple copies of increasingly complex cores.

In this paper, we explore an alternate design point between these two approaches, namely, CMPs comprising a heterogeneous set of processor cores all of which can execute the same ISA. The heterogeneity of the cores comes from differences in their raw execution bandwidth (superscalar width), cache sizes, and other fundamental characteristics (e.g., in-order vs. out-of-order). This architecture has been proposed and evaluated in earlier work [13, 14] as a means to increasing the energy efficiency of single applications. However, as we demonstrate in this paper, the same architecture may be used to deliver greater throughput and improved area efficiency (throughput per unit area) without significantly impacting single-thread performance.

We evaluate a variety of heterogeneous architectural designs, including processor cores that are themselves multithreaded, an extension to the original architecture proposal [14]. Through this evaluation, we make the following two contributions.

First, we demonstrate that this approach can provide significant performance advantages for a multiprogrammed workload over homogeneous chip-multiprocessors. We show that this advantage is realized for two reasons. First, a heterogeneous multi-core architecture has the ability to match each application to the core best suited to meet its performance demands.
Second, it can provide better area-efficient coverage of the whole spectrum of workload demands that may be seen in a real machine, from low thread-level parallelism (providing low latency for few applications on powerful cores) to high thread-level parallelism (where a large number of applications can be hosted at once on simple cores).

Overall, our representative heterogeneous processor using two core types achieves as much as 63% performance improvement over an equivalent-area homogeneous processor. Over a range of moderate load levels (e.g., 5-8 threads), we see an average gain of 29%. For an open system with random job arrivals, the heterogeneous architecture has much lower average response time over a range of job arrival rates and remains stable for arrival rates 43% higher than that for which a homogeneous architecture breaks down.

Our second contribution is to demonstrate dynamic thread-to-core assignment policies that realize most of the potential performance gain. These policies significantly outperform a random schedule, and even beat the best static assignment (using hindsight) of jobs to cores. These heuristics match the diversity of the workload resource requirements to the cores by changing the workload-to-core mapping either periodically or in response to triggering events. We study the design space of job assignment policies, examining sampling frequency and duration, and how core assignment is made. Our best policy outperforms naive core assignment by 31%.

We also study the application of these mechanisms to the cores in a heterogeneous processor that includes multithreaded cores. Despite the additional scheduling complexity posed by the simultaneous multithreading cores (due to an explosion in the possible assignment permutations), we demonstrate the existence of effective assignment policies. With these policies, this architecture provides even better coverage of a spectrum of load levels.
It provides the low latency of powerful processors at low threading levels, but is also comparable to a large array of small processors at high thread occupancy.

The rest of the paper is organized as follows. Section 2 motivates heterogeneous design for performance. Section 3 describes our measurement methodology. Section 4 discusses the performance benefits from our architecture and our scheduling approaches to solve the new design issues associated with these architectures. Section 5 concludes.

2 Architecture and Background

This section illustrates the potential benefits from architectural heterogeneity, introduces issues in using core heterogeneity with multi-programmed workloads, and discusses prior related work.

2.1 Exploring the potential from heterogeneity

The advantages of heterogeneous architectures stem from two sources. The first advantage results from more efficient adaptation to application diversity. Applications (or different phases of a single application) place different demands on different architectures, stemming from the nature of the computation [14]. While some applications take
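
The idea of matching threads to cores, as described in the introduction, can be illustrated with a small Python sketch. This is not the paper's policy or simulator. The workload names, the per-core IPC values, the core mix (one large core and two small cores), and all function names are hypothetical and chosen only for illustration. The sketch compares a naive arrival-order mapping, a mapping chosen by sampling a few random assignments, and the best mapping found by exhaustive search.

```python
import random
from itertools import permutations

# Hypothetical per-thread IPC on each core type. These numbers are made up
# for illustration and are NOT measurements from the paper.
IPC = {
    "compute_bound": {"big": 2.0, "small": 0.8},
    "memory_bound": {"big": 0.9, "small": 0.7},
    "mixed": {"big": 1.5, "small": 0.9},
}

# Assumed core mix: one large core and two small cores.
CORES = ["big", "small", "small"]


def throughput(assignment, ipc=IPC):
    """Sum the IPC of every (thread, core_type) pair in the assignment."""
    return sum(ipc[thread][core] for thread, core in assignment)


def naive_assignment(threads, cores=CORES):
    """Map threads to cores in arrival order, ignoring thread behavior."""
    return list(zip(threads, cores))


def best_assignment(threads, cores=CORES, ipc=IPC):
    """Exhaustively try every mapping of threads onto distinct cores."""
    best = None
    for perm in permutations(range(len(cores)), len(threads)):
        candidate = [(t, cores[i]) for t, i in zip(threads, perm)]
        if best is None or throughput(candidate, ipc) > throughput(best, ipc):
            best = candidate
    return best


def sampled_assignment(threads, n_samples, rng, cores=CORES, ipc=IPC):
    """Try n_samples random mappings and keep the best one observed."""
    best = None
    for _ in range(n_samples):
        chosen = rng.sample(range(len(cores)), len(threads))
        candidate = [(t, cores[i]) for t, i in zip(threads, chosen)]
        if best is None or throughput(candidate, ipc) > throughput(best, ipc):
            best = candidate
    return best


if __name__ == "__main__":
    threads = ["memory_bound", "compute_bound", "mixed"]
    rng = random.Random(0)
    for name, mapping in [
        ("naive", naive_assignment(threads)),
        ("sampled", sampled_assignment(threads, 2, rng)),
        ("best", best_assignment(threads)),
    ]:
        print(name, mapping, round(throughput(mapping), 2))
```

With these made-up numbers, the naive mapping places the memory-bound thread on the large core and reaches a combined IPC of about 2.6, while the exhaustive search gives the large core to the compute-bound thread and reaches about 3.6. Exhaustive search is only practical for a handful of cores, which is one reason a sampling-style heuristic is a natural simplification here.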

