SMU CSE 8383 - Single-ISA Heterogeneous Multi-Core Architectures

Slide titles:
- Slide 1
- What Are Multi-Core Architectures?
- General Idea
- Architecture
- Cores / Processors targeted
- Close look
- Motivation
- Cores
- Cores, cont.
- Simulation
- Core Switching
- Switching Algorithms: Oracle based dynamic switching using energy heuristic
- Switching Algorithms: Oracle based dynamic switching using energy-delay heuristic
- Switching Algorithms: Static Core Selection
- Oracle based dynamic switching using energy heuristic
- Oracle based dynamic switching using energy-delay heuristic
- Oracle static selection based on energy heuristic
- Switching Algorithms: Realistic Dynamic Switching
- Realistic Dynamic Switching Results
- Related work: power-related optimizations for processor design can be classified into two categories
- Final words
- Conclusion
- Concerns

Slide 1
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, Dean M. Tullsen
Proceedings of the 36th International Symposium on Microarchitecture (MICRO-36'03)
Advanced Computer Architecture, CSE 8383

Slide 2: What Are Multi-Core Architectures?
A multi-core processor delivers two or more complete execution units, or cores, in a single physical processor. All cores run at the same frequency and are plugged into a single processor socket. They also share the same platform interface, which connects them to memory, I/O, and storage resources.

Slide 3: General Idea
This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation.
Main point: gather heterogeneous cores on one die. For each application, choose the most power-efficient core that meets the given performance constraints, and thereby save power.

Slide 4: Architecture
The architecture consists of a chip-level multiprocessor with multiple, diverse processor cores.
These cores all execute the same instruction set, but include significantly different resources and achieve different performance and energy efficiency on the same application.
The operating system software tries to match the application to the different cores, attempting to meet a defined objective function. For example, it may be trying to meet a particular performance requirement or goal, but doing so with maximum energy efficiency.

Slide 5: Cores / Processors targeted
This work examines a diverse set of execution cores. In a processor where the objective function is static (and perhaps the workload is well known), some of the results indicate that a smaller set of cores (often two) will be sufficient to achieve very significant gains. However, if the objective function varies over time or workload, a larger set of cores has even greater benefit.

Slide 6: Close look
- Four cores: Alpha EV4, EV5, EV6, and EV8
- Each core has different power/performance characteristics
- During execution, software dynamically chooses the core that best meets the power and performance needs
- Only one core, running one thread, is active at any given time
- The goal is not to increase performance but to decrease power usage

Slide 7: Motivation
- By 2015, processors will consume 300 W
- Existing CMP designs use only homogeneous cores
- High ILP in an application can be exploited on wider cores (e.g., EV8), but applications with low ILP use less power on narrower cores (e.g., EV4) with little loss in performance
- No need to design cores from scratch, because existing Alpha cores run practically the same ISA

Slide 8: Cores
- EV4: Alpha 21064
- EV5: Alpha 21164
- EV6: Alpha 21264
- EV8-: single-threaded version of Alpha 21464 (based on "projected numbers")

Slide 9: Cores, cont.
- All cores share an on-chip 3.5 MB, 7-way set-associative L2 cache (latencies were calculated using CACTI)
- ISA differences are resolved as follows:
  Either programs are compiled to the least common denominator (the EV4), or software traps are used for the older cores.
- All cores are assumed to be implemented in 0.10 micron technology
- The four cores are assumed to have private L1 data and instruction caches and to share a common L2 cache, phase-locked loop circuitry, and pins
- All cores run at 2.1 GHz (the frequency at which an EV6 core would run if its 600 MHz, 0.35 micron implementation were scaled to 0.10 micron)

Slide 10: Simulation
- Wattch was used to simulate power usage, but had to be calibrated with scaling and offset factors to compare older technologies alongside newer technologies
- CACTI was used to simulate L2 power consumption
- 14 SPEC2000 benchmarks were run: 7 integer and 7 floating point
- Benchmarks are simulated using SMTSIM in non-multithreading mode
- Since several assumptions were made based on common rules of thumb used in typical processor design, several sensitivity experiments with widely different assumptions about the range of power dissipation in the core were performed.
  From these experiments, it was clear that power differences between cores dominate any power differences between applications on the same core

Slide 11: Core Switching
- Switching is done at the operating system level
- Two options for switching granularity:
  - Granularity of application
  - Granularity of operating system timeslice intervals
- An OS switch involves a cache flush and saving and loading user state for the cores
- Unused cores are completely powered down (therefore no leakage)
- A core is estimated to power up in ~1000 cycles at 2.1 GHz
- Switching overhead turns out to be negligible (~1%)

Slide 12: Switching Algorithms: Oracle based dynamic switching using energy heuristic
- With oracle knowledge of power requirements and performance potential, choose the core that would have the lowest energy consumption, as long as it performs within 10% of EV8-
- Average energy reduction = 38%
- Average performance degradation = 4%
[Figure: applu] [Results table]

Slide 13: Switching Algorithms: Oracle based dynamic switching using energy-delay heuristic
- With oracle knowledge of power and performance, choose the core that would maximize IPS^2/Watt, as long as it performs within 50% of EV8-
- Average energy reduction = 73%
- Average energy-delay reduction = 63%
- Average performance degradation = 22%
[Figure: applu]

Slide 14: Switching Algorithms: Static Core Selection
- Choose a single core to run for the duration of execution, perhaps based on compiler analysis, profiling, past history, or simple sampling
- Based on energy heuristic (performance constraint within 10% of EV8-)
  - Average energy savings = 32%
  - Average performance degradation = 2.6%
- Based on energy-delay heuristic
  - Average energy-delay savings = 31%
  - Average
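
The two oracle heuristics on the switching-algorithm slides can be sketched as a small selection function. This is a minimal illustration, not the authors' simulator: the `CoreSample` type, the `pick_core` function, and every number in `SAMPLES` are hypothetical, and "within 10% of EV8-" is assumed here to mean at least 90% of the EV8- instruction rate. For a fixed amount of work, energy is taken as power divided by instruction rate (joules per instruction).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreSample:
    """Hypothetical measurement of one core over one interval."""
    name: str
    ips: float    # instructions per second
    watts: float  # average power


# Illustrative numbers only; they are NOT results from the paper.
SAMPLES = [
    CoreSample("EV4", ips=1.1e9, watts=5.0),
    CoreSample("EV5", ips=1.4e9, watts=10.0),
    CoreSample("EV6", ips=1.9e9, watts=20.0),
    CoreSample("EV8-", ips=2.0e9, watts=60.0),
]


def pick_core(samples, metric, max_slowdown, baseline="EV8-"):
    """Pick the best core among those fast enough relative to the baseline.

    max_slowdown=0.10 keeps cores with at least 90% of the baseline's
    instruction rate (an assumed reading of "within 10% of EV8-").
    """
    base = next(s for s in samples if s.name == baseline)
    floor = (1.0 - max_slowdown) * base.ips
    eligible = [s for s in samples if s.ips >= floor]
    if metric == "energy":
        # Lowest energy for fixed work: joules per instruction = W / IPS.
        return min(eligible, key=lambda s: s.watts / s.ips)
    if metric == "energy-delay":
        # Maximizing IPS^2 / Watt minimizes the energy-delay product.
        return max(eligible, key=lambda s: s.ips ** 2 / s.watts)
    raise ValueError(f"unknown metric: {metric}")


if __name__ == "__main__":
    print(pick_core(SAMPLES, "energy", 0.10).name)
    print(pick_core(SAMPLES, "energy-delay", 0.50).name)
```

With these made-up samples, the tighter 10% bound leaves only the two widest cores eligible, while the looser 50% bound admits all four, which is why the energy-delay heuristic can reach a much narrower core.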

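The 2.1 GHz clock and the ~1000-cycle power-up estimate quoted on the Cores and Core Switching slides can be checked with a little arithmetic. The sketch below assumes that frequency scales inversely with feature size; the slides do not state the scaling rule, but the quoted numbers are consistent with it. The function names are hypothetical.

```python
def scaled_frequency_mhz(freq_mhz, old_feature_um, new_feature_um):
    """Scale a clock frequency assuming it is inversely proportional
    to feature size (an assumption, not a rule stated in the slides)."""
    return freq_mhz * old_feature_um / new_feature_um


def cycles_to_ns(cycles, freq_ghz):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles / freq_ghz


if __name__ == "__main__":
    # EV6: 600 MHz at 0.35 micron, scaled to 0.10 micron.
    print(f"{scaled_frequency_mhz(600.0, 0.35, 0.10):.0f} MHz")
    # Power-up estimate: ~1000 cycles at 2.1 GHz.
    print(f"{cycles_to_ns(1000, 2.1):.0f} ns")
```

Under that assumption the EV6 scales to about 2100 MHz, matching the 2.1 GHz figure, and a 1000-cycle power-up takes a little under half a microsecond.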
