A Compiler-in-the-Loop (CIL) Framework to Explore Horizontally Partitioned Cache (HPC) Architectures

Aviral Shrivastava*, Ilya Issenin, Nikil Dutt*
* Compiler and Microarchitecture Lab, Center for Embedded Systems, Arizona State University, Tempe, AZ, USA
ACES Lab, Center for Embedded Computer Systems, University of California, Irvine, CA, USA

Copyright © 2008 ASU. ASP-DAC 2008. CML.


Power in Embedded Systems

Power: most important factor in usability of electronic devices

  Device              Battery life   Charge time   Battery weight / Device weight
  Apple iPOD          2-3 hrs        4 hrs         3.2 / 4.8 oz
  Panasonic DVD-LX9   1.5-2.5 hrs    2 hrs         0.72 / 2.6 pounds
  Nokia N80           20 mins        1-2 hrs       1.6 / 4.73 oz

- Performance requirements of handhelds: increase by 30X in a decade
- Battery capacity: increase by 3X in a decade
  - Considering technological breakthroughs, e.g. fuel cells


Memory Subsystem

- Embedded System Design
  - Minimize power at minimal performance loss
- Memory subsystem design parameters
  - Significant impact on power and performance
    - May be the major consumer of system power
    - Very significant impact on performance
  - Need to be chosen very carefully
- Compiler influences the way the application uses memory
  - Compiler should take part in the design process
- Compiler-in-the-Loop Memory Design


Horizontally Partitioned Cache (HPC)

- Originally proposed by Gonzalez et al. in 1995
- More than one cache at the same level of memory hierarchy
- Caches share the interface to memory and processor
- Each page is mapped to exactly one cache
- Mapping is done at page-level granularity
  - Specified as page attributes in MMU
- Mini Cache is relatively small
- Example: Intel StrongARM and XScale

[Figure: Processor Pipeline, Main Cache, Mini Cache, Memory]


Performance Advantage of HPC

- Observation: often arrays have low temporal locality
  - Image copying: each value is used only once or a few times
  - But the stream evicts all other data from the cache
- Separate low temporal locality data from high temporal locality data
  - Array a: low temporal locality -> small (mini) cache
  - Array b: high temporal locality -> regular (main) cache
- Performance Improvement
  - Reduced miss rate of Array b
  - Two separate caches may be better than a unified cache of the total size

[Figure: Processor Pipeline, a[1000], b[5], Memory]

    char a[1024];
    char b[1024];
    int c = 0;                       /* accumulator; not declared on the slide */
    for (int i = 0; i < 1024; i++)
        c += a[i] + b[i % 5];        /* a is streamed once; only b[0..4] is reused */


Power Advantage of HPCs

- Power savings due to two effects
  - Reduction in miss rate
  - AccessEnergy(mini cache) < AccessEnergy(main cache)
- Reduction in miss rate
  - Aligned with performance
  - Exploited by performance improvement techniques
- Less energy per access to mini cache
  - Inverse to performance
  - Energy can decrease even if there are more misses
  - Opposite to performance optimization techniques
- Compiler (data partitioning) techniques for performance improvement and power reduction are different


HPC Design Complexity

- Power reduction is very sensitive to the data partition
  - Up to 2x difference in power consumption
- Power reduction achieved is also very sensitive to the HPC design parameters, e.g., size, associativity
  - Up to 4x difference in power consumption

[Figure: HPC Design. HPC Parameters -> Choose Data Partition; Application Data Partition -> Choose HPC Parameters]


HPC Design Space Exploration

[Slide footer: Jan 16, 2019, Aviral Shrivastava, Final Defense]

[Figure: Traditional Exploration: Application, HPC Parameters, Compiler, Executable, Cycle Accurate Simulator. Compiler-in-the-Loop Exploration: Application, HPC Parameters, Sensitive Compiler, Executable, Cycle Accurate Simulator. Synthesize: Best processor Configuration]

- Compiler-in-the-Loop (CIL) Design Space Exploration (DSE)


Related Work

- Horizontally Partitioned Caches
  - Intel StrongARM SA 1100, Intel XScale
- Performance-oriented data partitioning techniques for HPC
  - No Analysis (Region-based Partitioning)
    - Separate array and stack variables
    - Gonzalez et al. [ICS'95], Lee et al. [CASES'00], Unsal et al. [HPCA'02]
  - Dynamic Analysis (in hardware)
    - Memory address; PC based
    - Johnson et al. [ISCA'97], Rivers et al. [