Chico CSCI 693 - Experimental Evaluation of Emerging Multi-core Architectures

Experimental Evaluation of Emerging Multi-core Architectures

Abdullah Kayi 1, Yiyi Yao 1, Tarek El-Ghazawi 1, Greg Newby 2

1 The George Washington University, Dept. of Electrical and Computer Engineering, Washington, DC 20052 USA
{apokayi, yyy, tarek}@gwu.edu
2 Arctic Region Supercomputing Center, Fairbanks, AK 99775 USA

1-4244-0910-1/07/$20.00 ©2007 IEEE.

Abstract

The trend of increasing speed and complexity in single-core processors, as described by Moore's law, is facing practical challenges. As a result, the multi-core processor architecture has emerged as the dominant architecture for both desktop and high-performance systems. Multi-core systems introduce many challenges that need to be addressed to achieve the best performance. Therefore, a new set of benchmarking techniques is necessary to study the impact of multi-core technologies. In this paper, multi-core specific performance metrics for cache coherency and memory bandwidth, latency, and contention are investigated. This study also proposes a new benchmarking suite which includes cases extended from the High Performance Computing Challenge (HPCC) benchmark suite. Performance results are measured on a Sun Fire T1000 server with six cores and an AMD Opteron dual-core system. The experimental analysis and observations in this paper provide a better understanding of the emerging multi-core architectures.

1. Introduction

The emerging multi-core architectures offer a way to increase the performance capability of a single chip without requiring a more complex system or increasing the power requirements [1, 2, 3, 4]. However, these architectures have introduced many challenges in maximizing application performance. Thus, benchmarking of multi-core architectures becomes very important to unveil the potential of these systems.
Most existing benchmarks do not target these multi-core architectures and thus cannot exercise multi-core specific low-level features such as shared cache coherence overhead and memory resource contention. To address this problem, this paper first proposes a set of multi-core specific performance metrics to be investigated in this research study. In addition, all the experiments focus on the sources of possible bottlenecks in these multi-core architectures in order to evaluate the potential of these systems. We provide several synthetic benchmarking cases, an extension of the HPCC benchmarking suite [5] (STREAM [6] and RandomAccess [7]), and an FFTW [8] multi-threaded application, all featuring the proposed performance metrics. These benchmarking cases and the application are applied to the UltraSPARC T1 processor [9] and the AMD Opteron processors [10].

The experimental methodology is explained in Section 2, whereas Section 3 shows the results and observations obtained during this study. Finally, Section 4 presents the conclusions and remarks on future work.

2. Experimental Methodology

At the beginning of this research study, a set of multi-core specific performance metrics was identified to guide the benchmarking of multi-core architectures, as illustrated in Table 1. These metrics focus on the features that differ from previous single-core architectures and expose the potential sources of inefficiency in the multi-core architectures.

Table 1. Multi-core specific benchmarking metrics

As shown in Table 1, the performance metrics are developed in layers targeting the architectural aspects of multi-core systems. The cache and the memory are listed as the sources of performance inefficiency that we expect to affect multi-core system performance. Within each of these sources, some aspects are multi-core specific, and these are listed in the second column of Table 1.
From the cache architecture point of view, cache coherency among the cores is the aspect we identified as the most important for overall system performance, mainly for shared-cache multi-core architectures. From the memory side, three aspects are selected for examination in this study: the bandwidth sustained from the cores to the main memory, the latency observed from the cores to the main memory, and the resource contention among the cores while accessing the main memory. Based on these aspects, performance metrics are determined as the experimental goals of this study. For each of these metrics, we developed a synthetic benchmarking case that measures the potential performance or overhead, in order to better understand the corresponding aspect in the multi-core architectures under test.

To conduct the experiments, UltraSPARC T1 (6 cores) and AMD Opteron single-core and dual-core processors are used with the Solaris and Linux operating systems, respectively. To fully exploit the multi-core architectures, the benchmarking cases were implemented using the POSIX [11] thread library where necessary. In addition, we used the gcc compiler on both systems. To help interpret the results, the following sub-sections describe the benchmarking schemes and methodology for each of the aspects stated earlier.

2.1 Cache Aspects

Different multi-core architectures use different ways of caching and cache sharing among the cores [12, 13, 14, 15]. For instance, each UltraSPARC T1 processor has a twelve-way associative (four banks) unified Level 2 (L2) on-chip cache, and all hardware strands share the entire L2 cache [9, 16]. In addition, each UltraSPARC T1 processor core has its own L1 instruction cache, L1 data cache, instruction TLB, and data TLB. Thus, the cache coherency effect is very important for such systems and needs to be examined in detail. On the other hand, the AMD Opteron processor cores share neither the L1 cache nor the L2 cache [17].
However, cache coherency is still an important effect to be considered for these multi-core systems. In our benchmarking suite, we created cache thrashing cases and used them to measure the cache coherency overhead in these scenarios. Further details of this experiment, together with the results, are given in Section 3.

2.2 Memory Aspects

It is both interesting and important to examine how more than one core on a single chip affects the overall system performance while accessing the main memory [18]. It is crucial to discover the challenges and limits of these multi-core systems for the memory aspects stated earlier. In order to examine the memory subsystem of these multi-core systems, we adapted two of the
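
The cache thrashing cases of Section 2.1 are not reproduced in this excerpt. As a rough illustration only, the following C sketch shows one common way to provoke coherence traffic with POSIX threads: each thread increments its own counter, and the spacing between counters decides whether they fall on the same cache line. The names `run_counters`, `worker_arg`, `LINE_BYTES`, and `MAX_THREADS` are hypothetical, the 64-byte line size is an assumption, and this is not the authors' benchmark.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES  64   /* ASSUMED cache-line size; check the target CPU */
#define MAX_THREADS 64   /* arbitrary upper bound chosen for this sketch */

typedef struct {
    volatile uint64_t *counter;  /* this thread's private counter */
    uint64_t iterations;         /* number of increments to perform */
} worker_arg;

/* Each thread increments only its own counter, so there is no data race.
 * Any slowdown at small strides comes from cache lines moving between cores. */
static void *worker(void *p)
{
    worker_arg *a = p;
    for (uint64_t i = 0; i < a->iterations; i++)
        (*a->counter)++;
    return NULL;
}

/* Runs nthreads workers whose counters are stride_bytes apart.
 * Returns the elapsed wall time in seconds (thread creation included),
 * or -1.0 on bad arguments or failure. The sum of all counters is
 * stored in *total. */
double run_counters(int nthreads, size_t stride_bytes,
                    uint64_t iterations, uint64_t *total)
{
    if (nthreads < 1 || nthreads > MAX_THREADS || total == NULL ||
        stride_bytes < sizeof(uint64_t) ||
        stride_bytes % sizeof(uint64_t) != 0)
        return -1.0;

    /* Over-allocate so the first counter can start on a line boundary. */
    unsigned char *raw = calloc(1, (size_t)nthreads * stride_bytes + LINE_BYTES);
    if (raw == NULL)
        return -1.0;
    uintptr_t base = ((uintptr_t)raw + LINE_BYTES - 1) &
                     ~(uintptr_t)(LINE_BYTES - 1);
    assert(base % LINE_BYTES == 0);

    pthread_t tid[MAX_THREADS];
    worker_arg arg[MAX_THREADS];
    struct timespec t0, t1;
    int started = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < nthreads; t++) {
        arg[t].counter =
            (volatile uint64_t *)(base + (uintptr_t)t * stride_bytes);
        arg[t].iterations = iterations;
        if (pthread_create(&tid[t], NULL, worker, &arg[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Sum the counters of the threads that actually ran. */
    *total = 0;
    for (int t = 0; t < started; t++)
        *total += *arg[t].counter;
    free(raw);

    if (started != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling `run_counters(2, 8, n, &total)` places both counters on one assumed cache line, while `run_counters(2, LINE_BYTES, n, &total)` gives each counter its own line. The ratio of the two elapsed times is a crude indicator of coherence overhead. On processors whose cores share a cache level, such as the shared L2 of the UltraSPARC T1, the gap may look quite different from that on processors with private caches. The sketch would typically be built with something like `gcc -O2 -pthread`.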

