CMU CS 15745 - Post-Pass Binary Adaptation for Software-Based Speculative Precomputation

Post-Pass Binary Adaptation for Software-Based Speculative Precomputation

Steve S.W. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner†, Daniel Lavery†, John P. Shen

Microprocessor Research, Intel Labs
†Intel Compiler, Software and Solutions Group

{shih-wei.liao, perry.wang, hong.wang, gerolf.f.hoflehner, daniel.m.lavery, john.shen}@intel.com

ABSTRACT

Recently, a number of thread-based prefetching techniques have been proposed. These techniques aim to reduce the latency of single-threaded applications by leveraging multithreading resources to perform memory prefetching via speculative prefetch threads. Software-based speculative precomputation (SSP) is one such technique, proposed for multithreaded Itanium models. SSP does not require expensive hardware support; instead, it relies on the compiler to adapt binaries to perform prefetching on otherwise idle hardware thread contexts at run time. This paper presents a post-pass compilation tool for generating SSP-enhanced binaries. The tool is able to: (1) analyze a single-threaded application to generate prefetch threads; (2) identify and embed trigger points in the original binary; and (3) produce a new binary that has the prefetch threads attached. The execution of the new binary spawns the speculative prefetch threads, which are executed concurrently with the main thread. Our results indicate that for a set of pointer-intensive benchmarks, the prefetching performed by the speculative threads achieves an average of 87% speedup on an in-order processor and 5% speedup on an out-of-order processor.

Categories and Subject Descriptors
D.3.4 [Programming Languages]: Processors – compiler, optimization, code generation, memory management.

General Terms
Measurement, Performance, Design, Experimentation, Algorithms.
Keywords
Long-range thread-based prefetching, pointer, slicing, slack, chaining speculative precomputation, speculation, prediction, scheduling, post-pass, dependence reduction, loop rotation, delay minimization, triggering.

1. INTRODUCTION

Memory latency has become a critical bottleneck in achieving high performance on modern processors. Today, many large applications are memory intensive, as both their data working sets and the difficulty of predicting their memory accesses grow. Despite continued advances in cache design and the development of new prefetching techniques, the memory latency problem persists and escalates, especially with pointer-intensive applications, which tend to defy conventional stride-based prefetching techniques.

One solution is to overlap memory stalls in one program with the execution of useful instructions from another program, as done in emerging simultaneous multithreading (SMT) processor architectures [10][15][22][28]. In addition to improving multitasking throughput, SMT has also been used to improve the performance of single-threaded applications by leveraging speculative threads to perform cache prefetches on behalf of the main (or non-speculative) thread [25]. A speculative thread executes code to precompute memory addresses and issue prefetches. Instead of using a complex address pattern predictor, this pre-execution approach uses the program itself as a predictor, so that it can prefetch for a pointer-intensive program accurately and efficiently.

Various forms of such thread-based prefetching have been proposed recently. Examples include Collins et al.’s speculative precomputation [7], Luk’s software controlled pre-execution [21], Roth and Sohi’s data driven multithreading [25], and Zilles and Sohi’s speculative slices [34]. These studies demonstrated the performance potential of thread-based prefetching by assuming the availability of hardware and/or compiler support.
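The pre-execution idea described above, using the program itself as the address predictor, can be illustrated with a toy model. The sketch below is not the tool presented in this paper and does not model Itanium hardware, threads, or timing: the cache is a plain set of addresses, the slice runs inline rather than concurrently, and every name in it (Node, build_list, p_slice, main_thread, ADDRS) is a hypothetical chosen for illustration. It only shows that a slice containing just the pointer-chasing instructions can touch the nodes of a linked list before the main loop reaches them, even though the addresses follow no stride.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately irregular node addresses: a stride-based
# prefetcher has no pattern to lock onto here.
ADDRS = [0x9F40, 0x1200, 0x77C0, 0x0340, 0xC880, 0x5100]


@dataclass
class Node:
    addr: int                      # simulated memory address of this node
    payload: int                   # data the main thread actually consumes
    next: Optional["Node"] = None


def build_list(addrs):
    """Build a singly linked list whose i-th node lives at addrs[i]."""
    head = None
    for i, addr in reversed(list(enumerate(addrs))):
        head = Node(addr=addr, payload=i, next=head)
    return head


def p_slice(start, distance, cache):
    """Toy precomputation slice: only the address-generating work.

    It follows `next` pointers for at most `distance` nodes and marks each
    visited address as cached. It never touches `payload`.
    """
    node = start
    for _ in range(distance):
        if node is None:
            break
        cache.add(node.addr)       # stands in for issuing a prefetch
        node = node.next


def main_thread(head, use_slice, distance=4):
    """Traverse the list, counting simulated cache misses.

    Returns (sum of payloads, number of misses seen by the main loop).
    """
    cache = set()
    misses = 0
    total = 0
    node = head
    while node is not None:
        if use_slice:
            # Run the slice ahead of the current node (inline in this toy
            # model; a real speculative thread would run concurrently).
            p_slice(node.next, distance, cache)
        if node.addr not in cache:
            misses += 1
            cache.add(node.addr)
        total += node.payload
        node = node.next
    return total, misses


if __name__ == "__main__":
    print("without slice:", main_thread(build_list(ADDRS), use_slice=False))
    print("with slice:   ", main_thread(build_list(ADDRS), use_slice=True))
```

With the six hypothetical addresses above, the model counts 6 misses in the main loop without the slice and 1 with it (only the head node, which nothing prefetches), while the computed sum is unchanged. Because the slice is speculative and only affects the simulated cache, the main thread's result cannot depend on it.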
In this paper, we introduce an automated tool that transforms application code in order to attach prefetch threads to the binary. The aim of this paper is to demonstrate the feasibility of automatically generating binaries for thread-based prefetching and the effectiveness of the resulting binaries. To our knowledge, this work is the first to automate the entire process of extracting dependent instructions leading to target operations, identifying proper spawning points, and managing inter-thread communication to ensure timely pre-execution. Our tool is post-pass because it does not modify the normal compilation steps, but rather is invoked after the compilation process.

The tool is based on the speculative precomputation (SP) paradigm for future Itanium™ processors [16]. SP utilizes hardware thread contexts to execute precomputation slices (p-slices), which consist of instructions that compute the memory addresses for prefetching [7]. Speculative threads can be spawned by one of two events: a basic trigger, which occurs when a designated trigger instruction in the non-speculative thread is retired, or a chaining trigger, by which one speculative thread explicitly spawns another. Collins et al. demonstrated that long-range prefetching using chaining triggers is the key to high performance via speculative precomputation [7]. As a proof of concept, they found the chaining triggers in the binary manually.

Collins et al. later proposed dynamic speculative precomputation, which shows the implementation of an all-hardware approach [6]. In contrast, our work uses the SMT model without expensive hardware support and relies on the post-pass compilation to generate p-slices and to place triggers judiciously. Instead of constructing p-slices dynamically, the post-pass tool examines code regions and extracts p-slices statically with profiling feedback. To maximize the concurrent usage of available memory bandwidth, the chaining triggers inside the p-slices are scheduled early across multiple threads. We also traverse the dependence graph to identify and embed basic triggers in the main thread’s code. We show that the tool is effective for a set of seven pointer-intensive benchmarks with exploitable parallelism among the prefetches in

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PLDI’02, June 17-19, 2002, Berlin, Germany. Copyright 2002 ACM 1-58113-463-0/02/0006…$5.00.

