Understanding the Differences Between Value Prediction and Instruction Reuse

Avinash Sodani and Gurindar S. Sohi
Computer Sciences Department
University of Wisconsin-Madison
1210 West Dayton Street
Madison, WI 53706, USA
{sodani,sohi}@cs.wisc.edu

Abstract

Recently, two hardware techniques, Value Prediction (VP) and Instruction Reuse (IR), have been proposed for exploiting the redundancy in programs to collapse data dependences. In this paper, we attempt to understand the different ways in which VP and IR interact with other microarchitectural features, and the impact of such interactions on net performance. More specifically, we perform the following tasks: (i) we identify the various differences between the two techniques and qualitatively discuss their microarchitectural interactions, (ii) we evaluate the impact of these interactions on performance, and (iii) since IR is the more restrictive of the two techniques, we also estimate how much of the total redundancy present in programs can be captured by IR. Our results show that the performance obtained by VP is sensitive to the way branches with value-speculative operands are handled. We also see that although IR captures less redundancy, it may perform equally well because it validates results early, it is non-speculative, and it reduces the branch misprediction penalty. Finally, we show that 84-97% of the redundancy in programs can be reused. This implies that detecting redundant instructions non-speculatively, based on their operands, does not significantly restrict IR's ability to capture the redundancy present in programs.

1 Introduction

Several recent studies [2, 5, 8, 10] have shown that there is significant result redundancy in programs, i.e., many instructions perform the same computation and hence produce the same result over and over again. These studies have found that, for several benchmarks, more than 75% of the dynamic instructions produce the same result as before. Also, two hardware techniques have recently been proposed to exploit this
redundancy: (i) Value Prediction (VP) [3, 4, 5] and (ii) Instruction Reuse (IR) [9]. Both techniques attempt to reduce the execution time of programs by alleviating the dataflow constraint. They use the redundancy in programs to determine the results of instructions without actually executing them, either speculatively (Value Prediction) or non-speculatively (Instruction Reuse). The advantage of doing so is that instructions do not have to wait for their source instructions to execute first. Instead, they can execute sooner using the results obtained by these two techniques, thus relaxing the dataflow constraint.

Although both VP and IR attempt to shorten the critical path through a computation, they follow very different approaches. VP predicts the results of instructions (or, alternatively, the inputs of other instructions) based on previously seen results. It performs computation using the predicted values and confirms the speculation at a later point. The critical path is shortened because instructions that would normally execute sequentially can be executed speculatively in parallel. IR, on the other hand, recognizes that a certain computation chain has been performed before and therefore need not be performed again, i.e., it splices a chain of computation out of the critical path.

The effectiveness of any microarchitectural technique in improving the net performance of a processor depends on two things. One is how well it performs by itself. The other is how it interacts with other microarchitectural features (e.g., branch prediction, availability of resources) when it is integrated in a pipeline. VP and IR are different techniques, so they perform differently by themselves (i.e., they capture different amounts of the redundancy present in programs). They also interact with other microarchitectural features in different ways, and so affect net performance differently. The purpose of this work is to identify and evaluate the different microarchitectural interactions of these techniques. The intent is not to argue
which technique is better, but to gain a better understanding of how each technique works. We feel that this will help in designing other techniques, possibly hybrids of VP and IR, that exploit the redundancy in programs more profitably. More specifically, in this paper we accomplish the following three tasks:

(i) We identify the various differences between the two techniques and qualitatively discuss their microarchitectural interactions.
(ii) We evaluate the impact of these interactions on performance.
(iii) Since IR is the more restrictive of the two techniques (we discuss this later), we also estimate how much of the total redundancy present in programs can be captured by IR.

The rest of the paper is organized as follows. In Section 2, we describe VP and IR in more detail. In Section 3, we identify the various differences between them and qualitatively discuss their interactions and the impact of those interactions on performance. In Section 4, we evaluate these interactions quantitatively. Finally, in Section 5, we summarize and provide conclusions.

[Figure: pipeline stages (Fetch, Decode, Rename, Issue, Execute), showing where the VPT (for VP) and the RB (for IR) are accessed using the PC.]

When an instruction is first executed, its results are stored in a hardware structure called a Reuse Buffer (RB), indexed by the instruction's PC. When the instruction is encountered again, its previous results are read from the RB in parallel with fetching the instruction. Their validity is established by a reuse test performed in parallel with decoding the instruction. The reuse test validates the results by establishing that the current operand values are the same as those used to calculate the results. There are different ways of doing this, one of which is described later in Section 4.1.2. Since the correct results are already known, a reused instruction is not executed; instead, it is queued for retirement. IR collapses true dependences by reusing, in the same cycle, a dependent chain of instructions that would normally execute sequentially.

In Figure 2, we illustrate how VP and IR improve
performance by collapsing data dependences. The figure shows the flow of a dependent chain of instructions, I, J, and K, through three different pipelines: (i) a base pipeline without VP or IR, (ii) a pipeline with VP, and (iii) a pipeline with IR. In all three cases, we assume that instructions I, J, and K are fetched, decoded, and renamed together. In the base pipeline, the instructions execute sequentially because they are data dependent. Executing them therefore requires three cycles, and the chain is committed by cycle 6. In the pipeline