Berkeley COMPSCI 252 - The Schemes and Performances of Dynamic Branch predictors - D407030

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 252> The Schemes and Performances of Dynamic Branch predictors

DOC PREVIEW

Berkeley COMPSCI 252 - The Schemes and Performances of Dynamic Branch predictors

School name University of California, Berkeley

Course Compsci 252- Graduate Computer Architecture

Pages 18

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 18 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 18 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 18 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 18 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 18 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 18 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 18 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

The Schemes and Performances of Dynamic Branch predictors Chih-Cheng Cheng Abstract The techniques of Instruction Level Parallelism (ILP) and pipeline have been used well to speed up the execution of instructions. The conditional branches are the critical factor to the effectiveness of a deep pipeline since the branch instructions can always break the flow of instructions through the pipeline and result in high execution cost. In order to achieve better CPU performance, many schemes of branch prediction have been utilized. These schemes sometimes can be categorized as program-based predictors vs. profile-based predictors, or static vs. dynamic schemes. This paper focuses on the study of the dynamic branch predictors since the dynamic approach of branch prediction has been developed much more than the static approach of branch prediction. However, their performances always have new and interesting discoveries based on different benchmarks and architectures. I studied as much documentation as I could within my very limited time, and basically perform comparisons between different techniques of dynamic branch prediction, and organize my findings in this paper. Introduction Deep pipelining has been one of the more effective ways of improving processor performance. However, when higher degrees of instruction level parallelism have been applied, poorer performance of CPU might occur that is a result from the unresolved branches stalls. For conditional branches, the target instruction cannot be fetched until the target address has been calculated and the conditional branch has been resolved, whereas the unconditional branch is not resolved until the target address is calculated. Whether with conditional or unconditional branches, pipeline stalls could occur once the number of cycles is taken to resolve the branch. The performance of CPU decreases when the pipeline bubbles increase. To solve these problems, predicting the branch direction and providing effective availability of target addresses for execution are two good methods. This paper will mainly focus on the schemes of predicting branch directions. By predicting, pre-fetching, and initiating execution of the branch target address before the branch is resolved, the execution penalty will be substantially reduced.The static scheme of branch prediction is just predicting whether all branches are taken or not taken. This measurement of performance was reported by Lee and Smith [LS84]. The static strategy can provide up to 68 percent accuracy. In addition, Su and Zhou [SZ95] had more measurements on this strategy and concluded the static predictor has the worst performance. Nevertheless, according to the static scheme, dynamic branch prediction strategies have been studied extensively and proved that it has better performance than static strategy. All of the dynamic predictors have much better performance than static predictors and can at least achieve 90 percent accuracy as reported by McFarling [M93]. Dynamic schemes are different from the static ones in the sense that they use the run-time behavior of branches to make predictions. Hennessy and Patterson [HP96] introduced the two-bit prediction scheme. Moreover, Lee and Smith [LS84] proposed a better structure for it. McFarling [M93] referred to it as the bimodal branch prediction. Yeh and Patt {YP91] discussed these variations. The basic idea is to use 2-bit saturating up-down counters to collect history information that is then used to make predictions. Based on this concept, Yeh and Patt [YP91] developed the two-level adaptive branch prediction scheme. McFarling [M93] even explored more approaches to achieve better accuracy of branch prediction. Due to the fast progress of computer technology, Su and Zhou [SZ95] showed different aspects of performance analysis. The following sections first introduce those well-known schemes of dynamic branch predictors. After knowing the schemes, each branch prediction performance is then explicitly presented through the comparison chart. Their performance analyses are made in the same section. Before concluding this paper, I present different aspects of performances running on updated benchmarks. The conclusion can then be based more accurately on these various experimental data. Schemes of Dynamic Branch prediction Ø Two-bit Counter Branch Prediction Scheme The two-bit counter scheme always assigns a two-bit counter to the prediction cache buffer for each entry when each conditional branch occurs. The counter value is between 0 and 3. The counter is incremented when branch is predicted as taken; otherwise the counter is decremented if the prediction is not taken. Therefore, a prediction must be wrong twice before it is changed. The most significant bit determines the taken decision of prediction. Hennessy and Patterson [HP96] noted three-bit or higher counters do not make much significant than two-bit counter does. Lee and Smith [LS84] also refer to it as two-bit saturating up-down counters. The buffer is responsible for collecting history information, and applies the information tomaking the prediction. The state of branch’s entry in the buffer is therefore changed dynamically when the branch instructions are executed. The states in a two-bit prediction scheme are illustrated in the figure 1. Taken Not taken Taken Taken Not taken Not taken Taken Not taken Figure 1. The states in a two-bit prediction scheme Ø Bimodal Branch Prediction Scheme Bimodal scheme takes advantage of two-bit counter branch behavior. The state of counters is stored in a counter table that records all the branches history. Each branch will then map to a unique counter. The branch history table is indexed by some bits of branch address. The mapping will not be any problem if the counter table is large enough for all branches, says more than 128K bytes; but for the smaller tables, the performance of accurate prediction can be impeded by sharing same counter with multiple branches. However, the smaller size of predictor still performs more effective than the two-bit scheme. The bimodal predictor structure is shown in figure 2. Predict taken Predict taken Predict not taken Predict not takenCounts Taken Prediction taken Taken n-bit (for indexing) Figure 2. Bimodal Predictor Structure Ø Correlated

View Full Document