EE392C Advanced Topics in Computer Architecture Polymorphic Processors Stanford University Lecture 16 Handout Date Applications of Machine Learning Techniques to Systems Lecture 11 Thursday 22 May 2003 Lecturer Chi Ho Yue Ilya Katsnelson Scribe Jean Suh Arjun Singh 1 Dynamic Branch Prediction with Perceptrons Introduction Modern computer architectures increasingly rely on speculation to boost ILP Accurate prediction increases the performance benefit of speculation Recent effort to improve branch prediction focuses on aliasing to eliminate destructive interference Perceptrons a new method for branch prediction target at improving the branch prediction accuracy itself Perceptrons is a simple neural network that works as an alternative to the commonly used two bit counters branch history table BHT branch predictor Perceptrons achieves increased accuracy than traditional BHT branch predictors because they can make use of longer branch histories given the same hardware budget Although having very similar organization to BHT branch predictors Perceptrons hardware requirement scales linearly with the branch history length used while BHT predictors increases exponentially In a system with branch prediction using Perceptrons A perceptron is assigned to every single static branch Each perceptron consists of one artificial neuron connecting several input units by weighted edges to one output unit as a function Y x0 xn of n inputs The xi represents the bits of the global branch history shift register The predictor uses the following weight function Y to make every branching decision Y w0 w1 x1 w2 x2 wn xn The w s are weight in signed integers assigned to each bits of history When a previous global branch was taken its corresponding xi is set to be 1 When that previous branch was not taken xi is set to be 1 The predicting branch is taken when Y is positive and vice versa When the actual branch output is known the perceptron system will train the predicting perceptron by updating the w s with its corresponding previous branch using correlation concept Since Perceptrons uses a dot production function to compute predictions there lays an issue with the function Correlations that make the dot product Y zero will not be noticed and make uses by Lecture 16 2 the perceptron predictor Such correlation as introduces by in the paper is said to be linearly inseparable correlation Traditional branch predictors using BHT such as gshare and bimode do not have this issue Therefore it would be beneficial to use a hybrid of perceptron and a BHT based predictor such as gshare Obvious advantages on using Perceptrons based branch predictors includes that Perceptrons are not difficult to both understand and implement on hardware Given the same hardware budget Perceptrons can make use of longer branch history for higher accuracy On the other hand Preceptrons would perform very well when branches are linearly inseparable Perceptrons also concentrate mostly on global branch history but less on local history which is they only use one weight for local branch history Lastly even with the circuit techniques mentioned in the paper it is not certain that if prediction can be computed in a single cycle when history length becomes longer because quite a series of computation hash function loading from SRAM and arithmetic operations must be executed before a prediction is obtained 2 Automated Predictors Synthesis As processor architectures have increased their reliance on speculative execution to improve performance the importance of accurate prediction of what to execute speculatively has increased In addition to that the types of values that systems attempt to speculate on have expanded from the ubiquitous branch and call return targets to the prediction of indirect jump targets cache ways and data values In general the prediction process identifies the current state of the system and makes a prediction for some yet un computed value based on that state Prediction accuracy is improved by learning what is a good prediction for that state using a feedback process at the time the predicted value is actually computed This paper makes an attempt to formally characterize the process of prediction Currently the majority of predictor designs are based on several well understood structures This presents some advantages in speed due to the fact that these structures are highly optimized However it also prevents some new some specialized designs from appearing In order to solve this problem the paper presents an algebraic style notation for describing wide variety of predictor structures in uniform way A simple primitive predictor is presented as a basic building block of any higher level predictor The notation for the primitive predictor is as following P w d I U Where w d static part of the predictor memory I input selection signal U feedback update signal Lecture 16 3 Special parser language BP language is also presented It allows automatic creation of simulators for predictors In this case the traces from real benchmark programs can be used to run on the simulators and equally compare the performance of different branch prediction techniques The uniformity of predictor nation also permits the use of different automated design search techniques The paper presents the genetic programming as one of these methods In order to apply genetic optimization algorithm to predictors each particular predictor is represented by a tree structure which directly corresponds to BP expression of the predictor The nodes in the tree represent different logical parts of the predictor Primitive predictors Functions Terminals and constants With genetic algorithms new predictors are constantly generated Their performance is evaluated using the BP parser specified above and the traces from SPEC benchmarks Replication Crossover Mutation Encapsulation and Expansion operations were used to generate new predictor structures Several restrictions to the growth of the predictors had to be put in place in order to ensure their implementation feasibility In particular the authors restricted the internal memory size of each predictor to 512Kb As a result of the application of genetic programming to the problem of finding the optimal branch or jump predictors the algorithms were able to invent several well known structures staring from just very simple ones This shows that it is be possible to create better predictor schemes in the future using the same approach
View Full Document
Unlocking...