CSUN COMP 546 - INSTRUCTION PIPELINING (II) - D1097229

Home> Schools> California State University, Northridge> Computer Science (COMP) > COMP 546> INSTRUCTION PIPELINING (II)

DOC PREVIEW

CSUN COMP 546 - INSTRUCTION PIPELINING (II)

School name California State University, Northridge

Course Comp 546-

Pages 4

This preview shows page 1 out of 4 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 4 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 4 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Datorarkitektur Fö 4 - 1Petru Eles, IDA, LiTHINSTRUCTION PIPELINING (II)1. Reducing Pipeline Branch Penalties2. Instruction Fetch Units and Instruction Queues3. Delayed Branching4. Branch Prediction Strategies5. Static Branch Prediction6. Dynamic Branch Prediction7. Branch History TableDatorarkitektur Fö 4 - 2Petru Eles, IDA, LiTHReducing Pipeline Branch Penalties• Branch instructions can dramatically affect pipelineperformance. Control operations (conditional andunconditional branch) are very frequent in currentprograms.•Some statistics:- 20% - 35% of the instructions executed arebranches (conditional and unconditional).- ~ 65% of the branches actually take the branch.- Conditional branches are much more frequentthan unconditional ones (more than two times).More than 50% of conditional branches aretaken.• It is very important to reduce the penaltiesproduced by branches.Datorarkitektur Fö 4 - 3Petru Eles, IDA, LiTHInstruction Fetch Units and Instruction Queues• Most processors employ sophisticated fetch unitsthat fetch instructions before they are needed andstore them in a queue.• The fetch unit also has the ability to recognizebranch instructions and to generate the targetaddress. Thus, penalty produced byunconditionalbranches can be drastically reduced: the fetch unitcomputes the target address and continues to fetchinstructions from that address, which are sent to thequeue. Thus, the rest of the pipeline gets acontinuous stream of instructions, without stalling.• The rate at which instructions can be read (from theinstruction cache) must be sufficiently high to avoidan empty queue.• Withconditional branches penalties can not beavoided. The branch condition, which usuallydepends on the result of the preceding instruction,has to be known in order to determine the followinginstruction.InstructionFetch UnitInstruction QueueRest of thepipelineInstructioncacheDatorarkitektur Fö 4 - 4Petru Eles, IDA, LiTHDelayed BranchingThese are the pipeline sequences for a conditionalbranch instruction (see slide 12, lecture 3)Branch is takenPenalty: 3 cyclesBranch is not takenPenalty: 2 cycles• The idea with delayed branching is to let the CPUdo some useful work during some of the cycleswhich are shown above to be stalled.• With delayed branching the CPU always executesthe instruction that immediately follows after thebranch and only then alters (if necessary) thesequence of execution. The instruction after thebranch is said to be in thebranch delay slot.FI DI12 834567Clock cycle →ADD R1,R2BEZ TARGETthe targetCOFO EI WOFI DI COFO EI WO9 101112FIstall stallFI DI COFO EI WOFI DI12 834567Clock cycle →ADD R1,R2BEZ TARGETinstr i+1COFO EI WOFI DI COFO EI WO9 101112FIstall stallDI COFO EI WODatorarkitektur Fö 4 - 5Petru Eles, IDA, LiTHDelayed Branching (cont’d)This is what the programmer has writtenMUL R3,R4 R3 ← R3*R4SUB #1,R2 R2 ← R2-1ADD R1,R2 R1 ← R1+R2BEZ TAR branch if zeroMOVE #10,R1 R1 ← 10- - - - - - - - - - - - -TAR - - - - - - - - - - - - -• The compiler (assembler) has to find an instructionwhich can be moved from its original place intothebranch delay slot after the branch and which will beexecuted regardless of the outcome of the branch.This is what the compiler (assembler) has produced andwhat actually will be executed:SUB #1,R2ADD R1,R2BEZ TARMUL R3,R4MOVE #10,R1- - - - - - - - - - - - -TAR - - - - - - - - - - - - -This instruction does notinfluence any of the instructionswhich follow until the branch; italso doesn’t influence theoutcome of the branch.This instruction shouldbe executed only if thebranch is not taken.This instruction will be execut-ed regardless of the condition.This will be executed only if thebranch has not been takenDatorarkitektur Fö 4 - 6Petru Eles, IDA, LiTHDelayed Branching (cont’d)This happens in the pipeline:Branch is takenPenalty: 2 cyclesBranch is not takePenalty: 1 cycleFI DI12 834567Clock cycle →ADD R1,R2BEZ TARMUL R3,R4COFO EI WOFI DI COFO EI WO9 101112FI DI COFO EI WOFI DI12 834567Clock cycle →ADD R1,R2BEZ TARMOVE #10,R1COFO EI WOFI DI COFO EI WO9 101112FIstallDI COFO EI WOthe targetFIstallFI DI COFO EI WOAt this moment, both thecondition (set by ADD) andthe target address are known.MUL R3,R4 FI DI COFO EI WOAt this moment the condition isknown and the MOVE can go on.Datorarkitektur Fö 4 - 7Petru Eles, IDA, LiTHDelayed Branching (cont’d)• What happens if the compiler is not able to find aninstruction to be moved after the branch, into thebranch delay slot?In this case a NOP instruction (an instruction thatdoes nothing) has to be placed after the branch. Inthis case the penalty will be the same as withoutdelayed branching.MUL R2,R4SUB #1,R2ADD R1,R2BEZ TARNOPMOVE #10,R1- - - - - - - - - - - - -TAR - - - - - - - - - - - - -• Some statistics show that for between 60% and85% of branches, sophisticated compilers are ableto find an instruction to be moved into the branchdelay slot.Now, with R2, this instruction in-fluences the following ones andcannot be moved from its place.Datorarkitektur Fö 4 - 8Petru Eles, IDA, LiTHBranch Prediction• In the last example we have considered that thebranch will not be taken and we fetched theinstruction following the branch; in the case thebranch was taken the fetched instruction wasdiscarded. As result, we hadbranch penalty of• Let us consider the opposite prediction:branch taken.For this solution it is needed that the target addressis computed in advance by an instruction fetch unit.Branch is takenPenalty: 1 cycle (prediction fulfilled)Branch is not takenPenalty: 2 cycles (prediction not fulfilled)1 if the branch is not taken(prediction fulfilled)2 if the branch is taken(prediction not fulfilled)FI DIADD R1,R2BEZ TARMUL R3,R4COFO EI WOFI DI COFO EI WOFI DI COFO EI WOMOVE #10,R1FIstallFI DI COFO EI WOFI DIADD R1,R2BEZ TARthe targetCOFO EI WOFI DI COFO EI WOstallDI COFO EI WOMUL R3,R4 FI DI COFO EI WOFI12 834567Clock cycle →9 10111212 834567Clock cycle →9 101112Datorarkitektur Fö 4 - 9Petru Eles, IDA, LiTHBranch Prediction (cont’d)• Correct branch prediction is very important and canproduce substantial performance improvements.• Based on the predicted outcome, the respective in-struction can be fetched, as well as the instructionsfollowing it, and they can be placed into the instruc-tion queue (see slide 3). If, after the branch condi-tion is computed, it turns out that the prediction wascorrect, execution continues. On the other hand, ifthe prediction is

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1 out of 4 pages.

CSUN COMP 546 - INSTRUCTION PIPELINING (II)

Sign up for free to view:

Please select your school