Berkeley COMPSCI 152 - Lecture 15 - Out-of-Order Memory, Complex Superscalars Review


CS 152 Computer Architecture and Engineering
Lecture 15: Out-of-Order Memory, Complex Superscalars Review

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
3/31/2009, CS152 Spring 09

Review of Last 3 Lectures

Phases of Instruction Execution
[Figure: PC, I-cache, Fetch Buffer, Issue Buffer, Func. Units, Result Buffer, Arch. State]
- Fetch: Instruction bits retrieved from cache.
- Decode: Instructions decoded, registers renamed, placed in appropriate issue buffer.
- Execute: Instructions and operands sent to execution units. When execution completes, all results and exception flags are available.
- Commit: Instruction irrevocably updates architectural state.

In-Order Pipeline
[Figure: IF, ID, Issue stages feeding ALU, Mem, Fadd, Fmul, and Fdiv units, then WB]
- Instructions pass through issue stage and enter execution in order.
- May complete out of order, but must commit in order.

Exception Handling (In-Order Five-Stage Pipeline)
[Figure: five-stage pipeline (F, D, E, M, W) with commit point at M; exception sources are PC address exceptions, illegal opcode, overflow, data address exceptions, and asynchronous interrupts; Exc D/E/M and PC D/E/M registers feed Cause and EPC; kill signals for F, D, E stages and writeback; handler PC into Select]
- Hold exception flags in pipeline until commit point (M stage).
- Exceptions in earlier pipe stages override later exceptions.
- Inject external interrupts at commit point (override others).
- If exception at commit: update Cause and EPC registers, kill all stages, inject handler PC into fetch stage.

In-Order Superscalar Pipeline
[Figure: PC, Inst. Mem, Dual Decode; GPRs feed X1, X2 + Data Mem, X3, W; FPRs feed X1, then FAdd (X2, X3), FMul (X2, X3), and unpipelined FDiv divider (X2, X3), W; commit point marked]
- Fetch two instructions per cycle; issue both simultaneously if one is integer/memory and other is floating point.
- Inexpensive way of increasing throughput; examples include Alpha 21064 (1992) and MIPS R5000 series (1996).
- Same idea can be extended to wider issue by duplicating functional units (e.g., 4-issue UltraSPARC), but regfile ports and bypassing costs grow quickly.

Out-of-Order Issue
[Figure: IF, ID, Issue stages feeding ALU, Mem, Fadd, Fmul units, then WB]
- Issue stage buffer holds multiple instructions waiting to issue.
- Decode adds next instruction to buffer if there is space and the instruction does not cause a WAR or WAW hazard.
- Note: WAR possible again because issue is out of order (WAR not possible with in-order issue and latching of input operands at functional unit).
- Any instruction in buffer whose RAW hazards are satisfied can be issued (for now, at most one dispatch per cycle). On a write back (WB), new instructions may get enabled.

Register Renaming
[Figure: IF, ID, Issue stages feeding ALU, Mem, Fadd, Fmul units, then WB]
- Decode does register renaming and adds instructions to the issue stage reorder buffer (ROB).
- Renaming makes WAR or WAW hazards impossible.
- Any instruction in ROB whose RAW hazards have been satisfied can be dispatched.
- Out-of-order or dataflow execution.

Out-of-Order Execution Pipeline
[Figure: in-order Fetch and Decode, Reorder Buffer, out-of-order Execute, in-order Commit; kill signals to Fetch, Decode, and Reorder Buffer; exception at commit injects handler PC]
- Instructions fetched and decoded into instruction reorder buffer in order.
- Execution is out of order (out-of-order completion).
- Commit (write back to architectural state, i.e., regfile and memory) is in order.
- Temporary storage needed in ROB to hold results before commit.

"Data in ROB" Design (HP PA8000, Pentium Pro, Core2Duo)
[Figure: register file holds only committed state; reorder buffer entries t1, t2, ..., tn with fields Ins#, use, exec, op, p1, src1, p2, src2, pd, dest, data; Load Unit, FUs, and Store Unit return <t, result>; Commit]
- On dispatch into ROB, ready sources can be in regfile or in ROB dest (copied into src1/src2 if ready before dispatch).
- On completion, write to dest field and broadcast to src fields.
- On issue, read from ROB src fields.

Unified Physical Register File (MIPS R10K, Alpha 21264, Pentium 4)
[Figure: rename table maps r1, r2 to ti, tj; snapshots for mispredict recovery; physical reg file t1, t2, ..., tn; Load Unit, FUs, and Store Unit return <t, result>; ROB not shown]
- One regfile for both committed and speculative values (no data in ROB).
- During decode, instruction result allocated new physical register; source regs translated to physical regs through rename table.
- Instruction reads data from regfile at start of execute (not in decode).
- Write-back updates reg. busy bits on instructions in ROB (assoc. search).
- Snapshots of rename table taken at every branch to recover mispredicts.
- On exception, renaming undone in reverse order of issue (MIPS R10000).

Pipeline Design with Physical Regfile
[Figure: in-order PC, Fetch, Decode & Rename with branch prediction; Reorder Buffer; out-of-order Execute with Physical Reg File, Branch Unit, ALU, MEM, Store Buffer, D$; in-order Commit; branch resolution sends kill signals to earlier stages and updates predictors]

CS152 Administrivia
- Quiz 4: Tuesday, April 7, "Complex Pipelining".
- Quiz 5 and 6 moved back one class:
  Quiz 5: Thursday, April 23
  Quiz 6: Thursday, May 7
- Also PS/Lab 5, 6 moved back one class.

Memory Dependencies
    st r1, (r2)
    ld r3, (r4)
When can we execute the load?

In-Order Memory Queue
- Execute all loads and stores in program order.
- Load and store cannot leave ROB for execution until all previous loads and stores have completed execution.
- Can still execute loads and stores speculatively, and out of order with respect to other instructions.
- Need a structure to handle memory ordering.

Conservative O-o-O Load Execution
    st r1, (r2)
    ld r3, (r4)
- Split execution of store instruction into two phases: address calculation and data write.
- Can execute load before store if addresses known and r4 != r2.
- Each load address compared with addresses of all previous uncommitted stores (can use partial conservative check, i.e., bottom 12 bits of address).
- Don't execute load if any previous store address not known.
- (MIPS R10K, 16-entry address queue)

Address Speculation
    st r1, (r2)
    ld r3, (r4)
- Guess that r4 != r2.
- Execute load before store address known.
- Need to hold all completed but uncommitted load/store addresses in program order.
- If subsequently find r4 == r2, squash load and all following instructions.
- Large penalty for inaccurate address speculation.

Memory Dependence Prediction (Alpha 21264)
    st r1, (r2)
    ld r3, (r4)
- Guess that r4 != r2 and execute load before store.
- If later find r4 == r2, squash load and all following instructions, but mark load
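
The conservative out-of-order load rule from the slides (wait if any older store address is unknown, otherwise compare the load address against every older uncommitted store using only the bottom 12 bits) can be sketched as follows. This is a minimal illustration, not the MIPS R10K address queue itself: the names `StoreEntry`, `load_may_execute`, and `PARTIAL_MASK` are hypothetical, and stores are modeled only by their address.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Bottom 12 bits of the address, matching the partial check named in the slides.
PARTIAL_MASK = 0xFFF


@dataclass
class StoreEntry:
    """An older, uncommitted store in program order (hypothetical structure)."""
    address: Optional[int]  # None until the address-calculation phase completes


def load_may_execute(load_address: int, older_stores: Sequence[StoreEntry]) -> bool:
    """Return True only if the load cannot alias any older uncommitted store."""
    for store in older_stores:
        if store.address is None:
            # Some earlier store address is still unknown: do not execute the load.
            return False
        if (store.address & PARTIAL_MASK) == (load_address & PARTIAL_MASK):
            # Low bits match, so the addresses might be equal: be conservative.
            return False
    return True
```

The partial comparison can report a conflict for two different addresses that share their low 12 bits. That only delays the load; it never lets a load pass a store to the same address, which is why the check is safe.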

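
Address speculation, as described in the slides, lets a load execute before an older store address is known and then checks for a conflict once the store address resolves. The sketch below models that check under simplifying assumptions: `MemOp`, `resolve_store`, and `squash_from` are hypothetical names, the queue holds completed but uncommitted loads and stores tagged with a program-order sequence number, and store-to-load forwarding between intervening stores is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemOp:
    """One load or store held in program order until commit (hypothetical)."""
    seq: int                        # program-order sequence number
    is_store: bool
    address: Optional[int] = None   # None until the address is computed
    executed: bool = False          # True once a load has (speculatively) run


def resolve_store(queue: List[MemOp], store_seq: int, address: int) -> Optional[int]:
    """Record a store's address; return the seq of the oldest younger load
    that already executed to the same address, or None if no load conflicts."""
    for op in queue:
        if op.is_store and op.seq == store_seq:
            op.address = address
    for op in sorted(queue, key=lambda o: o.seq):
        younger_load = op.seq > store_seq and not op.is_store
        if younger_load and op.executed and op.address == address:
            return op.seq
    return None


def squash_from(queue: List[MemOp], seq: int) -> List[MemOp]:
    """Drop the mis-speculated load and every following memory operation."""
    return [op for op in queue if op.seq < seq]
```

A small usage example: with a store at sequence 0 and executed loads at sequences 1 (address 0x40) and 2 (address 0x80), resolving the store to 0x80 returns 2, and `squash_from(queue, 2)` keeps only the operations at sequences 0 and 1. Squashing everything after the conflicting load is what makes a wrong guess expensive.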

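
The renaming step from the unified physical register file slide (allocate a new physical register for each result, translate sources through the rename table, snapshot the table at branches) can be illustrated with a small model. The class `RenameTable` and its methods are hypothetical, and two details are assumptions rather than anything the slides state: the free list is a simple FIFO, and a snapshot also saves the free list so that a restore returns speculatively allocated registers.

```python
from typing import Dict, List, Tuple

Snapshot = Tuple[Dict[str, int], List[int]]


class RenameTable:
    """Maps architectural register names to physical register numbers."""

    def __init__(self, arch_regs: List[str], num_physical: int) -> None:
        # Initially each architectural register owns one physical register.
        self.table: Dict[str, int] = {r: i for i, r in enumerate(arch_regs)}
        self.free: List[int] = list(range(len(arch_regs), num_physical))

    def rename(self, dest: str, srcs: List[str]) -> Tuple[int, List[int]]:
        """Rename one instruction; returns (physical dest, physical sources)."""
        # Translate sources first so an instruction like r1 <- r1 + 1 reads
        # the old mapping of r1 and writes a fresh register.
        phys_srcs = [self.table[s] for s in srcs]
        # pop(0) raises IndexError when no register is free; a real decode
        # stage would stall instead (assumption of this sketch).
        phys_dest = self.free.pop(0)
        self.table[dest] = phys_dest
        return phys_dest, phys_srcs

    def snapshot(self) -> Snapshot:
        """Copy the state, e.g. when a branch is decoded."""
        return dict(self.table), list(self.free)

    def restore(self, snap: Snapshot) -> None:
        """Roll back to a snapshot after a branch mispredict."""
        self.table, self.free = dict(snap[0]), list(snap[1])
```

Because every write gets a fresh physical register, two instructions that write the same architectural register no longer conflict (no WAW), and a later write cannot overwrite a value an earlier instruction still needs to read (no WAR). Only true RAW dependences remain, carried by the physical source numbers.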