Berkeley COMPSCI 252 - ILP in loops

EECS 252 Graduate Computer Architecture
Lec 8 – ILP in loops

David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://www-inst.eecs.berkeley.edu/~cs252

2/11/2005 CS252 Sp05 L8 loop-ilp


Review: Dynamic hardware techniques for out-of-order execution

• HW exploitation of ILP
  – Works even when can’t know dependence at compile time.
  – Code for one machine runs well on another
• Scoreboard (ala CDC 6600 in 1963)
  – Centralized control structure
  – No register renaming, no forwarding
  – Pipeline stalls for WAR and WAW hazards.
  – Are these fundamental limitations??? (No)
• Reservation stations (ala IBM 360/91 in 1966)
  – Distributed control structures
  – Implicit renaming of registers (dispatched pointers)
  – WAR and WAW hazards eliminated by register renaming
  – Results broadcast to all reservation stations for RAW


Review: Scoreboard Architecture (CDC 6600)

[Figure: block diagram with Registers, Functional Units (FP Mult, FP Divide, FP Add, Integer), Memory, and the SCOREBOARD control block]


Review: Four Stages of Scoreboard Control

• Issue: decode instructions & check for structural hazards (ID1)
  – Instructions issued in program order (for hazard checking)
  – Don’t issue if structural hazard
  – Don’t issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)
• Read operands: wait until no data hazards, then read ops (ID2)
  – All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
  – No forwarding of data in this model!
• Execution: operate on operands (EX)
  – The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
• Write result: finish execution (WB)
  – Stall until no WAR hazards with previous instructions:
    Example:  DIVD  F0,F2,F4
              ADDD  F10,F0,F8
              SUBD  F8,F8,F14
    CDC 6600 scoreboard would stall SUBD until ADDD reads operands


Review: Tomasulo Organization

[Figure: Tomasulo organization. Labels: FP Op Queue, FP Registers, From Mem, To Mem, Load Buffers (Load1 to Load6), Store Buffers, Reservation Stations (Add1, Add2, Add3 for the FP adders; Mult1, Mult2 for the FP multipliers), Common Data Bus (CDB)]


Review: Three Stages of Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execution: operate on operands (EX)
   When both operands ready then execute; if not ready, watch Common Data Bus for result
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available

• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
  – 64 bits of data + 4 bits of Functional Unit source address
  – Write if matches expected Functional Unit (produces result)
  – Does the broadcast


Review: Comparison Cycle 62

Instruction status:        Scoreboard                    Tomasulo
                                  Read  Exec  Write             Exec   Write
Instruction    j    k      Issue  Oper  Comp  Result     Issue  Compl  Result
LD     F6      34+  R2     1      2     3     4          1      3      4
LD     F2      45+  R3     5      6     7     8          2      4      5
MULTD  F0      F2   F4     6      9     19    20         3      15     16
SUBD   F8      F6   F2     7      9     11    12         4      7      8
DIVD   F10     F0   F6     8      21    61    62         5      56     57
ADDD   F6      F8   F2     13     14    16    22         6      10     11

• Why take longer on scoreboard/6600?
• Structural Hazards
• Lack of forwarding
• Deeper issue: WAW stalls


Outline

• Tomasulo on loops
• Register renaming
• R1000 example
• VLIW / EPIC
• Case Study
• Limits on Instruction Level Parallelism


Tomasulo Loop Example

Loop: LD     F0  0   R1
      MULTD  F4  F0  F2
      SD     F4  0   R1
      SUBI   R1  R1  #8
      BNEZ   R1  Loop

• Assume Multiply takes 4 clocks
• Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)
• To be clear, will show clocks for SUBI, BNEZ
• Reality: integer instructions ahead


Loop Example

Instruction status:
ITER  Instruction      j    k    Issue  Exec Comp  Write Result
1     LD      F0       0    R1
1     MULTD   F4       F0   F2
1     SD      F4       0    R1
2     LD      F0       0    R1
2     MULTD   F4       F0   F2
2     SD      F4       0    R1

Buffer  Busy  Addr  Fu
Load1   No
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation Stations:
                          S1   S2     RS
Time  Name   Busy  Op     Vj   Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Code:
      LD     F0  0   R1
      MULTD  F4  F0  F2
      SD     F4  0   R1
      SUBI   R1  R1  #8
      BNEZ   R1  Loop

Register result status (Clock = 0, R1 = 80):
      F0     F2  F4     F6  F8  F10  F12  ...  F30
Fu


Loop Example Cycle 1

Instruction status:
ITER  Instruction      j    k    Issue  Exec Comp  Write Result
1     LD      F0       0    R1   1
1     MULTD   F4       F0   F2
1     SD      F4       0    R1
2     LD      F0       0    R1
2     MULTD   F4       F0   F2
2     SD      F4       0    R1

Buffer  Busy  Addr  Fu
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation Stations:
                          S1   S2     RS
Time  Name   Busy  Op     Vj   Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status (Clock = 1, R1 = 80):
      F0     F2  F4     F6  F8  F10  F12  ...  F30
Fu    Load1


Loop Example Cycle 2

Instruction status:
ITER  Instruction      j    k    Issue  Exec Comp  Write Result
1     LD      F0       0    R1   1
1     MULTD   F4       F0   F2   2
1     SD      F4       0    R1
2     LD      F0       0    R1
2     MULTD   F4       F0   F2
2     SD      F4       0    R1

Buffer  Busy  Addr  Fu
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation Stations:
                          S1   S2     RS
Time  Name   Busy  Op     Vj   Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd       R(F2)  Load1
      Mult2  No

Register result status (Clock = 2, R1 = 80):
      F0     F2  F4     F6  F8  F10  F12  ...  F30
Fu    Load1      Mult1


Loop Example Cycle 3

Instruction status:
ITER  Instruction      j    k    Issue  Exec Comp  Write Result
1     LD      F0       0    R1   1
1     MULTD   F4       F0   F2   2
1     SD      F4       0    R1   3
2     LD      F0       0    R1
2     MULTD   F4       F0   F2
2     SD      F4       0    R1

Buffer  Busy  Addr  Fu
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80    Mult1
Store2  No
Store3  No

Reservation Stations:
                          S1   S2     RS
Time  Name   Busy  Op     Vj   Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd       R(F2)  Load1
      Mult2  No

Register result status (Clock = 3, R1 = 80):
      F0     F2  F4     F6  F8  F10  F12  ...  F30
Fu    Load1      Mult1

• Implicit renaming sets up “DataFlow” graph


Loop Example Cycle 4

Instruction status:
ITER  Instruction      j    k    Issue  Exec Comp  Write Result
1     LD      F0       0    R1   1
1     MULTD   F4       F0   F2   2
1     SD      F4       0    R1   3
2     LD      F0       0    R1
2     MULTD   F4       F0   F2
2     SD      F4       0    R1

Buffer  Busy  Addr  Fu
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80    Mult1
Store2  No
Store3  No

Reservation Stations: [table cut off in source]
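
The renaming that the issue stage performs in the loop example can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the `Station` class, the `issue` function, and the symbolic operand strings such as `R(F2)` are hypothetical names chosen to mirror the table columns (Vj, Vk, Qj, Qk). It models only the issue stage for the three FP instructions of the loop body, ignores SUBI/BNEZ and all timing, and never frees a station, so the second iteration is forced onto fresh stations.

```python
from dataclasses import dataclass


@dataclass
class Station:
    """One reservation station or load/store buffer (hypothetical model)."""
    name: str
    busy: bool = False
    op: str = ""
    vj: str = ""  # symbolic value of operand j, e.g. "R(F0)", if already available
    vk: str = ""  # symbolic value of operand k
    qj: str = ""  # tag of the station that will produce operand j
    qk: str = ""  # tag of the station that will produce operand k


def make_machine():
    """Station pools named as on the slides, plus an empty register result status."""
    pools = {
        "LD": [Station(f"Load{i}") for i in (1, 2, 3)],
        "SD": [Station(f"Store{i}") for i in (1, 2, 3)],
        "MULTD": [Station(f"Mult{i}") for i in (1, 2)],
    }
    return pools, {}


def issue(pools, reg_status, op, dest, sources):
    """Issue one instruction: grab a free station and rename its FP operands."""
    station = next((s for s in pools[op] if not s.busy), None)
    if station is None:
        return None  # structural hazard: no free station, so issue would stall
    station.busy, station.op = True, op
    operands = []
    for reg in sources:
        tag = reg_status.get(reg, "")
        # Pending producer: record its tag. Otherwise read the register value.
        operands.append(("" if tag else f"R({reg})", tag))
    operands += [("", "")] * (2 - len(operands))
    (station.vj, station.qj), (station.vk, station.qk) = operands
    if dest is not None:
        reg_status[dest] = station.name  # later readers of dest wait on this tag
    return station


# FP part of the loop body: (op, destination register, FP source registers).
LOOP_BODY = [("LD", "F0", []), ("MULTD", "F4", ["F0", "F2"]), ("SD", None, ["F4"])]

pools, reg_status = make_machine()
first = [issue(pools, reg_status, *ins) for ins in LOOP_BODY]
status_after_iter1 = dict(reg_status)
second = [issue(pools, reg_status, *ins) for ins in LOOP_BODY]

for station in first + second:
    print(station)
print("register result status:", reg_status)
```

After the first iteration the register result status maps F0 to Load1 and F4 to Mult1, matching the Cycle 3 table. Issuing the second iteration remaps F0 and F4 to new tags while Store1 still waits on Mult1, which is how the two iterations avoid WAR and WAW conflicts on the same architectural registers.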
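
The hazard vocabulary used in the review slides (RAW, WAR, WAW) can also be made concrete with a small register-level check. The sketch below is an illustration only: `Instr` and `hazards` are hypothetical names, and the check compares register names between an earlier and a later instruction without modelling pipeline timing. It is applied to the DIVD/ADDD/SUBD example from the scoreboard write-result stage.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """A three-operand FP instruction (hypothetical representation)."""
    op: str
    dest: str
    srcs: tuple


def hazards(earlier, later):
    """Return the set of hazards `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")  # later reads what earlier writes (true dependence)
    if later.dest in earlier.srcs:
        found.add("WAR")  # later overwrites a register earlier still has to read
    if later.dest == earlier.dest:
        found.add("WAW")  # both write the same register (output dependence)
    return found


divd = Instr("DIVD", "F0", ("F2", "F4"))
addd = Instr("ADDD", "F10", ("F0", "F8"))
subd = Instr("SUBD", "F8", ("F8", "F14"))

print("ADDD after DIVD:", hazards(divd, addd))
print("SUBD after ADDD:", hazards(addd, subd))
print("SUBD after DIVD:", hazards(divd, subd))
```

The SUBD/ADDD pair is the WAR case from the slide: SUBD writes F8, which ADDD still needs to read, so the scoreboard holds SUBD's write until ADDD has read its operands.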

