UCLA COMSCI M151B - lec8-c4

This preview shows page 1-2-3-25-26-27 out of 27 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Pipelining
Chapter 4 (continued)
Chapter 4 — The Processor

§4.10 Parallelism via Instructions

Instruction-Level Parallelism (ILP)
- Pipelining: executing multiple instructions in parallel
- To increase ILP
  - Deeper pipeline
    - Less work per stage → shorter clock cycle
  - Multiple issue
    - Replicate pipeline stages → multiple pipelines
    - Start multiple instructions per clock cycle
    - CPI < 1, so use Instructions Per Cycle (IPC)
    - E.g., 4 GHz, 4-way multiple issue → 16 BIPS (billion instructions per second), peak CPI = 0.25, peak IPC = 4
    - But dependencies reduce this in practice

Multiple Issue
- Static multiple issue
  - Compiler groups instructions to be issued together
  - Packages them into "issue slots"
  - Compiler detects and avoids hazards
- Dynamic multiple issue
  - CPU examines the instruction stream and chooses instructions to issue each cycle
  - Compiler can help by reordering instructions
  - CPU resolves hazards using advanced techniques at runtime

Speculation
- "Guess" what to do with an instruction
  - Start the operation as soon as possible
  - Check whether the guess was right
    - If so, complete the operation
    - If not, roll back and do the right thing
- Common to static and dynamic multiple issue
- Examples
  - Speculate on branch outcome
    - Roll back if the path taken is different
  - Speculate on load
    - Roll back if the location is updated

Compiler/Hardware Speculation
- Compiler can reorder instructions
  - e.g., move a load before a branch
  - Can include "fix-up" instructions to recover from an incorrect guess
- Hardware can look ahead for instructions to execute
  - Buffer results until it determines they are actually needed
  - Flush buffers on incorrect speculation

Speculation and Exceptions
- What if an exception occurs on a speculatively executed instruction?
  - e.g., a speculative load before a null-pointer check
- Static speculation
  - Can add ISA support for deferring exceptions
- Dynamic speculation
  - Can buffer exceptions until instruction completion (which may not occur)

Static Multiple Issue
- Compiler groups instructions into "issue packets"
  - Group of instructions that can be issued in a single cycle
  - Determined by the pipeline resources required
- Think of an issue packet as a very long instruction
  - Specifies multiple concurrent operations
  - → Very Long Instruction Word (VLIW)

Scheduling Static Multiple Issue
- Compiler must remove some/all hazards
  - Reorder instructions into issue packets
  - No dependencies within a packet
  - Possibly some dependencies between packets
    - Varies between ISAs; compiler must know!
  - Pad with nop if necessary

MIPS with Static Dual Issue
- Two-issue packets
  - One ALU/branch instruction
  - One load/store instruction
  - 64-bit aligned
    - ALU/branch, then load/store
    - Pad an unused instruction with nop

    Address  Instruction type  Pipeline stages
    n        ALU/branch        IF  ID  EX  MEM WB
    n + 4    Load/store        IF  ID  EX  MEM WB
    n + 8    ALU/branch            IF  ID  EX  MEM WB
    n + 12   Load/store            IF  ID  EX  MEM WB
    n + 16   ALU/branch                IF  ID  EX  MEM WB
    n + 20   Load/store                IF  ID  EX  MEM WB

[Figure: MIPS with Static Dual Issue]

Hazards in the Dual-Issue MIPS
- More instructions executing in parallel
- EX data hazard
  - Forwarding avoided stalls with single issue
  - Now an ALU result can't be used by the load/store in the same packet:
        add $t0, $s0, $s1
        lw  $s2, 0($t0)
    - Split into two packets, effectively a stall
- Load-use hazard
  - Still one cycle of use latency, but that cycle now covers two instruction slots
- More aggressive scheduling required

Scheduling Example
- Schedule this for dual-issue MIPS:

    Loop: lw   $t0, 0($s1)       # $t0 = array element
          addu $t0, $t0, $s2     # add scalar in $s2
          sw   $t0, 0($s1)       # store result
          addi $s1, $s1, -4      # decrement pointer
          bne  $s1, $zero, Loop  # branch if $s1 != 0

          ALU/branch              Load/store         Cycle
    Loop: nop                     lw   $t0, 0($s1)   1
          addi $s1, $s1, -4       nop                2
          addu $t0, $t0, $s2      nop                3
          bne  $s1, $zero, Loop   sw   $t0, 4($s1)   4

- The sw uses offset 4($s1) because the addi in cycle 2 has already decremented $s1
- IPC = 5/4 = 1.25 (cf. peak IPC = 2)

Loop Unrolling
- Replicate the loop body to expose more parallelism
  - Reduces loop-control overhead
- Use different registers per replication
  - Called "register renaming"
  - Avoid loop-carried "anti-dependencies"
    - Store followed by a load of the same register (e.g., sw $t0 in one copy, then lw into $t0 in the next: a write after read)
    - Aka "name dependence"
      - Reuse of a register name

Loop Unrolling
The loop from the Scheduling Example, unrolled four times with registers and offsets unchanged:

    Loop: lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          addi $s1, $s1, -4
          lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          addi $s1, $s1, -4
          lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          addi $s1, $s1, -4
          lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          addi $s1, $s1, -4
          bne  $s1, $zero, Loop

Loop Unrolling (2)
The four pointer decrements merged into one addi, with offsets adjusted:

    Loop: lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          lw   $t0, -4($s1)
          addu $t0, $t0, $s2
          sw   $t0, -4($s1)
          lw   $t0, -8($s1)
          addu $t0, $t0, $s2
          sw   $t0, -8($s1)
          lw   $t0, -12($s1)
          addu $t0, $t0, $s2
          sw   $t0, -12($s1)
          addi $s1, $s1, -16
          bne  $s1, $zero, Loop

Loop Unrolling (3)
Register renaming: a different temporary register for each copy:

    Loop: lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          lw   $t1, -4($s1)
          addu $t1, $t1, $s2
          sw   $t1, -4($s1)
          lw   $t2, -8($s1)
          addu $t2, $t2, $s2
          sw   $t2, -8($s1)
          lw   $t3, -12($s1)
          addu $t3, $t3, $s2
          sw   $t3, -12($s1)
          addi $s1, $s1, -16
          bne  $s1, $zero, Loop

Loop Unrolling (4)
Schedule of the renamed loop for dual-issue MIPS:

          ALU or branch           Data transfer      Clock cycle
    Loop: addi $s1, $s1,
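The ILP slide quotes peak figures, and the Scheduling Example slide quotes an IPC. Both follow from the definitions IPC = instructions / cycles and CPI = 1 / IPC. The arithmetic is written out below using only the numbers on those slides:

```latex
\begin{aligned}
\text{peak rate} &= 4\times10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 4\ \tfrac{\text{instructions}}{\text{cycle}}
  = 16\times10^{9}\ \tfrac{\text{instructions}}{\text{s}} = 16\ \text{BIPS} \\
\text{CPI}_{\text{peak}} &= \frac{1}{\text{IPC}_{\text{peak}}} = \frac{1}{4} = 0.25 \\
\text{IPC}_{\text{scheduled loop}} &= \frac{5\ \text{instructions}}{4\ \text{cycles}} = 1.25
  \qquad (\text{dual-issue peak: } \text{IPC} = 2)
\end{aligned}
```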
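The Compiler/Hardware Speculation and Speculation and Exceptions slides describe the hardware behaviour in prose: it buffers results and exceptions until the guess is resolved, then commits or flushes them. The toy Python model below sketches that idea for illustration only. It is not how the slides or any real processor implement it. The class `SpeculationBuffer` and its methods are hypothetical names, and a Python exception (a `KeyError` from a dictionary lookup) stands in for a hardware fault such as a bad load address.

```python
class SpeculationBuffer:
    """Toy model of dynamic speculation (hypothetical, for illustration only).

    Results and faults of speculatively executed instructions are held here
    and only become visible once the guess they depend on is confirmed.
    """

    def __init__(self, regs):
        self.regs = regs    # architectural register file: name -> value
        self.pending = []   # in program order: (register, value, fault)

    def execute(self, reg, compute):
        """Run one speculative instruction; buffer its result or its fault."""
        try:
            self.pending.append((reg, compute(), None))
        except Exception as fault:  # defer the exception instead of raising now
            self.pending.append((reg, None, fault))

    def resolve(self, guess_correct):
        """Commit buffered work in order if the guess was right, else flush it."""
        pending, self.pending = self.pending, []
        if not guess_correct:
            return  # flush: no register changes, and buffered faults vanish
        for reg, value, fault in pending:
            if fault is not None:
                raise fault  # instruction really completes: report it; later results are dropped
            self.regs[reg] = value


if __name__ == "__main__":
    memory = {100: 7}   # hypothetical memory: address -> word
    regs = {}
    buf = SpeculationBuffer(regs)

    # Load hoisted above a null-pointer check; the pointer turns out to be null.
    buf.execute("$t1", lambda: memory[None])  # would fault if done for real
    buf.resolve(guess_correct=False)          # guess was wrong: flush
    print(regs)                               # {}

    # Load whose guess is confirmed: its result is committed.
    buf.execute("$t0", lambda: memory[100])
    buf.resolve(guess_correct=True)
    print(regs)                               # {'$t0': 7}
```

This mirrors the slide's remark that a buffered exception is reported only if the instruction actually completes, "which may not occur".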
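The Scheduling Example and Hazards in the Dual-Issue MIPS slides can be checked mechanically. The Python sketch below encodes only the two rules stated on those slides:
- no value may pass between the two instructions of one packet;
- a loaded value is usable two cycles after its `lw`, not one.

It assumes a simplified machine in which each packet is (ALU/branch, load/store) and one packet issues per cycle. It does not check control hazards, memory dependences, slot types, or whether offsets such as `sw $t0, 4($s1)` were adjusted correctly. The function names `decode`, `check_schedule`, and `ipc` are hypothetical.

```python
import re

# Simplified dual-issue model (an assumption, not a pipeline simulator):
# each packet is (ALU/branch, load/store) and one packet issues per cycle.
NO_DEST = {"sw", "beq", "bne", "nop"}  # opcodes that write no register


def decode(text):
    """Return (destination, sources) register names of a MIPS instruction."""
    op = text.split()[0]
    regs = re.findall(r"\$\w+", text)
    if op in NO_DEST or not regs:
        return None, regs
    return regs[0], regs[1:]


def check_schedule(packets):
    """List violations of the two slide rules; an empty list means legal."""
    problems = []
    loaded = {}  # cycle -> register written by the lw issued in that cycle
    for cycle, packet in enumerate(packets, start=1):
        decoded = [decode(ins) for ins in packet]
        # Rule 1: a result cannot feed the other instruction of its own packet.
        for a, (dest, _) in enumerate(decoded):
            for b, (_, srcs) in enumerate(decoded):
                if a != b and dest is not None and dest in srcs:
                    problems.append(
                        f"cycle {cycle}: {packet[b]} needs {dest} from its own packet")
        # Rule 2: a loaded value is usable two cycles after the lw, not one.
        just_loaded = loaded.get(cycle - 1)
        for ins, (dest, srcs) in zip(packet, decoded):
            if just_loaded is not None and just_loaded in srcs:
                problems.append(
                    f"cycle {cycle}: {ins} uses {just_loaded} one cycle after its lw")
            if ins.split()[0] == "lw":
                loaded[cycle] = dest
    return problems


def ipc(packets):
    """Useful (non-nop) instructions issued per cycle."""
    useful = sum(ins.split()[0] != "nop" for packet in packets for ins in packet)
    return useful / len(packets)


# Schedule from the "Scheduling Example" slide.
SLIDE_SCHEDULE = [
    ("nop", "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4", "nop"),
    ("addu $t0, $t0, $s2", "nop"),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]

if __name__ == "__main__":
    print(check_schedule(SLIDE_SCHEDULE), ipc(SLIDE_SCHEDULE))  # [] 1.25
    # Packet from the hazards slide: flagged, so it must be split in two.
    print(check_schedule([("add $t0, $s0, $s1", "lw $s2, 0($t0)")]))
```

Moving the `addu` up to cycle 2 makes the checker report a load-use violation, which is why the slide's schedule leaves a gap after the `lw`.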
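The Loop Unrolling slides argue that reusing `$t0` in every copy creates anti-dependences (write after read), and that register renaming removes them. The Python sketch below finds write-after-read pairs by register name in a straight-line block and compares the two four-way unrolled versions from the slides. It is a hypothetical helper written for illustration, not a compiler pass. The names `decode`, `anti_dependences`, and `unrolled` are assumptions.

```python
import re

NO_DEST = {"sw", "beq", "bne", "nop"}  # opcodes that write no register


def decode(text):
    """Return (destination, sources) register names of a MIPS instruction."""
    op = text.split()[0]
    regs = re.findall(r"\$\w+", text)
    if op in NO_DEST or not regs:
        return None, regs
    return regs[0], regs[1:]


def anti_dependences(block):
    """Find write-after-read pairs (i, j, reg): instruction j overwrites reg
    while an earlier instruction i still reads the value that j replaces."""
    readers = {}  # reg -> indices of instructions reading its current value
    found = []
    for j, text in enumerate(block):
        dest, srcs = decode(text)
        for reg in srcs:  # an instruction reads its sources before it writes
            readers.setdefault(reg, []).append(j)
        if dest is not None:
            found += [(i, j, dest) for i in readers.get(dest, []) if i != j]
            readers[dest] = []  # later reads see the new value
    return found


def unrolled(renamed):
    """Four-way unrolled body from the slides (offsets already adjusted)."""
    body = []
    for k in range(4):
        t = f"$t{k}" if renamed else "$t0"
        off = -4 * k
        body += [f"lw {t}, {off}($s1)", f"addu {t}, {t}, $s2", f"sw {t}, {off}($s1)"]
    return body + ["addi $s1, $s1, -16", "bne $s1, $zero, Loop"]


if __name__ == "__main__":
    for renamed in (False, True):
        regs = sorted({reg for _, _, reg in anti_dependences(unrolled(renamed))})
        print("renamed" if renamed else "same $t0", regs)
    # same $t0 ['$s1', '$t0']
    # renamed ['$s1']
```

With `$t0` reused, its anti-dependences are exactly the sw-then-lw pairs (instructions 2 and 3, 5 and 6, 8 and 9, counting from 0), matching the slide's "store followed by a load of the same register". Renaming removes them. The `$s1` pairs that remain are the memory instructions that read `$s1` before the final `addi` overwrites it, which is why their offsets are written relative to the old `$s1`. The unrolled body is also 14 instructions for four elements, against 5 per element before, which is the reduced loop-control overhead the slide mentions.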