DOC PREVIEW
UH COSC 6385 - COSC 6385- Instruction Level Parallelism with Software Approaches

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1Edgar GabrielCOSC 6385 Computer Architecture - Instruction Level Parallelism with Software Approaches Edgar GabrielFall 2009COSC 6385 – Computer ArchitectureEdgar GabrielMultiple Issue• Take advantage of the fact that we have multiple functional units– further decrease ideal CPI (<1)• Two flavors of multiple-issue processors:– Superscalar– VLIW (Very long instruction Word)Issue structureHazard detectionScheduling Dist. characteristicExamplesSuperscalar (static)Dynamic Hardware Static In-order executionSun UltraSPARC II/IIISuperscalar(dynamic)Dynamic Hardware Dynamic Limited out-of-order exec.IBM Power2Superscalar(speculative)Dynamic HardwareDynamic with speculationOut-of-order exec. with spec.Pentium III/4IBM RS64III, 0VLIW Static Software StaticNo hazards between issue packetsi860, TrimediaEPIC Mostly static Mostly softwareMostly static Dependencies marked by compilerItanium2COSC 6385 – Computer ArchitectureEdgar GabrielSuperscalar architectures• Issue a varying number of instructions per cycle– Statically scheduled using compiler techniques– Dynamically scheduled (e.g. using Tomasulo’s algorithm)• Why a varying number of instructions?– Statically scheduled -> no out-of-order execution– Can check for hazards at issue time– Issue logic will issue instructions which cause a hazardCOSC 6385 – Computer ArchitectureEdgar GabrielSome details• Issue unit receives between one and n instructions from the instruction fetch unit (n being typically 4 or 8)→ issue packet• Instruction fetch unit examines each instruction in the issue packet in order• If an instruction causes a structural hazard or a data hazard, it will not be issued• Since the checking for structural and data hazards are complex operations, the instruction fetch unit is implemented as a pipeline, e.g– First stage checks for hazards within the issue packet – Second stage checks for hazards with already issued instructions3COSC 6385 – Computer ArchitectureEdgar GabrielDynamically scheduled superscalar MIPS• Extend Tomasulo’s algorithm to handle multiple issues per cycle– Must issue instructions to reservation stations in order to maintain program semantics• Note: we do not handle the details on how multiple issue works for Tomasulo’s algorithmsCOSC 6385 – Computer ArchitectureEdgar GabrielLimitations of ILP in hardware• Need to be able to look arbitrarily far ahead to find instructions to issue• Predict all branches perfectly• Rename all registers to avoid WAR and WAW hazards• Avoid data dependencies amongst instructions in an issue packet• Handling memory dependencies among issuing instructions• Provide enough replicated functional units to issue all instruction which are ready4COSC 6385 – Computer ArchitectureEdgar GabrielLimitations of ILP in hardware (II)• Finite number of Registers– For a single instruction stream– When considering thread-level parallelism• Imperfect data dependence analysis depending on whether the data is located on the stack, the heap or is statically declared.COSC 6385 – Computer ArchitectureEdgar GabrielVLIW processors• Issue a fixed number of instructions – As one large instruction or– As a fixed instruction packet• Parallelism among instructions has to be explicitly indicated by the instructions– Statically scheduled by compiler5COSC 6385 – Computer ArchitectureEdgar GabrielAn examplefor ( i=1000; i > 0 ; i-- ) {x[i] = x[i] + s;}Loop: L.D F0, 0(R1)ADD.D F4, F0, F2S.D F4, 0(R1)DADDUI R1, R1, #-8BNE R1, R2, LoopCOSC 6385 – Computer ArchitectureEdgar GabrielAssumptions• 1 cycle branch delayInstruction producing resultsInstruction using resultsLatency in clock cyclesFP ALU FP ALU 3FP ALU Store 2Load FP FP ALU 1Load FP Store 06COSC 6385 – Computer ArchitectureEdgar GabrielAn example (III)Loop: L.D F0, 0(R1) 1s 2ADD.D F4, F0, F2 3s 4s 5S.D F4, 0(R1) 6DADDUI R1, R1, #-8 7s 8BNE R1, R2, Loop 9s 10COSC 6385 – Computer ArchitectureEdgar GabrielRescheduling the codeLoop: L.D F0, 0(R1) 1DADDUI R1, R1, #-8 2ADD.D F4, F0, F2 3s 4BNE R1, R2, Loop 5S.D F4, 8(R1) 6• Each loop iteration consists of 3 instructions of actual work (load, add, store) and 3 cycles loop overheadDelayed branch slot7COSC 6385 – Computer ArchitectureEdgar GabrielLoop unrollingLoop: L.D F0, 0(R1)ADD.D F4, F0, F2S.D F4, 0(R1) /* Drop DADDUI & BNE */L.D F6, -8(R1)ADD.D F8, F6, F2S.D F8, -8(R1) /* Drop DADDUI & BNE */L.D F10, -16(R1)ADD.D F12, F10, F2S.D F12, -16(R1) /* Drop DADDUI & BNE */L.D F14, -24(R1)ADD.D F16, F14, F2S.D F16, -24(R1)DADDUI R1, R1, #-32BNE R1, R2, LoopCOSC 6385 – Computer ArchitectureEdgar GabrielLoop unrolling (II)• Eliminates three branches and three decrements– Reduces loop overhead• The previous code sequence still contains many stalls, since many operations are still dependent on each other• Requires a large number of registers• Increase of the code size• For general application of loop unrolling:– Two consecutive loops• First executes (n mod k ) times, not unrolled– n: number of loop iterations– k: unroll depth of the loop• Second executes (n/k) times, unrolled8COSC 6385 – Computer ArchitectureEdgar GabrielScheduled version of the unrolled loopLoop: L.D F0, 0(R1)L.D F6, -8(R1)L.D F10, -16(R1)L.D F14, -24(R1)ADD.D F4, F0, F2ADD.D F8, F6, F2ADD.D F12, F10, F2ADD.D F16, F14, F2S.D F4, 0(R1)S.D F8, -8(R1)DADDUI R1, R1, #-32S.D F12, 16(R1)BNE R1, R2, LoopS.D F16, 8(R1)COSC 6385 – Computer ArchitectureEdgar GabrielScheduled version of the unrolled loop (II)• Non trivial transformations in the previous code– Adjust the offsets of S.D instructions– Determine that it is legal to move the S.D after DADDUIand BNE– Determine that loads and stores of difference iterations can be interchanged9COSC 6385 – Computer ArchitectureEdgar GabrielEffects of register renamingLoop: L.D F0, 0(R1)ADD.D F4, F0, F2S.D F4, 0(R1)L.D F0, -8(R1)ADD.D F4, F0, F2S.D F4, -8(R1)L.D F0, -16(R1)ADD.D F4, F0, F2S.D F4, -16(R1)L.D F0, -24(R1)ADD.D F4, F0, F2S.D F4, -24(R1)DADDUI R1, R1, #-32BNE R1, R2, LoopLoop: L.D F0, 0(R1)ADD.D F4, F0, F2S.D F4, 0(R1)L.D F6, -8(R1)ADD.D F8, F6, F2S.D F8, -8(R1)L.D F10, -16(R1)ADD.D F12, F10, F2S.D F12, -16(R1)L.D F14, -24(R1)ADD.D F16, F14, F2S.D F16, -24(R1)DADDUI R1, R1, #-32BNE R1, R2, LoopNo dependencies between the loopbodies of two iterations!COSC 6385 – Computer ArchitectureEdgar GabrielLimitations of loop unrolling• Code size limitations• Lower limit on the


View Full Document

UH COSC 6385 - COSC 6385- Instruction Level Parallelism with Software Approaches

Download COSC 6385- Instruction Level Parallelism with Software Approaches
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view COSC 6385- Instruction Level Parallelism with Software Approaches and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view COSC 6385- Instruction Level Parallelism with Software Approaches 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?