Instruction Level Parallelism (ILP)
Advanced Computer Architecture, CSE 8383
Spring 2004 (2/19/2004)
Presented by: Sa'ad Al-Harbi and Saeed Abu Nimeh

Contents
- What's ILP
- Example: Sequential vs ILP
- ILP vs Parallel Processing
- ILP Challenges
- Dependences and Hazards
- Types of Dependences
- Name Dependences
- Data Dependences
- Control Dependences
- Resource Dependences
- ILP Architectures
- ILP Architecture Classifications
- Sequential Architecture and Superscalar Processors
- Superscalar Processors
- Dependence Architecture and Dataflow Processors
- Dataflow Processors
- Dataflow Strengths and Limitations
- Independence Architecture and VLIW Processors
- VLIW Processors
- VLIW Strengths
- VLIW Limitations
- Summary: ILP Architectures
- ILP Scheduling
- ILP Scheduling: Trace Scheduling
- Trace Scheduling
- Trace Scheduling in HW
- Trace Scheduling in SW
- ILP Open Problems
- References

Outline
- What's ILP
- ILP vs parallel processing
- Sequential execution vs ILP execution
- Limitations of ILP
- ILP architectures
  - Sequential architecture
  - Dependence architecture
  - Independence architecture
- ILP scheduling
- Open problems
- References

What's ILP
- An architectural technique that allows individual machine operations (add, mul, load, store, ...) to overlap.
- Multiple operations execute in parallel (simultaneously).
- Goal: speed up execution.
- Example:
      load  R1 <- R2         add   R3 <- R3, "1"
      add   R3 <- R3, "1"    add   R4 <- R3, R2
      add   R4 <- R4, R2     store [R4] <- R0

Example: Sequential vs ILP
- Sequential execution (without ILP):
      add r1, r2 -> r8    (4 cycles)
      add r3, r4 -> r7    (4 cycles)
      total: 8 cycles
- ILP execution (overlapped execution):
      add r1, r2 -> r8
      add r3, r4 -> r7
      total: 5 cycles

ILP vs Parallel Processing
- ILP
  - Overlaps individual machine operations (add, mul, load, ...) so that they execute in parallel.
  - Transparent to the user.
  - Goal: speed up execution.
- Parallel processing
  - Separate processors work on separate chunks of the program (the processors are programmed to do so).
  - Not transparent to the user.
  - Goals: speed up and quality up.

ILP Challenges
- To achieve parallelism, the instructions that execute in parallel must have no dependences among them:
  - H/W terminology: data hazards (RAW, WAR, WAW)
  - S/W terminology: data dependences

Dependences and Hazards
- Dependences are a property of programs.
- If two instructions are data dependent, they cannot execute simultaneously.
- A dependence results in a hazard, and the hazard causes a stall.
- Data dependences may occur through registers or memory.

Types of Dependences
- Name dependences: output dependence and anti-dependence
- Data (true) dependence
- Control dependence
- Resource dependence

Name Dependences
- Output dependence: instructions i and j write the same register or memory location; the ordering must be preserved so that the correct value is left in the register.
      i: add r7, r4, r3
      j: div r7, r2, r8
- Anti-dependence: instruction j writes a register or memory location that instruction i reads.
      i: add r6, r5, r4
      j: sub r5, r8, r11

Data Dependences
- An instruction j is data dependent on instruction i if either of the following holds:
  - instruction i produces a result that may be used by instruction j, or
  - instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
- Example:
      LOOP: LD   F0, 0(R1)
            ADD  F4, F0, F2
            SD   F4, 0(R1)
            SUB  R1, R1, -8
            BNE  R1, R2, LOOP
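To make the name and data dependences above concrete, here is a small C sketch. It is not from the original slides: plain variables stand in for registers, each assignment plays the role of an instruction, and the comments name the hazard that reordering the statements would create.

    /* Illustrative only: variables r1..r8 stand in for registers. */
    #include <stdio.h>

    int main(void) {
        int r1 = 3, r2 = 4, r5 = 0, r6 = 0, r7 = 0, r8 = 2;

        r7 = r1 + r2;   /* I1: writes r7                                     */
        r6 = r7 * 2;    /* I2: true (RAW) dependence on I1 -- reads r7       */
        r7 = r8 - 1;    /* I3: output (WAW) dependence on I1 (rewrites r7)
                           and anti (WAR) dependence on I2 (I2 read r7)      */
        r5 = r6 + r7;   /* I4: RAW dependences on both I2 (r6) and I3 (r7)   */

        printf("r5=%d r6=%d r7=%d\n", r5, r6, r7);
        return 0;
    }

Note that the name dependences (WAR, WAW) could be removed by giving I3 a different destination register, whereas the true dependence from I1 to I2 cannot be removed by renaming.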
Control Dependences
- A control dependence determines the ordering of an instruction i with respect to a branch instruction, so that instruction i is executed in correct program order.
- Example:
      if p1 { S1; }
      if p2 { S2; }
- Two constraints imposed by control dependences:
  1. An instruction that is control dependent on a branch cannot be moved before the branch.
  2. An instruction that is not control dependent on a branch cannot be moved after the branch.

Resource Dependences
- An instruction is resource dependent on a previously issued instruction if it requires a hardware resource that is still being used by that previously issued instruction, e.g.:
      div r1, r2, r3
      div r4, r2, r5

ILP Architectures
- A computer architecture is a contract (the instruction format and the interpretation of the bits that constitute an instruction) between the class of programs written for the architecture and the set of processor implementations of that architecture.
- In ILP architectures, the contract additionally includes information embedded in the program about the parallelism available between the instructions and operations in the program.

ILP Architecture Classifications
- Sequential architectures: the program is not expected to convey any explicit information about parallelism (superscalar processors).
- Dependence architectures: the program explicitly indicates the dependences that exist between operations (dataflow processors).
- Independence architectures: the program provides information about which operations are independent of one another (VLIW processors).

Sequential Architecture and Superscalar Processors
- The program contains no explicit information about the dependences that exist between instructions.
- Dependences between instructions must be determined by the hardware.
- It is only necessary to determine dependences with sequentially preceding instructions that have been issued but not yet completed.
- The compiler may reorder instructions to facilitate the hardware's task of extracting parallelism.

Superscalar Processors
- Superscalar processors attempt to issue multiple instructions per cycle.
- However, essential dependences are specified by the sequential ordering, so operations must be processed in sequential order.
- This proves to be a performance bottleneck that is very expensive to overcome.

Dependence Architecture and Dataflow Processors
- The compiler (or programmer) identifies the parallelism in the program and communicates it to the hardware by specifying the dependences between operations.
- The hardware determines at run time when each operation is independent of the others and performs the scheduling.
- No scanning of the sequential program is needed to determine dependences.
- Objective: execute each instruction at the earliest possible time, i.e. when its input operands and a functional unit are available.

Dataflow Processors
- Dataflow processors are representative of dependence architectures.
- They execute an instruction at the earliest possible time, subject to the availability of its input operands and of functional units.
- Dependences are communicated by providing with each instruction a list of all its successor instructions.
- As soon as all of an instruction's input operands are available, the instruction is ready to execute.
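As a rough sketch of the dataflow firing rule described above, the C program below simulates a toy machine. Everything in it is invented for illustration (the Insn structure, the four-instruction program, the register numbers) and it assumes unit latency and unlimited functional units: each cycle it fires every instruction whose source registers are ready, so independent instructions overlap automatically while dependent ones wait.

    /* Toy dataflow-style scheduler: fire an instruction as soon as every
       register it reads has been produced (unit latency, unlimited units). */
    #include <stdio.h>

    #define NREG 16

    typedef struct {
        const char *text;   /* human-readable form, for tracing          */
        int dst;            /* register written (-1 if none)             */
        int src[2];         /* registers read (-1 = unused source slot)  */
        int done;           /* has this instruction executed yet?        */
    } Insn;

    int main(void) {
        int ready[NREG] = {0};              /* ready[r]: r holds a value */
        ready[1] = ready[2] = ready[8] = 1; /* live-in registers         */

        Insn prog[] = {
            { "add r7 <- r1, r2", 7, {1, 2}, 0 },
            { "mul r6 <- r7, r7", 6, {7, 7}, 0 },
            { "sub r5 <- r8, r2", 5, {8, 2}, 0 },   /* independent of r7 */
            { "add r4 <- r6, r5", 4, {6, 5}, 0 },
        };
        int n = sizeof prog / sizeof prog[0], remaining = n;

        for (int cycle = 1; remaining > 0; cycle++) {
            int fired_dst[8], nfired = 0;
            printf("cycle %d:", cycle);
            for (int i = 0; i < n; i++) {
                Insn *p = &prog[i];
                if (p->done) continue;
                int ok = 1;
                for (int s = 0; s < 2; s++)
                    if (p->src[s] >= 0 && !ready[p->src[s]]) ok = 0;
                if (ok) {                   /* all operands ready: fire  */
                    printf("  %s", p->text);
                    p->done = 1;
                    fired_dst[nfired++] = p->dst;
                    remaining--;
                }
            }
            printf("\n");
            /* Results become visible only after the cycle completes. */
            for (int i = 0; i < nfired; i++)
                if (fired_dst[i] >= 0) ready[fired_dst[i]] = 1;
        }
        return 0;
    }

Running it, the add and the independent sub fire together in cycle 1, the mul that needs r7 fires in cycle 2, and the final add fires in cycle 3: the schedule emerges from operand availability alone, with no scanning of the sequential program order.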

