UH COSC 6385 - Pipelining (II)

COSC 6385 Computer Architecture - Pipelining (II)
Edgar Gabriel
Fall 2009

Performance evaluation of pipelines (I)

General speedup formula:

$$\mathrm{Speedup} = \frac{\mathrm{Time}_{org}}{\mathrm{Time}_{enh}} = \frac{IC_{org} \times \mathrm{ClockCycle}_{org} \times CPI_{org}}{IC_{enh} \times \mathrm{ClockCycle}_{enh} \times CPI_{enh}}$$

For a fixed application, let's assume that $IC_{org} = IC_{enh}$:

$$\mathrm{Speedup} = \frac{\mathrm{ClockCycle}_{org} \times CPI_{org}}{\mathrm{ClockCycle}_{enh} \times CPI_{enh}}$$

If we additionally assume that the CPU has the same frequency, i.e. $\mathrm{ClockCycle}_{org} = \mathrm{ClockCycle}_{enh}$:

$$\mathrm{Speedup} = \frac{CPI_{org}}{CPI_{enh}}$$

Performance evaluation of pipelines (II)

If looking at individual classes of instructions:

$$\mathrm{Speedup}_{overall} = \frac{\mathrm{Time}_{org}}{\mathrm{Time}_{enh}} = \frac{\mathrm{ClockCycle}_{org} \times \sum_{i=1}^{n} IC_i \times CPI_i^{org}}{\mathrm{ClockCycle}_{enh} \times \sum_{i=1}^{n} IC_i \times CPI_i^{enh}}$$

If $IC_{total}$ does not change, you can also use the average instruction execution time (AvIETime):

$$\mathrm{Speedup}_{overall} = \frac{\mathrm{Time}_{org}}{\mathrm{Time}_{enh}} = \frac{\mathrm{ClockCycle}_{org} \times \sum_{i=1}^{n} f_i \times CPI_i^{org}}{\mathrm{ClockCycle}_{enh} \times \sum_{i=1}^{n} f_i \times CPI_i^{enh}} \quad \text{with} \quad f_i = \frac{IC_i}{IC_{total}}$$

Comparing pipelined and non-pipelined execution

• An ideal pipeline produces one result per clock cycle
  → Ideal $CPI_{pipelined} = 1$

$$\mathrm{Time}_{pipelined} = \frac{\mathrm{Time}_{non\_pipelined}}{no_{pipeline\_stages}}$$

$$\mathrm{Speedup} = \frac{\mathrm{Time}_{non\_pipelined}}{\mathrm{Time}_{pipelined}} = no_{pipeline\_stages}$$

• Using the average instruction execution time (AvIETime):

$$\mathrm{Speedup} = \frac{\mathrm{AvIETime}_{non\_pipelined}}{\mathrm{AvIETime}_{pipelined}} = \frac{CPI_{non\_pipelined}}{CPI_{pipelined}} \times \frac{\mathrm{ClockCycle}_{non\_pipelined}}{\mathrm{ClockCycle}_{pipelined}}$$

Comparing pipelined and non-pipelined execution (II)

$$\mathrm{Speedup} = \frac{\mathrm{AvIETime}_{non\_pipelined}}{\mathrm{AvIETime}_{pipelined}}$$

Realistic $CPI_{pipelined}$ = Ideal $CPI_{pipelined}$ + Pipeline stall cycles per instruction

Thus:

$$\mathrm{Speedup} = \frac{CPI_{non\_pipelined}}{1 + \mathrm{PipelineStallCyclesPerInstr}} \times \frac{\mathrm{ClockCycle}_{non\_pipelined}}{\mathrm{ClockCycle}_{pipelined}}$$

If ClockCycle is constant:

$$\mathrm{Speedup} = \frac{CPI_{non\_pipelined}}{1 + \mathrm{PipelineStallCyclesPerInstr}}$$

Example I

• (A) Given a non-pipelined processor:
  – 1 ns clock cycle time
  – 4 cycles for ALU operations
  – 4 cycles for branches
  – 5 cycles for memory operations
• (B) Given also a pipelined processor:
  – 1.2 ns clock cycle time
• Both (A) and (B) have
  – 40% ALU operations
  – 40% branches
  – 20% memory operations
• What is the speedup of (B) over (A) due to pipelining?

For machine (A):

$$\mathrm{AvIETime}_{(A)} = \mathrm{ClockCycle}_A \times \sum_{i=1}^{n} f_i \times CPI_i^{A} = 1\,\mathrm{ns} \times (0.4 \times 4 + 0.4 \times 4 + 0.2 \times 5) = 4.2\,\mathrm{ns}$$

For machine (B), assuming ideal CPI (= 1):

$$\mathrm{AvIETime}_{(B)} = \mathrm{ClockCycle}_B \times \sum_{i=1}^{n} f_i \times CPI_i^{B} = 1.2\,\mathrm{ns} \times (0.4 \times 1 + 0.4 \times 1 + 0.2 \times 1) = 1.2\,\mathrm{ns}$$

Thus:

$$\mathrm{Speedup} = \frac{\mathrm{AvIETime}_{(A)}}{\mathrm{AvIETime}_{(B)}} = \frac{4.2\,\mathrm{ns}}{1.2\,\mathrm{ns}} = 3.5$$

Exceptions

• Instruction execution order is interrupted
• E.g.
  – I/O device request
  – Invoking an OS service from an application
  – Tracing execution
  – Breakpoint
  – Integer or FP arithmetic anomaly (e.g. overflow)
  – Page fault
  – Misaligned memory access
  – Memory protection violation
  – Hardware malfunction

Classification of Exceptions

• Problems with pipelining:
  – Different stages of the pipeline can raise exceptions, leading to a different order of exceptions compared to the unpipelined case
• Classes of exceptions
  1. Synchronous vs. asynchronous
  2. User requested vs. coerced
  3. User maskable vs. user non-maskable
  4. Within vs. between instructions
  5. Resume vs. terminate

Exceptions

• Most problematic: exceptions raised within instructions, where the instruction must be resumed
  – Another program must be invoked to save the state of the program
• Pipelines capable of handling exceptions are called restartable

Pipeline stage   Possible exceptions
IF               Page fault on instruction fetch; misaligned memory access; memory protection violation
ID               Undefined or illegal opcode
EX               Arithmetic exception
MEM              Page fault on data fetch; misaligned memory access; memory protection violation
WB               None

Exceptions

• Since an exception cannot be raised at the moment it occurs:
  – A status vector associated with the instruction records the exception
  – The status vector is carried along with the instruction
  – Writing of data values is disabled if the status vector is set
  – In WB the status vector is checked and the exception is handled
=> The exception of instruction i is handled before the exception of instruction i+1
=> Since no data values are written back, the register file is not changed -> the instruction can be repeated

Multi-cycle instructions

• Floating point instructions can take many cycles to complete
• Often implemented by multiple executions of the EX stage
  – Not all instructions will take the same number of cycles to finish!
• Latency:
  – Number of intervening cycles between an instruction that produces a result and an instruction that uses the result
  – Usually: depth of the EX stage - 1
• Initiation interval:
  – Number of cycles that must elapse between issuing two operations of a given type
• Multi-cycle instructions/pipelines increase the probability of WAW and RAW hazards

Example for a multi-cycle pipeline

[Figure: after IF and ID, instructions enter one of four execution units: the integer unit (EX), the FP/integer multiply unit (M1 to M7), the FP/integer add unit (A1 to A4), or the non-pipelined FP/integer division unit (DIV); the units are followed by MEM and WB.]

Functional unit   Latency   Initiation interval
Integer ALU       0         1
Data memory       1         1
FP add            3         1
FP multiply       6         1
FP divide         24        25

Instruction level parallelism

• Exploit parallelism between independent instructions
  – Limited by data dependencies
  – Limited by branches
• Example:

    for (i=0; i<n; i++ ) {
        c[i] = a[i] + b[i];
    }

  – Each iteration of the loop is independent
  – Exploitation of that fact is not trivial because of register reuse!

Instruction level parallelism

• Data dependencies:
  – True dependencies: instruction i produces a result required by instruction i+k, k>0 (RAW)
    • Sharing a register or a memory location
  – Name dependencies: usage of the same register or memory location without data flow
    • Antidependence: instruction i+k writes a register/memory location read by instruction i (WAR)
      – No problem if instructions are not reordered
    • Output dependence: instruction i and instruction i+k write the same register/memory location (WAW)
      – No problem if instructions are not reordered
• Control dependencies: determine the ordering of an instruction i with respect to a branch

Dynamic scheduling
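The AvIETime and speedup formulas from the performance evaluation slides can be checked numerically. The following is a minimal Python sketch, using the clock cycle times, cycle counts, and instruction mix stated in Example I; the function names `av_ie_time` and `pipeline_speedup` are illustrative and not part of the lecture.

```python
def av_ie_time(clock_cycle_ns, fractions, cpis):
    """Average instruction execution time: ClockCycle * sum(f_i * CPI_i)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return clock_cycle_ns * sum(f * cpi for f, cpi in zip(fractions, cpis))


def pipeline_speedup(cpi_non_pipelined, stall_cycles_per_instr,
                     clock_non_pipelined=1.0, clock_pipelined=1.0):
    """Speedup = CPI_np / (1 + stalls per instruction) * ClockCycle_np / ClockCycle_p."""
    return (cpi_non_pipelined / (1.0 + stall_cycles_per_instr)
            * clock_non_pipelined / clock_pipelined)


# Instruction mix from Example I: ALU, branches, memory operations.
mix = (0.4, 0.4, 0.2)

# Machine (A): non-pipelined, 1 ns clock, 4/4/5 cycles per class.
time_a = av_ie_time(1.0, mix, (4, 4, 5))

# Machine (B): pipelined, 1.2 ns clock, ideal CPI of 1 for every class.
time_b = av_ie_time(1.2, mix, (1, 1, 1))

speedup = time_a / time_b
print(f"AvIETime(A) = {time_a:.2f} ns")
print(f"AvIETime(B) = {time_b:.2f} ns")
print(f"Speedup     = {speedup:.2f}")
```

Passing a non-zero `stall_cycles_per_instr` to `pipeline_speedup` shows how quickly stalls erode the ideal speedup.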
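The latency and initiation interval definitions from the multi-cycle slides can be turned into a small issue-time model. The sketch below assumes in-order issue of at most one instruction per cycle and uses the latency and initiation interval values from the functional unit table. It deliberately ignores WAW hazards and contention for MEM and WB, and the unit keys and the `issue_cycles` helper are hypothetical names chosen for this illustration.

```python
# (latency, initiation interval) per functional unit, taken from the table.
UNITS = {
    "int": (0, 1),
    "mem": (1, 1),
    "fadd": (3, 1),
    "fmul": (6, 1),
    "fdiv": (24, 25),
}


def issue_cycles(program):
    """Return the issue cycle of each (unit, dest, srcs) instruction.

    Simplified model: in-order, single issue; only RAW dependences and
    the initiation interval of each unit delay an instruction.
    """
    ready = {}        # register -> first cycle in which a consumer may issue
    last_issue = {}   # unit -> cycle of its most recent issue
    cycles = []
    next_cycle = 0
    for unit, dest, srcs in program:
        latency, interval = UNITS[unit]
        cycle = next_cycle
        for src in srcs:                      # wait for operands (RAW)
            cycle = max(cycle, ready.get(src, 0))
        if unit in last_issue:                # respect the initiation interval
            cycle = max(cycle, last_issue[unit] + interval)
        cycles.append(cycle)
        last_issue[unit] = cycle
        if dest is not None:
            # 'latency' intervening cycles before a dependent instruction
            ready[dest] = cycle + latency + 1
        next_cycle = cycle + 1
    return cycles


program = [
    ("fmul", "F0", ("F2", "F4")),
    ("fadd", "F6", ("F0", "F8")),     # RAW on F0: waits for the multiply
    ("fdiv", "F10", ("F12", "F14")),
    ("fdiv", "F16", ("F18", "F20")),  # waits for the non-pipelined divider
]
print(issue_cycles(program))
```

In this model the add stalls until the multiply result is available, and the second divide is held back by the initiation interval of 25 rather than by any data dependence.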
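The RAW, WAR, and WAW definitions from the data dependence list can be expressed as a small classifier. This is a sketch under the assumption that each instruction writes at most one register and reads a known set of registers; the `Instr` type and the `dependences` function are hypothetical helpers, not part of the lecture.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    dest: Optional[str]        # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]      # registers read


def dependences(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the dependences of `later` (i+k) on `earlier` (i)."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")   # true dependence: later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")   # antidependence: later writes what earlier reads
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")   # output dependence: both write the same register
    return found


i1 = Instr("F0", ("F2", "F4"))    # F0 <- F2 op F4
i2 = Instr("F6", ("F0", "F8"))    # reads F0
i3 = Instr("F2", ("F10",))        # overwrites F2, which i1 reads
i4 = Instr("F0", ("F12",))        # overwrites F0, which i1 writes

print(dependences(i1, i2))
print(dependences(i1, i3))
print(dependences(i1, i4))
```

Only the RAW case carries data between instructions; the WAR and WAW cases are name dependences and matter only once instructions are reordered.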

