MSU ECE 4743 - Increasing the Initiation Rate - D1958633


School name Mississippi State University

Pages 21


Slide outline: Increasing the Initiation Rate; Schedule with Sample Period = 4; Initiation Rate = 2; Resources; Initiation Rate, Latency; Generalized Schedule; Schedule Clk I+0; Schedule Clk I+1; Schedule: Clks I+2, I+3; Resource Comparison; Execution Unit Utilization table; Init Rate = 1; Resource Comparison Again; Less Clock Cycles, Lower Clock Period; Recall what Pipelining of Multiplier does; Assume Multiplier is pipelined by 2 stages; Pipelined Execution Units; Increasing Initiation Rate to 2; Schedule, Init Rate = 2

Department of Electrical and Computer Engineering
Mississippi State University
Sherif Abdelwahed
Computer Aided Digital Systems Design - EE 4743/6743

Increasing the Initiation Rate

[Figure: flowgraph. X, X@1, X@2 and X@3 are multiplied by a0, a1, a2 and a3 (nodes N2, N3, N4, N5); three adders (N7, N6, N8) sum the products to produce Y. The longest path is 4 clocks.]

The flowgraph has a longest path of 4 clocks. This means the computations cannot be completed in less than 4 clocks because of data dependencies. However, we CAN increase the initiation rate (the rate at which new input values are accepted)!

Schedule with Sample Period = 4

Cycle Start   Adder            Multiplier A     Multiplier B     IO
#1            Idle             x@3 * a3 (N5)    x@2 * a2 (N4)    Input X
#2            N7 (N5 + N4)     x * a0 (N2)      x@1 * a1 (N3)    Idle
#3            N6 (N3 + N7)     Idle             Idle             Idle
#4            N8 (N2 + N6)     Idle             Idle             Idle
Utilization   75%              50%              50%              25%

Initiation Rate = 2

Let's look at the operations needed with initiation rate = 2 for several clock cycles.
Successive sample values are labeled A, B, C, etc.

Clk   Sample A                  Sample B                  Sample C
1     N4(*), N5(*), Input X
2     N2(*), N3(*), N7(+)
3     N6(+)                     N4(*), N5(*), Input X
4     N8(+)                     N2(*), N3(*), N7(+)
5                               N6(+)                     N4(*), N5(*), Input X
6                               N8(+)                     N2(*), N3(*), N7(+)
7                                                         N6(+)
8                                                         N8(+)

Resources

Two multiplies per clock, so need two multipliers (A, B).
In clock #4 and clock #6 we have two additions, so need two adders (A, B).

Initiation Rate, Latency

The initiation rate of this design is 2. The latency is 4.
When initiation rate ≠ latency, pipelining is being done, because the computations for more than one input sample are being done at once.
Pipelining implies parallelism: more than one sample computation is in progress at any given clock cycle.
To schedule, need to generalize the table.

Generalized Schedule

Note: The initiation rate must divide evenly into the latency in order to generalize the table.

Clk   Sample J-1                Sample J                  Sample J+1
I-2   N4(*), N5(*), Input X
I-1   N2(*), N3(*), N7(+)
I     N6(+)                     N4(*), N5(*), Input X
I+1   N8(+)                     N2(*), N3(*), N7(+)
I+2                             N6(+)                     N4(*), N5(*), Input X
I+3                             N8(+)                     N2(*), N3(*), N7(+)

How many samples are needed in a generalized schedule?

Schedule Clk I+0

What do we need in registers at Clock I?
For Sample J-1: N2 (x * a0), N3 (x@1 * a1), N7 (N5 + N4)
For Sample J: x@3, x@2, x@1
For Sample J+1: No operations.

Registers: RA: x@3, RB: x@2, RC: x@1, RD: N2, RE: N3, RF: N7

Schedule Clk I+0:
Sample J-1:  N6 (N3+N7)      RF ← RE + RF    overwrite N7 value, no longer needed.
Sample J:    Input X         RE ← X          overwrite N3 value, no longer needed.
Sample J:    N4 (x@2*a2)     RG ← RB * a2    add new register RG to hold N4.
Sample J:    N5 (x@3*a3)     RA ← RA * a3    overwrite x@3 value, no longer needed.
Finished: added extra register RG.

Schedule Clk I+1

Registers: RA: N5, RB: x@2, RC: x@1, RD: N2, RE: X, RF: N6, RG: N4
After this clock, the registers need to be set up for the next clock, which is:
RA: x@3, RB: x@2, RC: x@1, RD: N2, RE: N3, RF: N7

Schedule Clk I+1:
Sample J-1:  N8 (N2+N6)      Y ← RD + RF     output goes to Y bus.
Sample J:    N2 (x*a0)       RD ← RE * a0    overwrite old N2 value, no longer needed.
             Very important that N2 go into RD, because this is needed for the next clock cycle.
Sample J:    N3 (x@1*a1)     RE ← RC * a1    need RE = N3 for next clock!
             But what about the X value that is in RE? Next clock, X = x@1 for sample J+1, so put X into the RC register!
Sample J+1:  RC = x@1        RC ← RE         J+1: x@1 = J: x
Sample J:    N7 (N4+N5)      RF ← RG + RA    need N7 in RF for J+1 sample.
Sample J+1:  RA = x@3        RA ← RB         J+1: x@3 = J: x@2
Sample J+1:  RB = x@2        RB ← RC         J+1: x@2 = J: x@1
Finished: no extra registers needed.

Schedule: Clks I+2, I+3

Schedule for Clk I+2 is a repeat of Clk I+0!
Schedule for Clk I+3 is a repeat of Clk I+1!
Actually, the generalized schedule only needs two clocks!

Resource Comparison

Doubling the initiation rate did NOT double the hardware resources needed.
Why? Because the execution units for InitRate = 4 were not fully utilized!

Init Rate   Multipliers   Adders   Registers
4           2             1        10
2           2             2        11

Execution Unit Utilization table

Note that the multipliers were not fully utilized for InitRate = 4; we used this free time in the schedule for InitRate = 2.

Init Rate   Mult A        Mult B        Add A         Add B
4           50% (2/4)     50% (2/4)     75% (3/4)     N/A
2           100% (2/2)    100% (2/2)    100% (2/2)    50% (1/2)

Init Rate = 1

Four multiplies in clock 4, so need four multipliers (A, B, C, D).
Three additions in clock 4, so need three adders (A, B, C).

Clk   Sample A                Sample B                Sample C                Sample D
1     N4(*), N5(*), Input X
2     N2(*), N3(*), N7(+)     N4(*), N5(*), Input X
3     N6(+)                   N2(*), N3(*), N7(+)     N4(*), N5(*), Input X
4     N8(+)                   N6(+)                   N2(*), N3(*), N7(+)     N4(*), N5(*), Input X
5                             N8(+)                   N6(+)                   N2(*), N3(*), N7(+)
6                                                     N8(+)                   N6(+)
7                                                                             N8(+)

Resource Comparison Again

Latency for all of these designs is 4 clocks.
This table clearly illustrates the time versus area tradeoff in digital systems.
It will cost you MORE resources to do something in LESS time!

Init Rate   Multipliers   Adders   Registers
4           2             1        10
2           2             2        11
1           4             3        ??

Less Clock Cycles, Lower Clock Period

- Computation Time = # of Clocks * Clock Period
- Increasing the initiation rate will increase the computation rate in terms of clock cycles
  - Fewer clock cycles between new outputs
- To decrease the clock period (increase clock frequency), need to have shorter combinational paths in the design
  - PIPELINE the individual execution units!
  - Multiplier will have much longer delay than adder, so will want to pipeline this first

Recall what Pipelining of Multiplier does

[Figure: a multiplier with no pipelining (output is ready after the combinational delay), and the same multiplier with one, two and three pipeline stages built from DFF registers.]

Assume Multiplier is pipelined by 2 stages

Do a solution with Initiation Rate = Latency; 2 mult, 1 adder.
Scheduling now takes 6 clocks. N7 depends on N5, N4
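
The multiplier and adder counts in the resource comparison tables can be cross-checked mechanically. The Python sketch below is not from the slides. It assumes the single-sample schedule given earlier (N4 and N5 in the first clock, N2, N3 and N7 in the second, N6 in the third, N8 in the fourth) and treats "initiation rate" the way the slides do, as the number of clocks between successive input samples. The names SINGLE_SAMPLE_OPS, units_needed and utilization are hypothetical, chosen only for illustration. Registers are not counted, since that number depends on how values are allocated to registers.

```python
from collections import Counter

# One sample's schedule, as (clock offset from its input, operation type).
# '*' is a multiply and '+' is an add. Assumed from the slides' schedule.
SINGLE_SAMPLE_OPS = [
    (0, "*"), (0, "*"),            # N4, N5
    (1, "*"), (1, "*"), (1, "+"),  # N2, N3, N7
    (2, "+"),                      # N6
    (3, "+"),                      # N8
]


def units_needed(init_rate, ops=SINGLE_SAMPLE_OPS):
    """Return {op type: execution units needed} in steady state.

    init_rate is the number of clocks between successive input samples.
    """
    per_slot = Counter()
    for offset, kind in ops:
        # In steady state, operations whose offsets agree modulo init_rate
        # land in the same clock, so they compete for the same units.
        per_slot[(offset % init_rate, kind)] += 1
    needed = {}
    for (_, kind), count in per_slot.items():
        needed[kind] = max(needed.get(kind, 0), count)
    return needed


def utilization(init_rate, ops=SINGLE_SAMPLE_OPS):
    """Average utilization per op type: ops per sample / (units * init_rate)."""
    needed = units_needed(init_rate, ops)
    totals = Counter(kind for _, kind in ops)
    return {kind: totals[kind] / (needed[kind] * init_rate) for kind in needed}


if __name__ == "__main__":
    for rate in (4, 2, 1):
        print(rate, units_needed(rate), utilization(rate))
```

For initiation rates 4, 2 and 1 this gives 2 multipliers and 1 adder, 2 and 2, and 4 and 3, in agreement with the table. The adder utilization it reports for initiation rate 2 is an average over both adders (3 additions in 4 adder slots), whereas the slides list Add A and Add B separately.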
