Slide 1Pipelining a blockInelastic PipelineWhen is this rule enabled?Pipelining a blockFolded pipelineSuperfolded pipeline One Bfly-4 caseSuperfolded pipeline: stage function fCode for the Superfolded pipeline stage functionFolded pipeline: stage function fSlide 11Slide 12Why are the areas so similarElastic pipeline Use FIFOs instead of pipeline registersWhat behavior do we want?One-Element FIFOConcurrency when the FIFOs do not permit concurrent enq and deqInelastic vs Elastic PipelinesThe tensionLanguage notesSlide 21Syntax: Vector of RegistersMaking guards explicitImplicit guards (conditions)Guards vs If’sStatic vs dynamic expressionsGeneralization: n-stage pipelineSlide 28Simple Inelastic and Folded PipelinesArvind Computer Science & Artificial Intelligence LabMassachusetts Institute of TechnologyFebruary 14, 2011 L04-1http://csg.csail.mit.edu/6.375Pipelining a blockinQ outQf2f1f3CombinationalCinQ outQf2f1f3PipelinePinQ outQfFolded PipelineFPClock?Area? Throughput?Clock: C < P FP Area: FP < C < P Throughput: FP < C < PFebruary 14, 2011L04-2http://csg.csail.mit.edu/6.375Inelastic PipelinexsReg1inQf1 f2 f3sReg2 outQrule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= tagged Valid f1(inQ.first()); inQ.deq();end else sReg1 <= tagged Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= tagged Valid f2(sx1); tagged Invalid: sReg2 <= tagged Invalid; endcase case (sReg2) matches tagged Valid .sx2: outQ.enq(f3(sx2)); endcaseendruleBoth Registers hold values of Maybe typeFebruary 14, 2011L04-3http://csg.csail.mit.edu/6.375When is this rule enabled?rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= tagged Valid f0(inQ.first()); inQ.deq(); end else sReg1 <= tagged Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= tagged Valid f1(sx1); tagged Invalid: sReg2 <= tagged Invalid; endcase case (sReg2) matches tagged Valid .sx2: outQ.enq(f2(sx2)); endcaseendruleinQ sReg1 sReg2 outQNE V V NFNE V V FNE V I NFNE V I FNE I V NFNE I V FNE I I NFNE I I FE V V NFE V V FE V I NFE V I FE I V NFE I V FE I I NFE I I FyesNoYesYesYesNoYesyesyesNoYesYesYesNoYes1yesYes1 = yes but no changeinQ sReg1 sReg2 outQFebruary 14, 2011L04-4http://csg.csail.mit.edu/6.375Pipelining a blockinQ outQf2f1f3CombinationalCinQ outQf2f1f3PipelinePinQ outQfFolded PipelineFPClock?Area? Throughput?Clock: C < P FP Area: FP < C < P Throughput: FP < C < PFebruary 14, 2011L04-5http://csg.csail.mit.edu/6.375Folded pipelinerule folded-pipeline (True); if (stage==0) begin sxIn= inQ.first(); inQ.deq(); end else sxIn= sReg; sxOut = f(stage,sxIn); if (stage==n-1) outQ.enq(sxOut); else sReg <= sxOut; stage <= (stage==n-1)? 0 : stage+1;endrulexsReginQfoutQstagenotice stage is a dynamic parameter now!no for-loopNeed type declarations for sxIn and sxOutFebruary 14, 2011L04-6http://csg.csail.mit.edu/6.375Superfolded pipeline One Bfly-4 casef will be invoked for 48 dynamic values of stageeach invocation will modify 4 numbers in sRegafter 16 invocations a permutation would be done on the whole sRegFebruary 14, 2011L04-7http://csg.csail.mit.edu/6.375Superfolded pipeline: stage function ffunction Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Bit#(2) stage Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; endreturn(stage_out);Bit#(2+4) (stage,i)should be done only when i=15 February 14, 2011L04-8http://csg.csail.mit.edu/6.375Code for the Superfolded pipeline stage functionFunction Vector#(64, Complex) f (Bit#(6) stagei, Vector#(64, Complex) stage_in); let i = stagei `mod` 16; let twid = getTwiddle(stagei `div` 16, i); let y = bfly4(twid, stage_in[i:i+3]); let stage_temp = stage_in; stage_temp[i] = y[0]; stage_temp[i+1] = y[1]; stage_temp[i+2] = y[2]; stage_temp[i+3] = y[3]; let stage_out = stage_temp; if (i == 15) for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; return(stage_out); endfunctionOne Bfly-4 caseFebruary 14, 2011L04-9http://csg.csail.mit.edu/6.375Folded pipeline: stage function fThe Twiddle constants can be expressed in a table or in a case or nested case expressionstage getTwiddle0 getTwiddle1getTwiddle2twidThe rest of stage_f, i.e. Bfly-4s and permutations (shared)The rest of stage_f, i.e. Bfly-4s and permutations (shared)sxFebruary 14, 2011L04-10http://csg.csail.mit.edu/6.375802.11a Transmitter [MEMOCODE 2006] Dave, Gerding, Pellauer, ArvindDesign Lines of RelativeBlock Code (BSV) AreaController 49 0%Scrambler 40 0%Conv. Encoder 113 0%Interleaver 76 1%Mapper 112 11%IFFT 95 85%Cyc. Extender 23 3%Complex arithmetic libraries constitute another 200 lines of codeFebruary 14, 2011L04-11http://csg.csail.mit.edu/6.375802.11a Transmitter Synthesis results (Only the IFFT block is changing)IFFT Design Area (mm2)Throughput Latency(CLKs/sym)Min. Freq RequiredPipelined 5.25 04 1.0 MHzCombinational 4.91 04 1.0 MHzFolded(16 Bfly-4s)3.97 04 1.0 MHzSuper-Folded(8 Bfly-4s)3.69 06 1.5 MHzSF(4 Bfly-4s) 2.45 12 3.0 MHzSF(2 Bfly-4s) 1.84 24 6.0 MHzSF (1 Bfly4) 1.52 48 12 MHZTSMC .18 micron; numbers reported are before place and route.The same source codeAll these designs were done in less than 24 hours!February 14, 2011L04-12http://csg.csail.mit.edu/6.375Why are the areas so similarFolding should have given a 3x improvement in IFFT areaBUT a constant twiddle allows low-level optimization on a Bfly-4 blocka 2.5x area reduction!February 14, 2011L04-13http://csg.csail.mit.edu/6.375Elastic pipelineUse FIFOs instead of pipeline registersxfifo1inQf1 f2 f3fifo2outQrule stage1 (True); fifo1.enq(f1(inQ.first()); inQ.deq(); endrulerule stage2 (True);fifo2.enq(f2(fifo1.first()); fifo1.deq(); endrulerule stage3 (True); outQ.enq(f3(fifo2.first()); fifo2.deq(); endruleFiring conditions? Can tokens be left inside the pipeline?Easier to write?No Maybe types?Can all three rules fire concurrently?February 14, 2011L04-14http://csg.csail.mit.edu/6.375What behavior do we want?xfifo1inQf1 f2 f3fifo2outQMaximize concurrency - Fire maximum number of rulesinQ fifo1 fifo2
View Full Document