1Simple Inelastic and Folded PipelinesFolded PipelinesArvind Computer Science & Artificial Intelligence LabMassachusetts Institute of TechnologyFebruary 14, 2011 L04-1http://csg.csail.mit.edu/6.375Pipelining a blockf2f1f3CombinationalCinQ outQCinQ outQf2f1f3PipelinePinQ outQfFolded PipelineFPClock?Area? Throughput?Clock: C < P FPArea: FP < C < P Throughput: FP < C < PFebruary 14, 2011L04-2http://csg.csail.mit.edu/6.3752Inelastic Pipelinef1f2f3Both Registers hold values of xsReg1inQf1f2f3sReg2 outQrule sync-pipeline (True);if (inQ.notEmpty())begin sReg1 <= tagged Valid f1(inQ.first()); inQ.deq();endelse sReg1 <= taggedInvalid;hold values of Maybe typecase (sReg1) matchestagged Valid .sx1: sReg2 <= tagged Valid f2(sx1);tagged Invalid: sReg2 <= tagged Invalid; endcasecase (sReg2) matches tagged Valid .sx2: outQ.enq(f3(sx2)); endcaseendruleFebruary 14, 2011L04-3http://csg.csail.mit.edu/6.375When is this rule enabled?rule sync-pipeline (True);if (inQ.notEmpty())begin sReg1 <= tagged Valid f0(inQ.first()); inQ.deq(); endelse sReg1 <= tagged Invalid;ggg;case (sReg1) matchestagged Valid .sx1: sReg2 <= tagged Valid f1(sx1);tagged Invalid: sReg2 <= tagged Invalid; endcasecase (sReg2) matches tagged Valid .sx2: outQ.enq(f2(sx2)); endcaseendruleinQ sReg1 sReg2 outQNEVVNFNEVVFEVVNFEVVFyesNoYes inQ sReg1 sReg2 outQNEVINFNEVIFNEIVNFNEIVFNEIINFNEIIFEVINFEVIFEIVNFEIVFEIINFEIIFFebruary 14, 2011L04-4http://csg.csail.mit.edu/6.3753Pipelining a blockf2f1f3CombinationalCinQ outQCinQ outQf2f1f3PipelinePinQ outQfFolded PipelineFPClock?Area? Throughput?Clock: C < P FPArea: FP < C < P Throughput: FP < C < PFebruary 14, 2011L04-5http://csg.csail.mit.edu/6.375Folded pipelinefrule folded-pipeline (True);if (stage==0) begin sxIn= inQ.first(); inQ.deq(); endlIRxsReginQoutQstage else sxIn= sReg; sxOut = f(stage,sxIn);if (stage==n-1) outQ.enq(sxOut);else sReg <= sxOut;stage <= (stage==n-1)? 0 : stage+1;endrulenotice stageis a dynamic parameter now!no for-loopNeed type declarations for sxIn and sxOutFebruary 14, 2011L04-6http://csg.csail.mit.edu/6.3754Superfolded pipelineOne Bfly-4 casef will be invoked for 48 dynamic f will be invoked for 48 dynamic values of stage each invocation will modify 4 numbers in sReg after 16 invocations a permutation would be done on the whole sRegwould be done on the whole sRegFebruary 14, 2011L04-7http://csg.csail.mit.edu/6.375Superfolded pipeline: stage function ffunction Vector#(64, Complex) stage_f(Bit#(2) stage, Vector#(64, Complex) stage_in);beginfor (Integer i = 0; i < 16; i = i + 1)begin Bit#(2) stageInteger idx = i * 4;let twid = getTwiddle(stage, fromInteger(i));let y = bfly4(twid, stage_in[idx:idx+3]);stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1];stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3];endend//Permutationfor (Integer i = 0; i < 64; i = i + 1)stage_out[i] = stage_temp[permute[i]];endreturn(stage_out);February 14, 2011L04-8http://csg.csail.mit.edu/6.3755Code for the Superfolded pipeline stage functionFunction Vector#(64, Complex) f(Bit#(6) stagei, Vector#(64, Complex) stage_in);leti=stagei`mod`16;leti= stageimod 16;let twid = getTwiddle(stagei `div` 16, i);let y = bfly4(twid, stage_in[i:i+3]);let stage_temp = stage_in;stage_temp[i] = y[0];stage_temp[i+1] = y[1];stage_temp[i+2] = y[2];stage temp[i+3] = y[3];One Bfly-4 caseg_ p[]y[];let stage_out = stage_temp;if (i == 15)for (Integer i = 0; i < 64; i = i + 1)stage_out[i] = stage_temp[permute[i]];return(stage_out); endfunctionFebruary 14, 2011L04-9http://csg.csail.mit.edu/6.375Folded pipeline: stage function fgetTwiddle0Th t f stagegetTwiddle0 getTwiddle1getTwiddle2twid(shared)The rest of stage_f, i.e. Bfly-4s and permutations (shared)sThe Twiddle constants can be expressed in a table or in a case or nested case expressionsxFebruary 14, 2011L04-10http://csg.csail.mit.edu/6.3756802.11a Transmitter [MEMOCODE 2006] Dave, Gerding, Pellauer, ArvindDesign Lines of RelativeBlock Code (BSV)AreaBlock Code (BSV)AreaController 49 0%Scrambler 40 0%Conv. Encoder 113 0%Interleaver 76 1%Mapper 112 11%Mapper 112 11%IFFT 95 85%Cyc. Extender 23 3%Complex arithmetic libraries constitute another 200 lines of codeFebruary 14, 2011L04-11http://csg.csail.mit.edu/6.375802.11a Transmitter Synthesis results (Only the IFFT block is changing)IFFT Design Area (mm2)Throughput Latency(CLKs/sym)Min. Freq Required(CLKs/sym)Pipelined 5.25 04 1.0 MHzCombinational 4.91 04 1.0 MHzFolded(16 Bfly-4s)3.97 04 1.0 MHzSuper-Folded 3.69 06 1.5 MHzThe All these designs were done in less than 24 hours!(8 Bfly-4s)SF(4 Bfly-4s) 2.45 12 3.0 MHzSF(2 Bfly-4s) 1.84 24 6.0 MHzSF (1 Bfly4) 1.52 48 12 MHZTSMC .18 micron; numbers reported are before place and route.same source codeFebruary 14, 2011L04-12http://csg.csail.mit.edu/6.3757Why are the areas so similarFolding should have given a 3x Folding should have given a 3x improvement in IFFT areaFebruary 14, 2011L04-13http://csg.csail.mit.edu/6.375Elastic pipelineUse FIFOs instead of pipeline registersf1f2f3xfifo1inQf1f2f3fifo2 outQrule stage1 (True); fifo1.enq(f1(inQ.first()); inQ.deq(); endrulerulestage2 (True);Firing conditions? Can tokens be left inside the pipeline?rulestage2 (True);fifo2.enq(f2(fifo1.first()); fifo1.deq(); endrulerule stage3 (True); outQ.enq(f3(fifo2.first()); fifo2.deq(); endruleEasier to write?No Maybe types?Can all three rules fire concurrently?February 14, 2011L04-14http://csg.csail.mit.edu/6.3758What behavior do we want?f1f2f3xfifo1inQf1f2f3fifo2 outQinQ fifo1 fifo2 outQNE NE,NF NE,NF NFNE NE,NF NE,NF FNENE NFNE FNFYes Yes YesYes Yes Norule1 rule2 rule3NENE,NFNE,FNFNE NE,NF NE,F F….February 14, 2011L04-15http://csg.csail.mit.edu/6.375module mkFIFO1 (FIFO#(t));Reg#(t) data <- mkRegU(); #()()One-Element FIFOReg#(Bool) full <-mkReg(False);method Action enq(t x) if (!full);full <= True; data <= x;endmethodmethod Action deq() if (full);full <= False;endmethodthdtfi t()if(fll)nmethod t first() if (full);return (data);endmethodmethod Action clear();full <= False;endmethodendmodulenot emptynot fullrdyenabrdyenabenqdeqFIFOmoduleJanuary 18, 2011L4-16http://csg.csail.mit.edu/SNU9Concurrency when the FIFOs do not permit concurrent enq and deqf1f2f3xfifo1inQf1f2f3fifo2 outQnot emptynot empty¬ fullnot empty¬ fullnot fullJanuary 18, 2011L4-17http://csg.csail.mit.edu/SNUInelastic vs Elastic PipelinesIn an Inelastic pipeline:pp typically only one rule; the
View Full Document