1Elastic Pipelines: Concurrency IssuesArvind Computer Science & Artificial Intelligence LabhfhlMassachusetts Institute of TechnologyFebruary 28, 2011 L08-1http://csg.csail.mit.edu/6.375Inelastic vs Elastic PipelinesIn a Inelastic pipeline:pp typically only one rule; the designer controls precisely which activities go on in parallel downside: The rule can get too complicated -- easy to make a mistake; difficult to make changesIn an Elastic pipeline:pp several smaller rules, each easy to write, easier to make changes downside: sometimes rules do not fire concurrently when they shouldFebruary 28, 2011 L08-2http://csg.csail.mit.edu/6.3752Processor Pipelines and FIFOsfetchexecuteiMemrfCPUdecodememorypcwrite-backdMemCPUIt is better to think in terms of FIFOs as opposed to pipeline registers. February 28, 2011 L08-3http://csg.csail.mit.edu/6.375Fetch & Decode Rule: correctedpcrfCPUfetch & decodeexecuteburule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb}bu enq (EAdd{dst:rd op1:rf[ra] op2:rf[rb]});&&& !bu.find(ra) &&& !bu.find(rb))bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]});pc <= predIa;endruleFebruary 28, 2011 L08-4http://csg.csail.mit.edu/6.3753SFIFO (glue between stages)interface SFIFO#(type t, type tr);method Action enq(t); // enqueue an itemmethodAction deq();// remove oldest entrymethodAction deq();// remove oldest entrymethod t first(); // inspect oldest itemmethod Action clear(); // make FIFO emptymethod Bool find(tr); // search FIFOendinterfacen = # of bits needednot fullrdyenabnenabenqqOleto represent thevalues of type “t“m = # of bits neededto represent thevalues of type “tr"not emptynot emptynrdyrdydeqfirstSFIFOmoduclearenabfindmboolmore on searchable FIFOs later February 28, 2011 L08-5http://csg.csail.mit.edu/6.375Two-Stage Pipeline pcrfCPUfetch & decodeexecutebumodule mkCPU#(Mem iMem, Mem dMem)(Empty);Reg#(Iaddress) pc <- mkReg(0);RegFile#(RName, Bit#(32)) rf <- mkRegFileFull();SFIFO#(InstTemplate, RName) bu <-mkSFifo(findf);Instr instr = iMem.read(pc);Iaddress predIa = pc + 1;InstTemplate it = bu.first();rule fetch_decode ...endmodule February 28, 2011 L08-6http://csg.csail.mit.edu/6.3754Rules for Add pcrfCPUrule decodeAdd(instr matches Add{dst:.rd,src1:.ra,src2:.rb})bu.enq (EAdd{dst:rd,op1:rf[ra],op2:rf[rb]});pc <=predIa;implicit check:fetch & decodeexecutebupc < predIa;endrulerule executeAdd(it matches EAdd{dst:.rd,op1:.va,op2:.vb})rf.upd(rd, va + vb);bu.deq();endruleimplicit check:implicit check:February 28, 2011 L08-7http://csg.csail.mit.edu/6.375Fetch & Decode Rule: ReexaminedpcrfCPUfetch & decodeexecuteburule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb})bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]});qpppc <= predIa;endruleFebruary 28, 2011 L08-8http://csg.csail.mit.edu/6.3755Rules for Branch pcrfCPUrule-atomicity ensures thatpc update, anddiscard of prerule decodeBz(instr matches Bz{condR:.rc,addrR:.addr}) &&&!bu.find(rc) &&& !bu.find(addr));bu.enq (EBz{cond:rf[rc],tAddr:rf[addr]});pc <= predIa;fetch & decodeexecutebudiscard of pre-fetched instrs in bu, are doneconsistentlyppendrulerule bzTaken(it matches EBz{cond:.vc,tAddr:.va}) &&& (vc==0));pc <= va; bu.clear(); endrulerule bzNotTaken (it matches EBz{cond:.vc,tAddr:.va}) &&&(vc != 0));bu.deq; endruleFebruary 28, 2011 L08-9http://csg.csail.mit.edu/6.375Fetch & Decode Rulerule fetch_and_decode (!stallFunc(instr, bu)); bu.enq(newIt(instr));<dIfunction InstrTemplate newIt(Instr instr);case (instr) matchestagged Add {dst:.rd,src1:.ra,src2:.rb}:return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]};tagged Bz {condR:.rc,addrR:.addr}:return EBz{cond:rf[rc],tAddr:rf[addr]};pc <= predIa;endrulebeforereturn EBz{cond:rf[rc],tAddr:rf[addr]};tagged Load {dst:.rd,addrR:.addr}:return ELoad{dst:rd,addrR:rf[addr]};tagged Store{valueR:.v,addrR:.addr}:return EStore{val:rf[v],addr:rf[addr]};endcase endfunction Same as bFebruary 28, 2011 L08-10http://csg.csail.mit.edu/6.3756The Stall SignalBool stall = stallFunc(instr, bu);function Bool stallFunc (Instr instr, SFIFO#(InstTemplate, RName) bu); case (instr) matchestagged Add {dst:.rd,src1:.ra,src2:.rb}: return (bu.find(ra) || bu.find(rb));tagged Bz {condR:.rc,addrR:.addr}: return (bu.find(rc) || bu.find(addr));tagged Load {dst:.rd,addrR:.addr}: return(bu find(addr));This need to search the contents of the FIFO is why we need an SFIFO, not just a FIFOreturn(bu.find(addr));tagged Store {valueR:.v,addrR:.addr}: return (bu.find(v)) || bu.find(addr));endcaseendfunctionFebruary 28, 2011 L08-11http://csg.csail.mit.edu/6.375The findf functionWhen we make a searchable FIFO we need to supply a function that determines if a register is going to be updated by an instruction templateupdated by an instruction templatemkSFifo can be parameterized by such a search functionSFIFO#(InstrTemplate, RName) bu <- mkSFifo(findf);function Bool findf (RName r, InstrTemplate it); case (it) matchestagged EAdd{dst:.rd,op1:.v1,op2:.v2}: return (r==rd);orereturn (r rd); tagged EBz {cond:.c,tAddr:.a}: return (False);tagged ELoad{dst:.rd,addr:.a}: return (r == rd);tagged EStore{val:.v,addr:.a}: return (False);endcase endfunctionSame as befoFebruary 28, 2011 L08-12http://csg.csail.mit.edu/6.3757Execute Rulerule execute (True);case (it) matchestaggedEAdd{dst:.rd,op1:.va,op2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,tAddr:.av}:if (cv == 0) then begin pc <= av; bu.clear(); end else bu.deq();tagged ELoad{dst:.rd,addr:.av}: beginrf upd(rddMem read(av)); bu deq();endbeginrf.upd(rd, dMem.read(av)); bu.deq(); endtagged EStore{val:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); endendcaseendruleFebruary 28, 2011 L08-13http://csg.csail.mit.edu/6.375Concurrencyrule fetch_and_decode (!stallFunc(instr, bu)); bu.enq(newIt(instr,rf));pc < predIafetch & decodeexecutepcrfCPUbupc <= predIa;endrulerule execute (True);case (it) matchestagged EAdd{dst:.rd,op1:.va,op2:.vb}: beginrf.upd(rd, va+vb); bu.deq(); endtagged EBz {cond:.cv,tAddr:.av}:if (cv == 0) then beginCan these rules fire concurrently ?Does it matter?gpc <= av; bu.clear(); end else bu.deq();tagged ELoad{dst:.rd,addr:.av}: beginrf.upd(rd, dMem.read(av)); bu.deq(); endtagged EStore{val:.vv,addr:.av}: begindMem.write(av, vv); bu.deq(); endendcase endruleFebruary 28, 2011 L08-14http://csg.csail.mit.edu/6.3758The tensionIf the two rules never fire in the same cycle then the machine can hardly be called a pipelined machinepipelined machine Scheduling cannot be too conservativeIf both
View Full Document