Slide 1Inelastic vs Elastic PipelinesProcessor Pipelines and FIFOsSFIFO (glue between stages)Two-Stage PipelineRules for AddFetch & Decode Rule: ReexaminedFetch & Decode Rule: correctedRules for BranchFetch & Decode RuleThe Stall SignalThe findf functionExecute RuleConcurrencyThe tensionThe compiler issueExecution rules Split the execution rule for analysisConcurrency analysis Add RuleWhat concurrency do we want?Concurrency analysis Branch RulesConcurrency analysis Load-Store RulesProperties Required of Register File and FIFO for Instruction PipeliningOne Element Searchable Pipeline SFIFORegister File concurrency propertiesBypass Register FileSince our rules do not really require a Bypass Register File, the overhead of bypassing can be avoided by simply using the “Config Regfile”Concurrency analysis Two-stage PipelineLot of nontrivial analysis but no change in processor code!BypassingThe stall function for the elastic pipelineElastic Pipelines: Concurrency IssuesArvind Computer Science & Artificial Intelligence LabMassachusetts Institute of TechnologyMarch 3, 2010 L09-1http://csg.csail.mit.edu/6.375Inelastic vs Elastic PipelinesIn a Inelastic pipeline:typically only one rule; the designer controls precisely which activities go on in paralleldownside: The rule can get too complicated -- easy to make a mistake; difficult to make changesIn an Elastic pipeline:several smaller rules, each easy to write, easier to make changesdownside: sometimes rules do not fire concurrently when they shouldEasy: cycle-level concurrencyDifficult: precise functional correctness Easy: functional correctness Difficult: precise cycle-level concurrencyMarch 3, 2010 L09-2http://csg.csail.mit.edu/6.375Processor Pipelines and FIFOsfetchexecute iMemrfCPUdecodememorypcwrite-backdMemIt is better to think in terms of FIFOs as opposed to pipeline registers. March 3, 2010 L09-3http://csg.csail.mit.edu/6.375SFIFO (glue between stages)interface SFIFO#(type t, type tr); method Action enq(t); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item method Action clear(); // make FIFO empty method Bool find(tr); // search FIFOendinterface n = # of bits needed to represent the values of type “t“ m = # of bits needed to represent the values of type “tr"not fullnot emptynot emptyrdyenabnnrdyenabrdyenqdeqfirstSFIFOmoduleclearenabfindmboolmore on searchable FIFOs later March 3, 2010 L09-4http://csg.csail.mit.edu/6.375Two-Stage Pipeline fetch & decodeexecutepcrfCPUbumodule mkCPU#(Mem iMem, Mem dMem)(Empty); Reg#(Iaddress) pc <- mkReg(0);RegFile#(RName, Bit#(32)) rf <- mkRegFileFull();SFIFO#(InstTemplate, RName) bu <- mkSFifo(findf); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; InstTemplate it = bu.first(); rule fetch_decode ...endmodule March 3, 2010 L09-5http://csg.csail.mit.edu/6.375Rules for Add rule decodeAdd(instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); pc <= predIa;endrulerule executeAdd(it matches EAdd{dst:.rd,op1:.va,op2:.vb}) rf.upd(rd, va + vb); bu.deq();endruleimplicit check:implicit check:fetch & decodeexecutepcrfCPUbubu notfullbu notemptyMarch 3, 2010 L09-6http://csg.csail.mit.edu/6.375Fetch & Decode Rule: Reexamined Wrong! Because instructions in bu may be modifying ra or rbstall !fetch & decodeexecutepcrfCPUburule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]}); pc <= predIa;endruleMarch 3, 2010 L09-7http://csg.csail.mit.edu/6.375Fetch & Decode Rule: correctedfetch & decodeexecutepcrfCPUburule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb} bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]}); pc <= predIa;endrule&&& !bu.find(ra) &&& !bu.find(rb))March 3, 2010 L09-8http://csg.csail.mit.edu/6.375Rules for Branch rule decodeBz(instr matches Bz{condR:.rc,addrR:.addr}) &&& !bu.find(rc) &&& !bu.find(addr)); bu.enq (EBz{cond:rf[rc],tAddr:rf[addr]}); pc <= predIa; endrulerule bzTaken(it matches EBz{cond:.vc,tAddr:.va}) &&& (vc==0)); pc <= va; bu.clear(); endrulerule bzNotTaken (it matches EBz{cond:.vc,tAddr:.va}) &&& (vc != 0)); bu.deq; endrulefetch & decodeexecutepcrfCPUburule-atomicity ensures thatpc update, anddiscard of pre-fetched instrs in bu, are doneconsistentlyMarch 3, 2010 L09-9http://csg.csail.mit.edu/6.375Fetch & Decode Rule function InstrTemplate newIt(Instr instr); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}; tagged Bz {condR:.rc,addrR:.addr}: return EBz{cond:rf[rc],tAddr:rf[addr]}; tagged Load {dst:.rd,addrR:.addr}: return ELoad{dst:rd,addrR:rf[addr]}; tagged Store{valueR:.v,addrR:.addr}: return EStore{val:rf[v],addr:rf[addr]}; endcase endfunction rule fetch_and_decode (!stallFunc(instr, bu)); bu.enq(newIt(instr)); pc <= predIa;endrule Same as beforeMarch 3, 2010 L09-10http://csg.csail.mit.edu/6.375The Stall SignalBool stall = stallFunc(instr, bu);This need to search the contents of the FIFO is why we need an SFIFO, not just a FIFOfunction Bool stallFunc (Instr instr, SFIFO#(InstTemplate, RName) bu); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (bu.find(ra) || bu.find(rb)); tagged Bz {condR:.rc,addrR:.addr}: return (bu.find(rc) || bu.find(addr)); tagged Load {dst:.rd,addrR:.addr}: return (bu.find(addr)); tagged Store {valueR:.v,addrR:.addr}: return (bu.find(v)) || bu.find(addr)); endcaseendfunctionMarch 3, 2010 L09-11http://csg.csail.mit.edu/6.375The findf functionWhen we make a searchable FIFO we need to supply a function that determines if a register is going to be updated by an instruction templatemkSFifo can be parameterized by such a search functionSFIFO#(InstrTemplate, RName) bu <- mkSFifo(findf);function Bool findf (RName r, InstrTemplate it); case (it) matches tagged EAdd{dst:.rd,op1:.v1,op2:.v2}: return (r == rd); tagged EBz {cond:.c,tAddr:.a}: return (False); tagged ELoad{dst:.rd,addr:.a}: return (r == rd); tagged EStore{val:.v,addr:.a}: return (False); endcase endfunctionSame as beforeMarch 3, 2010 L09-12http://csg.csail.mit.edu/6.375Execute Rulerule execute (True); case (it) matches tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin rf.upd(rd,
View Full Document