MIT 6.375 - Performance Specifications

School name Massachusetts Institute of Technology

Course 6.375 - Complex Digital Systems

Performance Specifications
Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
March 5, 2008
http://csg.csail.mit.edu/6.375

L11-2: Simple processor pipeline

[Figure: pipeline with stages IF, Dec, Exe, Mem, Wb, inter-stage buffers bF, bD, bE, bW, and iMem, RF, dMem; two stages are marked "Bz?"]

Functional behavior is well understood.
Intuition about performance is lacking:
• Should the branch be resolved in the Decode or Execute stage?
• Should the branch target address be latched before its use?
Experimentation is required to evaluate design alternatives: cycle time? area? execution time?
We present a design flow that makes such experimentation easy for the designer.

L11-3: Need for Performance Specs

[Figure: the same pipeline]

Rules:
    F = {Fetch}
    D = {DecAdd, DecBz, …}
    E = {ExeAdd, ExeBzTaken, ExeBzNotTaken, …}
    M = {MemLd, MemSt, MemWB, …}
    W = {Wb}
• What is the design's performance / throughput?
• Reference model implies one rule per cycle execution
Designer's goal is usually different and based on the application!

L11-4: Pipelining via Performance specification

The designer wants a pipeline which executes one instruction every cycle.
Performance spec for a pipelined processor:
    W < M < E < D < F

[Figure: "A cycle in slow motion", showing instructions I0 to I5 in the pipeline]

L11-5: More Performance Specification

With the rule sets F, D, E, M, W as on L11-3, we allow the designer to specify performance!
    F < D < E < M < W  ≡ unpipelined (assuming buffers start empty)
    W < M < E < D < F  ≡ pipelined
    W < W < M < M < E < E < D < D < F < F  ≡ two-way superscalar!
Synthesis algorithms ensure that performance specs are satisfied and guarantee that functionality is not altered.
What do the following mean?
    1) W < M < E* < D < F
    2) W < M < ExeBzTaken
    ≡ pipelined except for ExeBzTaken

L11-6: Why is functionality maintained?

A few observations about rule-based systems:
• Adding a new rule to a system can only introduce new behaviors
• If the new rule is a derived rule, then it does not add new behaviors
Composed rules:
Given rules:
    Ra: when πa(s) => s := δa(s);
    Rb: when πb(s) => s := δb(s);
The composed rule is a derived rule:
    Ra,b: when πa(s) & πb(δa(s)) => s := δb(δa(s));

L11-7: Scheduling Specifications

    rule fetch_and_decode (!stallfunc(instr, bu));
       bu.enq(newIt(instr,rf));
       pc <= predIa;
    endrule

    rule execAdd(it matches tagged EAdd{dst:.rd,src1:.va,src2:.vb});
       rf.upd(rd, va+vb); bu.deq();
    endrule

    rule execBzTaken(it matches tagged Bz {cond:.cv,addr:.av}
                     &&& (cv == 0));
       pc <= av; bu.clear();
    endrule

    rule execBzNotTaken(it matches tagged Bz {cond:.cv,addr:.av}
                        &&& !(cv == 0));
       bu.deq();
    endrule

    rule execLoad(it matches tagged ELoad{dst:.rd,addr:.av});
       rf.upd(rd, dMem.read(av)); bu.deq();
    endrule

    rule execStore(it matches tagged EStore{value:.vv,addr:.av});
       dMem.write(av, vv); bu.deq();
    endrule

[Figure: CPU with pc and rf; buffer bu sits between "fetch & decode" and "execute"]

    execAdd < fetch
    execBzTaken < fetch
    execBzNotTaken < fetch ?
    execLoad < fetch
    execStore < fetch

L11-8: Implications for modules

Rules considered: fetch_and_decode and execAdd (as on L11-7).
    execAdd < fetch ⇒ rf: sub > upd
                      bu: {find, enq} > {first, deq}

L11-9: Branch rules

Rules considered: fetch_and_decode, execBzTaken and execBzNotTaken (as on L11-7).
    execBzTaken < fetch ?
        Should be treated as conflict; give priority to execBzTaken
    execBzNotTaken < fetch
        bu: {first, deq} < {find, enq}

L11-10: Load-Store Rules

Rules considered: fetch_and_decode, execStore and execLoad (as on L11-7).
    execLoad < fetch ?
        Same as execAdd, i.e., rf: upd < sub
        bu: {first, deq} < {find, enq}
    execStore < fetch ?
        bu: {first, deq} < {find, enq}

L11-11: Properties Required of Register File & FIFO to meet performance specs

Register File:
• rf.upd < rf.sub
FIFO:
• bu: {first, deq} < {find, enq} ⇒
    bu.first < bu.find
    bu.first < bu.enq
    bu.deq < bu.find
    bu.deq < bu.enq

L11-12: The good news ...

It is always possible to transform your design to meet desired concurrency and functionality, though critical path and hence the clock period may increase.

L11-13: Register Interfaces

read < write

[Figure: D/Q register with read, write.x and write.en; a 0/1 multiplexer produces read']

write < read ?
read': returns the current state when write is not enabled
read': returns the value being written if write is enabled

L11-14: Ephemeral History Register (EHR)

read0 < write0 < read1 < write1 < ….

[Figure: D/Q register with read0, write0.x, write0.en, read1, write1.x, write1.en and two 0/1 multiplexers]

write(i+1) takes precedence over write(i)
[Rosenband MEMOCODE'04]

L11-15: Transformation for Performance

    rule fetch_and_decode (!stallfunc1(instr, bu));
       bu.enq1(newIt(instr,rf));
       pc <= predIa;
    endrule

    rule execAdd(it matches tagged EAdd{dst:.rd,src1:.va,src2:.vb});
       rf.upd0(rd, va+vb); bu.deq0();
    endrule

    rule execBzTaken(it matches tagged Bz {cond:.cv,addr:.av}
                     &&& (cv == 0));
       pc <= av; bu.clear();
    endrule

    rule execBzNotTaken(it matches tagged Bz {cond:.cv,addr:.av}
                        &&& !(cv == 0));
       bu.deq0();
    endrule

    rule execLoad(it matches tagged ELoad{dst:.rd,addr:.av});
       rf.upd0(rd, dMem.read(av)); bu.deq0();
    endrule

    rule execStore(it matches tagged EStore{value:.vv,addr:.av});
       dMem.write(av, vv); bu.deq0();
    endrule

    execAdd < fetch
    execBzTaken < fetch
    execLoad < fetch
    execStore < fetch

L11-16: One Element FIFO using EHRs

    module mkFIFO1 (FIFO#(t));
       EHReg2#(t) data <- mkEHReg2U();
       EHReg2#(Bool) full <-
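
The EHR port ordering (read0 < write0 < read1 < write1, with a later write taking precedence) and the FIFO requirement {first, deq} < enq can be explored with a small behavioral model. The sketch below is written in Python rather than Bluespec and is not the mkFIFO1 module from the slides: the class names EHR and OneElementFIFO, the tick() method, and the choice of port 0 for first/deq and port 1 for enq are all assumptions made for illustration, and find is left out.

```python
class EHR:
    """Behavioral model of an Ephemeral History Register (illustrative only).

    Port order within one cycle: read0 < write0 < read1 < write1 < ...
    """

    def __init__(self, init, ports=2):
        self.state = init      # value stored at the last clock edge
        self.ports = ports
        self.pending = {}      # port index -> value written this cycle

    def read(self, i):
        assert 0 <= i < self.ports
        # read_i observes the newest write among write_0 .. write_{i-1};
        # if there is none, it returns the stored state.
        earlier = [j for j in self.pending if j < i]
        return self.pending[max(earlier)] if earlier else self.state

    def write(self, i, x):
        assert 0 <= i < self.ports
        self.pending[i] = x

    def tick(self):
        # Clock edge: the highest-numbered write of the cycle wins.
        if self.pending:
            self.state = self.pending[max(self.pending)]
        self.pending = {}


class OneElementFIFO:
    """One-element FIFO in which {first, deq} precede enq within a cycle."""

    def __init__(self):
        self.data = EHR(None)
        self.full = EHR(False)

    def first(self):
        assert self.full.read(0), "first on empty FIFO"
        return self.data.read(0)

    def deq(self):
        assert self.full.read(0), "deq on empty FIFO"
        self.full.write(0, False)

    def enq(self, x):
        # Port 1 sees the effect of a deq made earlier in the same cycle.
        assert not self.full.read(1), "enq on full FIFO"
        self.data.write(1, x)
        self.full.write(1, True)

    def tick(self):
        self.data.tick()
        self.full.tick()


q = OneElementFIFO()
q.enq("I0")
q.tick()
# The FIFO is full, yet deq followed by enq both succeed in this cycle.
head = q.first()
q.deq()
q.enq("I1")
q.tick()
```

With plain registers in place of the EHRs, enq would see full == True in the second cycle and could not fire together with deq, which is the concurrency the performance spec asks for.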

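The composed-rule observation from "Why is functionality maintained?" can be checked on a toy example. The following is a minimal Python sketch, assuming a rule is just a guard π and a state update δ over an integer state; Rule, fire, compose and the two sample rules are hypothetical names chosen for illustration, not part of the course material.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    guard: Callable[[int], bool]   # π(s)
    body: Callable[[int], int]     # δ(s)


def fire(rule, s):
    """Apply the rule if its guard holds; otherwise leave the state alone."""
    return rule.body(s) if rule.guard(s) else s


def compose(ra, rb):
    """Build Ra,b: when πa(s) & πb(δa(s)) => s := δb(δa(s))."""
    return Rule(
        guard=lambda s: ra.guard(s) and rb.guard(ra.body(s)),
        body=lambda s: rb.body(ra.body(s)),
    )


# Two hypothetical rules over an integer state.
ra = Rule(guard=lambda s: s < 10, body=lambda s: s + 1)
rb = Rule(guard=lambda s: s % 2 == 0, body=lambda s: s * 2)
rab = compose(ra, rb)

# Whenever Ra,b can fire, firing Ra and then Rb reaches the same state,
# so Ra,b adds no behavior that Ra and Rb did not already allow.
for s in range(-20, 20):
    if rab.guard(s):
        assert fire(rab, s) == fire(rb, fire(ra, s))
```

The loop only samples a small range of states, so it is a sanity check rather than a proof; the general argument is the one on the slide, since the guard of Ra,b guarantees that Ra is enabled in s and Rb is enabled in δa(s).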