MIT 6.375 - Performance Specifications

Performance Specifications
Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
March 5, 2008, L11, http://csg.csail.mit.edu/6.375


Simple processor pipeline

[Figure: processor pipeline with stages IF, Dec, Exe, Mem, Wb; buffers bF, bD, bE, bW; and RF, iMem, dMem. The figure is annotated "Bz?" at two points.]

Functional behavior is well understood.
Intuition about performance is lacking:
• Should the branch be resolved in the Decode or Execute stage?
• Should the branch target address be latched before its use?
Experimentation is required to evaluate design alternatives: cycle time? area? execution time?
We present a design flow that makes such experimentation easy for the designer.


Need for Performance Specs

[Figure: the same processor pipeline]

Rules:
    F = {Fetch}
    D = {DecAdd, DecBz, …}
    E = {ExeAdd, ExeBzTaken, ExeBzNotTaken, …}
    M = {MemLd, MemSt, MemWB, …}
    W = {Wb}

• What is the design's performance / throughput?
• The reference model implies one-rule-per-cycle execution.
• The designer's goal is usually different and based on the application!


Pipelining via Performance specification

The designer wants a pipeline which executes one instruction every cycle.
Performance spec for a pipelined processor:

    W < M < E < D < F

[Figure: "A cycle in slow motion", showing instructions I0 to I5 moving through the pipeline]


More Performance Specification

    F = {Fetch}
    D = {DecAdd, DecBz, …}
    E = {ExeAdd, ExeBzTaken, ExeBzNotTaken, …}
    M = {MemLd, MemSt, MemWB, …}
    W = {Wb}

We allow the designer to specify performance!

    W < M < E < D < F    ≡ pipelined

What do the following mean?

    F < D < E < M < W                        ≡ unpipelined (assuming buffers start empty)
    W < W < M < M < E < E < D < D < F < F    ≡ two-way superscalar!
    1) W < M < E* < D < F
    2) W < M < ExeBzTaken                    ≡ pipelined except for ExeBzTaken

Synthesis algorithms ensure that performance specs are satisfied and guarantee that functionality is not altered.


Why is functionality maintained?

A few observations about rule-based systems:
• Adding a new rule to a system can only introduce new behaviors.
• If the new rule is a derived rule, then it does not add new behaviors.

Composed rules. Given rules:

    Ra: when πa(s) => s := δa(s);
    Rb: when πb(s) => s := δb(s);

The composed rule is a derived rule:

    Ra,b: when πa(s) & πb(δa(s)) => s := δb(δa(s));


Scheduling Specifications

[Figure: CPU with "fetch & decode" and "execute" blocks connected through bu, with pc and rf]

    rule fetch_and_decode (!stallfunc(instr, bu));
        bu.enq(newIt(instr,rf));
        pc <= predIa;
    endrule

    rule execAdd(it matches tagged EAdd{dst:.rd,src1:.va,src2:.vb});
        rf.upd(rd, va+vb);
        bu.deq();
    endrule

    rule execBzTaken(it matches tagged Bz {cond:.cv,addr:.av}
                     &&& (cv == 0));
        pc <= av;
        bu.clear();
    endrule

    rule execBzNotTaken(it matches tagged Bz {cond:.cv,addr:.av}
                        &&& !(cv == 0));
        bu.deq();
    endrule

    rule execLoad(it matches tagged ELoad{dst:.rd,addr:.av});
        rf.upd(rd, dMem.read(av));
        bu.deq();
    endrule

    rule execStore(it matches tagged EStore{value:.vv,addr:.av});
        dMem.write(av, vv);
        bu.deq();
    endrule

Desired schedule:

    execAdd < fetch
    execBzTaken < fetch
    execBzNotTaken < fetch ?
    execLoad < fetch
    execStore < fetch


Implications for modules

    rule fetch_and_decode (!stallfunc(instr, bu));
        bu.enq(newIt(instr,rf));
        pc <= predIa;
    endrule

    rule execAdd(it matches tagged EAdd{dst:.rd,src1:.va,src2:.vb});
        rf.upd(rd, va+vb);
        bu.deq();
    endrule

    execAdd < fetch  ⇒  rf: sub > upd
                        bu: {find, enq} > {first, deq}


Branch rules

    rule fetch_and_decode (!stallfunc(instr, bu));
        bu.enq(newIt(instr,rf));
        pc <= predIa;
    endrule

    rule execBzTaken(it matches tagged Bz {cond:.cv,addr:.av}
                     &&& (cv == 0));
        pc <= av;
        bu.clear();
    endrule

    rule execBzNotTaken(it matches tagged Bz {cond:.cv,addr:.av}
                        &&& !(cv == 0));
        bu.deq();
    endrule

    execBzTaken < fetch ?     Should be treated as a conflict; give priority to execBzTaken.
    execBzNotTaken < fetch    bu: {first, deq} < {find, enq}


Load-Store Rules

    rule fetch_and_decode (!stallfunc(instr, bu));
        bu.enq(newIt(instr,rf));
        pc <= predIa;
    endrule

    rule execStore(it matches tagged EStore{value:.vv,addr:.av});
        dMem.write(av, vv);
        bu.deq();
    endrule

    rule execLoad(it matches tagged ELoad{dst:.rd,addr:.av});
        rf.upd(rd, dMem.read(av));
        bu.deq();
    endrule

    execLoad < fetch ?     Same as execAdd, i.e.,
                           rf: upd < sub
                           bu: {first, deq} < {find, enq}
    execStore < fetch ?    bu: {first, deq} < {find, enq}


Properties Required of Register File & FIFO to meet performance specs

Register File:
    rf.upd < rf.sub

FIFO:
    bu: {first, deq} < {find, enq}  ⇒
        bu.first < bu.find
        bu.first < bu.enq
        bu.deq < bu.find
        bu.deq < bu.enq


The good news ...

It is always possible to transform your design to meet desired concurrency and functionality, though the critical path, and hence the clock period, may increase.


Register Interfaces

[Figure: register (D, Q) with a 0/1 multiplexer; ports read, write.x, write.en, and read']

    read < write
    write < read ?

read' returns the current state when write is not enabled.
read' returns the value being written if write is enabled.


Ephemeral History Register (EHR)

[Figure: register (D, Q) with chained 0/1 multiplexers; ports read0, write0.x, write0.en, read1, write1.x, write1.en]

    read0 < write0 < read1 < write1 < …

write(i+1) takes precedence over write(i).
[Rosenband MEMOCODE'04]


Transformation for Performance

    rule fetch_and_decode (!stallfunc1(instr, bu));
        bu.enq1(newIt(instr,rf));
        pc <= predIa;
    endrule

    rule execAdd(it matches tagged EAdd{dst:.rd,src1:.va,src2:.vb});
        rf.upd0(rd, va+vb);
        bu.deq0();
    endrule

    rule execBzTaken(it matches tagged Bz {cond:.cv,addr:.av}
                     &&& (cv == 0));
        pc <= av;
        bu.clear();
    endrule

    rule execBzNotTaken(it matches tagged Bz {cond:.cv,addr:.av}
                        &&& !(cv == 0));
        bu.deq0();
    endrule

    rule execLoad(it matches tagged ELoad{dst:.rd,addr:.av});
        rf.upd0(rd, dMem.read(av));
        bu.deq0();
    endrule

    rule execStore(it matches tagged EStore{value:.vv,addr:.av});
        dMem.write(av, vv);
        bu.deq0();
    endrule

    execAdd < fetch
    execBzTaken < fetch
    execLoad < fetch
    execStore < fetch


One Element FIFO using EHRs

    module mkFIFO1 (FIFO#(t));
        EHReg2#(t) data <- mkEHReg2U();
        EHReg2#(Bool) full <-
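
The EHR ordering read0 < write0 < read1 < write1 described above can be illustrated with a small software model. The following is only a sketch in Python, not Bluespec and not the implementation from the lecture: the class name EHR, the methods read, write, and tick, and the function demo are hypothetical names chosen for this illustration. It assumes that read(i) observes the stored state overridden by any enabled write(j) with j < i in the same cycle, and that at the clock edge the highest-numbered enabled write is the one stored.

```python
class EHR:
    """Behavioral model of an Ephemeral History Register (illustrative only)."""

    def __init__(self, init, ports=2):
        self._state = init              # value held by the underlying register
        self._ports = ports             # number of read/write port pairs
        self._pending = [None] * ports  # per-port write this cycle, or None

    def read(self, i):
        """read_i: stored state, overridden by writes on ports j < i."""
        value = self._state
        for j in range(i):
            if self._pending[j] is not None:
                value = self._pending[j][0]
        return value

    def write(self, i, x):
        """write_i: record an enabled write on port i for this cycle."""
        # Wrap in a tuple so that None is also a legal data value.
        self._pending[i] = (x,)

    def tick(self):
        """Clock edge: commit the highest-numbered enabled write, if any."""
        self._state = self.read(self._ports)
        self._pending = [None] * self._ports


def demo():
    r = EHR(0)
    before = r.read(0)   # read0 sees the stored state
    r.write(0, 5)
    same_port = r.read(0)  # read0 is ordered before write0
    bypassed = r.read(1)   # read1 is ordered after write0
    r.write(1, 7)          # write1 takes precedence over write0
    r.tick()
    return before, same_port, bypassed, r.read(0)


if __name__ == "__main__":
    print(demo())
```

In this model a rule that uses port 0 (for example a deq0) and a rule that uses port 1 (for example an enq1) can both act in the same cycle, with the port-1 rule observing the effect of the port-0 rule, which is the behavior a schedule such as execAdd < fetch asks for.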

