DOC PREVIEW
MIT 6 375 - Bluespec-4: Rule Scheduling and Synthesis

This preview shows page 1-2-3 out of 8 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

March 6, 2006 http://csg.csail.mit.edu/6.375/ L10-1Bluespec-4: Rule Scheduling and SynthesisArvind Computer Science & Artificial Intelligence LabMassachusetts Institute of TechnologyMarch 6, 2006 http://csg.csail.mit.edu/6.375/ L10-2Exploring microarchitecturesIP Lookup ModuleMarch 6, 2006 L10-3http://csg.csail.mit.edu/6.375/IP Lookup block in a routerQueueManagerPacket ProcessorExit functionsControlProcessorLine Card (LC)IP LookupSRAM(lookup table)ArbitrationSwitchLCLCLCA packet is routed based on the “Longest Prefix Match” (LPM) of it’s IP address with entries in a routing tableLine rate and the order of arrival must be maintainedline rate ⇒15Mpps for 10GEMarch 6, 2006 L10-4http://csg.csail.mit.edu/6.375/1823M RefResultIP addressE5.13.7.2C10.18.200.77.14.7.2F10.18.201.5F7.13.7.3Sparse tree representation3A…A…BC…C…5DF…F…14A…A…7F…F…200F…F…F*E5.*.*.*D10.18.200.5C10.18.200.*B7.14.7.3A7.14.*.*F…F…FF…E57102550144AReal-world lookup algorithms are more complex but all make a sequence of dependent memory references.March 6, 2006 L10-5http://csg.csail.mit.edu/6.375/Table representation issuesTable size Depends on the number of entries: 10K to 100K Too big to fit on chip memory Î SRAM Î DRAM Îlatency, cost, power issuesNumber of memory accesses for an LPM? Too many Î difficult to do table lookup at line rate (say at 10Gbps)Control-plane issues: incremental table update size, speed of table maintenance softwareIn this lecture (to fit the code on slides!): Level 1: 16 bits, Level 2: 8 bits, Level 3: 8 bits ⇒ from 1 to 3 memory accesses for an LPMMarch 6, 2006 L10-6http://csg.csail.mit.edu/6.375/“C” version of LPMintlpm (IPA ipa) /* 3 memory lookups */{ int p;/* Level 1: 16 bits */p = RAM [ipa[31:16]]; if (isLeaf(p)) return p;/* Level 2: 8 bits */p = RAM [p + ipa [15:8]]; if (isLeaf(p)) return p;/* Level 3: 8 bits */p = RAM [p + ipa [7:0]]; return p; /* must be a leaf */}How to implement LPM in HW?Not obvious from the C code!…216 -10……28 -10…28 -10Must process a packet every 1/15 μs or 67 nsMust sustain 3 memory dependent lookups in 67 nsMarch 6, 2006 L10-7http://csg.csail.mit.edu/6.375/Static PipelineAssume the memory has a latency of n cycles and can accept a request every cycleInefficient memory usage – unused memory slots represent wasted bandwidth.Difficult to schedule table updatesRAMreqIP addrrespRAM latency=4March 6, 2006 L10-8http://csg.csail.mit.edu/6.375/Circular pipelineluReqluRespCompletion buffer- gives out tokens to control the entry into thecircular pipeline- ensures that departures take place in order even if lookups complete out-of-orderenter?enter?done?done?RAMcbufyesgetTokeninactivenoMarch 6, 2006 L10-9http://csg.csail.mit.edu/6.375/RAMs: Synchronous vsAsynchronous viewBasic memory components are "synchronous": Present a read-address AJon clock J Data DJarrives on clock J+N If you don't "catch" DJon clock J+N, it may be lost, i.e., data DJ+1may arrive on clock J+1+NThis kind of synchronicity can pervade the design and cause complicationsSynch Mem,Latency NAddr DataClockMarch 6, 2006 L10-10http://csg.csail.mit.edu/6.375/Asynchronous RAMsIt's easier to work with an "asynchronous" blockSynch MemLatency NAddrReadyctr(ctr > 0)ctr++ctr--deqEnableenqinterface AsyncRAM#(type addr_T, type data_T);w method Action req(addr_T a);w method ActionValue#(data_T) resp();endinterfaceDataAckDataReadyreqrespMarch 6, 2006 L10-11http://csg.csail.mit.edu/6.375/Static coderule static (True);if (c5 == 3) beginIP ip = in.first(); ram.req(ip[31:16]); r1 <= ip[15:0];in.deq(); c1 <= 1; endelse beginr1 <= r5; c1 <= c5+1;ram.req(r5);endr2 <= r1; c2 <= c1;r3 <= r2; c3 <= c2;r4 <= r3; c4 <= c3;TableEntry p <- ram.resp(); r5 <= nextReq(p, r4); c5 <= c4;if (c5 == 3) out.enq(r5); endruleRAMreqIP addrrespri, ciRAM latency=4March 6, 2006 L10-12http://csg.csail.mit.edu/6.375/Circular Pipeline Coderule enter (True);Token t <- cbuf.getToken();IP ip = in.first(); ram.req(ip[31:16]);active.enq(tuple2(ip[15:0], t)); in.deq();endrulerule done (True);TableEntry p <- ram.resp();match {.rip, .t} = active.first();if (isLeaf(p)) cbuf.done(t, p);else beginactive.enq(rip << 8, t);ram.req(p + signExtend(rip[15:7]));endactive.deq();endruleenter?enter?done?done?RAMcbufinactiveMarch 6, 2006 L10-13http://csg.csail.mit.edu/6.375/Completion bufferinterface CBuffer#(type any_T);method ActionValue#(Token) getToken(); method Action done(Token t, any_T d);method ActionValue#(any_T) getResult();endinterfacemodule mkCBuffer (CBuffer#(any_T)) provisos (Bits#(any_T,sz));RegFile#(Token, Maybe#(any_T)) buf <- mkRegFileFull();Reg#(Token) i <- mkReg(0); //input indexReg#(Token) o <- mkReg(0); //output indexReg#(Token) cnt <- mkReg(0); //number of filled slots…IIVIVIniobufMarch 6, 2006 L10-14http://csg.csail.mit.edu/6.375/Completion buffer... // state elements buf, i, o, n ...method ActionValue#(any_T) getToken() if (cnt <= maxToken);cnt <= cnt + 1; i <= i + 1;buf.upd(i, Invalid);return i;endmethodmethod Action done(Token t, any_T data);return buf.upd(t, Valid data);endmethodmethod ActionValue#(any_T) get() if (cnt > 0) &&& (buf.sub(o) matches tagged (Valid .x));o <= o + 1;cnt <= cnt - 1;return x;endmethodIIVIVIcntiobufMarch 6, 2006 L10-15http://csg.csail.mit.edu/6.375/Synthesis from rules ...we will revisit IP LPM block synthesis results after a better understanding of the synthesis procedure March 6, 2006 L10-16http://csg.csail.mit.edu/6.375/Synthesis: From State & Rules into Synchronous FSMsinterfacemoduleTransitionLogicIOS“Next”SCollectionofStateElementsMarch 6, 2006 L10-17http://csg.csail.mit.edu/6.375/Hardware ElementsCombinational circuits Mux, Demux, ALU, ...Synchronous state elements Flipflop, Register, Register file, SRAM, DRAMSelOI0I1InMux...SelIDe-Mux...O0O1OnOpSelect- Add, Sub, AddU, ...- And, Or, Not, ...- GT, LT, EQ, ...- SL, SR, SRA, ...ResultNCVZABALUffDffDffDffDffDffDffDffQQQQQQQQDClkEnregisterMarch 6, 2006 L10-18http://csg.csail.mit.edu/6.375/Flip-flops with Write EnablesffQDCENCDQENffQDCEN01ffQDCENdangerous!Edge-triggered: Data is sampled at the rising edgeMarch 6, 2006 L10-19http://csg.csail.mit.edu/6.375/Semantics and synthesisRulesSemantics: “Untimed” (one rule at a time)Verilog RTLSemantics: clocked synchronous HW(multiple rules per clock)SchedulingandSynthesisby the BSV compilerUsing Rule Semantics,establish functionalcorrectnessUsing Schedules,establish performancecorrectnessVerification


View Full Document

MIT 6 375 - Bluespec-4: Rule Scheduling and Synthesis

Documents in this Course
IP Lookup

IP Lookup

15 pages

Verilog 1

Verilog 1

19 pages

Verilog 2

Verilog 2

23 pages

Encoding

Encoding

21 pages

Quiz

Quiz

10 pages

IP Lookup

IP Lookup

30 pages

Load more
Download Bluespec-4: Rule Scheduling and Synthesis
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Bluespec-4: Rule Scheduling and Synthesis and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Bluespec-4: Rule Scheduling and Synthesis 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?