MIT 6 375 - Bluespec-4: Rule Scheduling and Synthesis - D1633466

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 375> Bluespec-4: Rule Scheduling and Synthesis

DOC PREVIEW

MIT 6 375 - Bluespec-4: Rule Scheduling and Synthesis

School name Massachusetts Institute of Technology

Course 6 375- Complex Digital Systems

Pages 8

This preview shows page 1-2-3 out of 8 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

March 6, 2006 http://csg.csail.mit.edu/6.375/ L10-1Bluespec-4: Rule Scheduling and SynthesisArvind Computer Science & Artificial Intelligence LabMassachusetts Institute of TechnologyMarch 6, 2006 http://csg.csail.mit.edu/6.375/ L10-2Exploring microarchitecturesIP Lookup ModuleMarch 6, 2006 L10-3http://csg.csail.mit.edu/6.375/IP Lookup block in a routerQueueManagerPacket ProcessorExit functionsControlProcessorLine Card (LC)IP LookupSRAM(lookup table)ArbitrationSwitchLCLCLCA packet is routed based on the “Longest Prefix Match” (LPM) of it’s IP address with entries in a routing tableLine rate and the order of arrival must be maintainedline rate ⇒15Mpps for 10GEMarch 6, 2006 L10-4http://csg.csail.mit.edu/6.375/1823M RefResultIP addressE5.13.7.2C10.18.200.77.14.7.2F10.18.201.5F7.13.7.3Sparse tree representation3A…A…BC…C…5DF…F…14A…A…7F…F…200F…F…F*E5.*.*.*D10.18.200.5C10.18.200.*B7.14.7.3A7.14.*.*F…F…FF…E57102550144AReal-world lookup algorithms are more complex but all make a sequence of dependent memory references.March 6, 2006 L10-5http://csg.csail.mit.edu/6.375/Table representation issuesTable size Depends on the number of entries: 10K to 100K Too big to fit on chip memory Î SRAM Î DRAM Îlatency, cost, power issuesNumber of memory accesses for an LPM? Too many Î difficult to do table lookup at line rate (say at 10Gbps)Control-plane issues: incremental table update size, speed of table maintenance softwareIn this lecture (to fit the code on slides!): Level 1: 16 bits, Level 2: 8 bits, Level 3: 8 bits ⇒ from 1 to 3 memory accesses for an LPMMarch 6, 2006 L10-6http://csg.csail.mit.edu/6.375/“C” version of LPMintlpm (IPA ipa) /* 3 memory lookups */{ int p;/* Level 1: 16 bits */p = RAM [ipa[31:16]]; if (isLeaf(p)) return p;/* Level 2: 8 bits */p = RAM [p + ipa [15:8]]; if (isLeaf(p)) return p;/* Level 3: 8 bits */p = RAM [p + ipa [7:0]]; return p; /* must be a leaf */}How to implement LPM in HW?Not obvious from the C code!…216 -10……28 -10…28 -10Must process a packet every 1/15 μs or 67 nsMust sustain 3 memory dependent lookups in 67 nsMarch 6, 2006 L10-7http://csg.csail.mit.edu/6.375/Static PipelineAssume the memory has a latency of n cycles and can accept a request every cycleInefficient memory usage – unused memory slots represent wasted bandwidth.Difficult to schedule table updatesRAMreqIP addrrespRAM latency=4March 6, 2006 L10-8http://csg.csail.mit.edu/6.375/Circular pipelineluReqluRespCompletion buffer- gives out tokens to control the entry into thecircular pipeline- ensures that departures take place in order even if lookups complete out-of-orderenter?enter?done?done?RAMcbufyesgetTokeninactivenoMarch 6, 2006 L10-9http://csg.csail.mit.edu/6.375/RAMs: Synchronous vsAsynchronous viewBasic memory components are "synchronous": Present a read-address AJon clock J Data DJarrives on clock J+N If you don't "catch" DJon clock J+N, it may be lost, i.e., data DJ+1may arrive on clock J+1+NThis kind of synchronicity can pervade the design and cause complicationsSynch Mem,Latency NAddr DataClockMarch 6, 2006 L10-10http://csg.csail.mit.edu/6.375/Asynchronous RAMsIt's easier to work with an "asynchronous" blockSynch MemLatency NAddrReadyctr(ctr > 0)ctr++ctr--deqEnableenqinterface AsyncRAM#(type addr_T, type data_T);w method Action req(addr_T a);w method ActionValue#(data_T) resp();endinterfaceDataAckDataReadyreqrespMarch 6, 2006 L10-11http://csg.csail.mit.edu/6.375/Static coderule static (True);if (c5 == 3) beginIP ip = in.first(); ram.req(ip[31:16]); r1 <= ip[15:0];in.deq(); c1 <= 1; endelse beginr1 <= r5; c1 <= c5+1;ram.req(r5);endr2 <= r1; c2 <= c1;r3 <= r2; c3 <= c2;r4 <= r3; c4 <= c3;TableEntry p <- ram.resp(); r5 <= nextReq(p, r4); c5 <= c4;if (c5 == 3) out.enq(r5); endruleRAMreqIP addrrespri, ciRAM latency=4March 6, 2006 L10-12http://csg.csail.mit.edu/6.375/Circular Pipeline Coderule enter (True);Token t <- cbuf.getToken();IP ip = in.first(); ram.req(ip[31:16]);active.enq(tuple2(ip[15:0], t)); in.deq();endrulerule done (True);TableEntry p <- ram.resp();match {.rip, .t} = active.first();if (isLeaf(p)) cbuf.done(t, p);else beginactive.enq(rip << 8, t);ram.req(p + signExtend(rip[15:7]));endactive.deq();endruleenter?enter?done?done?RAMcbufinactiveMarch 6, 2006 L10-13http://csg.csail.mit.edu/6.375/Completion bufferinterface CBuffer#(type any_T);method ActionValue#(Token) getToken(); method Action done(Token t, any_T d);method ActionValue#(any_T) getResult();endinterfacemodule mkCBuffer (CBuffer#(any_T)) provisos (Bits#(any_T,sz));RegFile#(Token, Maybe#(any_T)) buf <- mkRegFileFull();Reg#(Token) i <- mkReg(0); //input indexReg#(Token) o <- mkReg(0); //output indexReg#(Token) cnt <- mkReg(0); //number of filled slots…IIVIVIniobufMarch 6, 2006 L10-14http://csg.csail.mit.edu/6.375/Completion buffer... // state elements buf, i, o, n ...method ActionValue#(any_T) getToken() if (cnt <= maxToken);cnt <= cnt + 1; i <= i + 1;buf.upd(i, Invalid);return i;endmethodmethod Action done(Token t, any_T data);return buf.upd(t, Valid data);endmethodmethod ActionValue#(any_T) get() if (cnt > 0) &&& (buf.sub(o) matches tagged (Valid .x));o <= o + 1;cnt <= cnt - 1;return x;endmethodIIVIVIcntiobufMarch 6, 2006 L10-15http://csg.csail.mit.edu/6.375/Synthesis from rules ...we will revisit IP LPM block synthesis results after a better understanding of the synthesis procedure March 6, 2006 L10-16http://csg.csail.mit.edu/6.375/Synthesis: From State & Rules into Synchronous FSMsinterfacemoduleTransitionLogicIOS“Next”SCollectionofStateElementsMarch 6, 2006 L10-17http://csg.csail.mit.edu/6.375/Hardware ElementsCombinational circuits Mux, Demux, ALU, ...Synchronous state elements Flipflop, Register, Register file, SRAM, DRAMSelOI0I1InMux...SelIDe-Mux...O0O1OnOpSelect- Add, Sub, AddU, ...- And, Or, Not, ...- GT, LT, EQ, ...- SL, SR, SRA, ...ResultNCVZABALUffDffDffDffDffDffDffDffQQQQQQQQDClkEnregisterMarch 6, 2006 L10-18http://csg.csail.mit.edu/6.375/Flip-flops with Write EnablesffQDCENCDQENffQDCEN01ffQDCENdangerous!Edge-triggered: Data is sampled at the rising edgeMarch 6, 2006 L10-19http://csg.csail.mit.edu/6.375/Semantics and synthesisRulesSemantics: “Untimed” (one rule at a time)Verilog RTLSemantics: clocked synchronous HW(multiple rules per clock)SchedulingandSynthesisby the BSV compilerUsing Rule Semantics,establish functionalcorrectnessUsing Schedules,establish performancecorrectnessVerification

View Full Document