Slide 1IP Lookup block in a routerSparse tree representation“C” version of LPMLongest Prefix Match for IP lookup: 3 possible implementation architecturesCircular pipelineFIFORequest-Response Interface for Synchronous MemoryCircular Pipeline CodeCircular Pipeline Code: discussionOne Element FIFODead cyclesThe Effect of Dead CyclesThe compiler issueScheduling conflicting rulesSo is there a dead cycle?Rule SplitingSpliting the recirculate ruleBack to the fifo problemRWire to rescueOne Element “Loopy” FIFOProblem solved!Packaging a module: Turning a rule into a methodCircular pipeline with Completion BufferCircular Pipeline Code with Completion BufferCompletion bufferSlide 27Slide 28Implementations of Static pipelines Two designers, two resultsSynthesis resultsFebruary 17, 2009 http://csg.csail.mit.edu/arvind L06-1IP LookupArvind Computer Science & Artificial Intelligence LabMassachusetts Institute of TechnologyFebruary 17, 2009L06-2http://csg.csail.mit.edu/arvindIP Lookup block in a routerQueueManagerPacket ProcessorExit functionsControlProcessorLine Card (LC)IP LookupSRAM(lookup table)ArbitrationSwitchLCLCLCA packet is routed based on the “Longest Prefix Match” (LPM) of it’s IP address with entries in a routing tableLine rate and the order of arrival must be maintainedline rate 15Mpps for 10GEFebruary 17, 2009L06-3http://csg.csail.mit.edu/arvind1823IP address Result M Ref7.13.7.3 F10.18.201.5 F7.14.7.25.13.7.2 E10.18.200.7 CSparse tree representation3A…A…BC…C…5DF…F…14A…A…7F…F…200F…F…F*E5.*.*.*D10.18.200.5C10.18.200.*B7.14.7.3A7.14.*.*F…F…FF…E57102550144AIn this lecture:Level 1: 16 bits Level 2: 8 bits Level 3: 8 bits 1 to 3 memory accessesFebruary 17, 2009L06-4http://csg.csail.mit.edu/arvind“C” version of LPMintlpm (IPA ipa) /* 3 memory lookups */{ int p;/* Level 1: 16 bits */ p = RAM [ipa[31:16]]; if (isLeaf(p)) return value(p);/* Level 2: 8 bits */ p = RAM [ptr(p) + ipa [15:8]]; if (isLeaf(p)) return value(p);/* Level 3: 8 bits */ p = RAM [ptr(p) + ipa [7:0]]; return value(p); /* must be a leaf */}Not obvious from the C code how to deal with - memory latency - pipelining…216 -10……28 -10…28 -10Must process a packet every 1/15 s or 67 nsMust sustain 3 memory dependent lookups in 67 nsMemory latency ~30ns to 40nsFebruary 17, 2009L06-5http://csg.csail.mit.edu/arvindLongest Prefix Match for IP lookup:3 possible implementation architecturesRigid pipelineInefficient memory usage but simple designLinear pipelineEfficient memory usage through memory port replicatorCircular pipelineEfficient memory with most complex controlDesigner’s Ranking:1 2 3Which is “best”?Arvind, Nikhil, Rosenband & Dave ICCAD 2004February 17, 2009L06-6http://csg.csail.mit.edu/arvindCircular pipelineThe fifo holds the request while the memory access is in progressThe architecture has been simplified for the sake of the lecture. Otherwise, a “completion buffer” has to be added at the exit to make sure that packets leave in order.enter?enter?done?done?RAMyesinQfifonooutQFebruary 17, 2009L06-7http://csg.csail.mit.edu/arvindinterface FIFO#(type t); method Action enq(t x); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest itemendinterfaceFIFO n = # of bits needed to represent a value of type tnot fullnot emptynot emptyrdyenabnnrdyenabrdyenqdeqfirstFIFOmoduleFebruary 17, 2009L06-8http://csg.csail.mit.edu/arvindAddrReadyctr(ctr > 0)ctr++ctr--deqEnableenqRequest-Response Interface for Synchronous MemorySynch MemLatency Ninterface Mem#(type addrT, type dataT);method Action req(addrT x); method Action deq(); method dataT peek();endinterfaceDataAckDataReadyreqdeqpeekMaking a synchronous component latency- insensitiveFebruary 17, 2009L06-9http://csg.csail.mit.edu/arvindCircular Pipeline Code rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq();endruleenter?enter?done?done?RAMinQfiforule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq();endruleWhen can enter fire?inQ has an element and ram & fifo each has space done? Is the same as isLeafFebruary 17, 2009L06-10http://csg.csail.mit.edu/arvindCircular Pipeline Code: discussionrule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq();endruleenter?enter?done?done?RAMinQfiforule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq();endruleWhen can recirculate fire?ram & fifo each has an element and ram, fifo & outQ each has space Is this possible?February 17, 2009L06-11http://csg.csail.mit.edu/arvindmodule mkFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); method Action enq(t x) if (!full); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethodendmodule One Element FIFOenq and deq cannot even be enabled together much less fire concurrently!nnot emptynot fullrdyenabrdyenabenqdeqFIFOmoduleThe functionality we want is as if deq “happens” before enq; if deq does not happen then enq behaves normallyWe can build such a FIFOmore on this laterFebruary 17, 2009L06-12http://csg.csail.mit.edu/arvindDead cyclesrule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq();endruleenter?enter?done?done?RAMinQfiforule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq();endruleCan a new request enter the system when an old one is leaving?Is this worth worrying about?assume simultaneous enq & deq is allowedFebruary 17, 2009L06-13http://csg.csail.mit.edu/arvindThe Effect of Dead Cyclesenterenterdone?done?RAMyesinfifonoWhat is the performance loss if “exit” and “enter” don’t ever happen in the same cycle?>33% slowdown! UnacceptableCircular PipelineRAM takes several cycles to respond to a request Each IP request generates 1-3 RAM requestsFIFO entries hold
View Full Document