Slide 1BackgroundNetwork packet processing tasksLookup performance scalabilityIP lookup using a trieIP lookup using TCAMRecap: Why is TCAM inefficient?CA-RAM–a hybrid approachCA-RAM–Content Addressable RAMVery simple, yet efficientPipelined CA-RAM operationDealing w/ bucket overflowsCA-RAM reconfig. opportunitiesAdapting key sizeCA-RAM reconfig. opportunitiesSupporting binary/ternary matchingCA-RAM reconfig. opportunitiesSimult. key matching & data accessCA-RAM reconfig. opportunitiesSupporting range checkingEvaluationMapping a large IP routing tableMapping a large IP routing tableComparing CA-RAM and TCAMConclusionsMapping a large IP routing tableCA-RAM componentsA Search Memory Substrate forHigh Throughput and Low PowerPacket ProcessingSangyeun Cho, Michel Hanna and Rami MelhemDept. of Computer ScienceUniversity of PittsburghBackgroundISPInternetSubnetSubnetISPISPend userend userrouterrouterend userend userend userend userNetwork packet processing tasksPacket forwarding•Given an IP address•Look up in a table (IP table) a matching prefix•Make sure the chosen prefix is longest LPM (Longest Prefix Matching) requirementRule-based packet filtering•Given a set of packet fields (src/dst IP, src/dst port, protocol, …)•Look up in a rule database matching entriesDeep packet inspection•Given a string in packet payload•Look up in a signature database matching entriesLookup performance scalabilityLookup performance must match increasing line speeds•For OC-768, up to 104M packets must be processed per second•Network traffic has doubled every year [McKeown03]•Router capacity doubles every 18 monthsCapacity pressure•Routing tables (~200K prefixes in a core router) are growing [RIS]•# of firewall rules increases; 100K rules are practical [Baboescu04]•IPv6Power and thermal issues already a critical limiting factor in network processing device design [McKeown03]Two conventional lookup solutions•Software methods (tries, hash table, …)•Hardware methods (TCAM, Bloom filter, …)IP lookup using a trie Consider an IP address: 0 1 0 0 0 1 1 0 “flexibility” high memory capacity requirement low memory bandwidth utilization not SCALABLEIP lookup using TCAM Consider an IP address: 0 1 0 0 0 1 1 0110100*110101*110111*01000*01100*01101*11011*0100*0110*1101*10*0*sort beforestoringchoose the firstamong the matched high bandwidth, constant time lookup TCAMs are relatively small, expen-sive power consumption very high not SCALABLERecap: Why is TCAM inefficient? all bits are involved in matching large embedded match logic “large” means more work in this caseCA-RAM–a hybrid approachCan we do better than the existing schemes?•Flexibility and search performance•Exploit optimized RAM designs•Hardware approach (software too slow)CA-RAM combines hashing w/ hardware parallel matchingCA-RAM design goals•High lookup performance•Low power consumption•Smaller chip area per stored datum•Straightforward system-level integrationCA-RAM–Content Addressable RAMSeparate match logic and memoryÞMatch logic for a single row, not every rowÞAllows the use of dense RAM technologyÞEnables highly reconfigurable match logicÞ(Keep keys sorted in each row, not in entire array)Match logicMemory cellsConventional CAM/TCAM CA-RAMVery simple, yet efficientUse hashing to store keys in a particular rowTo look up, hash the key and retrieve one rowPerform matching on entire row in parallelÞAchieve (full) content addressability w/o paying overhead!Index generatorKeyi1Match processor1……Keyi2Keyj2Keyj1Match processor2…keyPipelined CA-RAM operationIndex generatorSearch keyKeyi1Match processor1Keyi2Keyj2Keyj1Match processor2ResultMatch processor3Keyi3Keyj3Step 1 Step 2 Step 3 Step 4IndexKeyj2Keyj1Keyj3Search key Match processor2Index generationMemory accessKey matchingResult forwardingDealing w/ bucket overflowsCareful design of hash functionIncrease bucket size•Reduce load factor (); = # of occupied entries / # of total entries•Trade-off space for performanceUse “chaining”; store overflows in subsequent rows•Multiple accesses per lookupUse a small overflow CAM, accessed in parallel•Similar to popular “victim caching” in computer architectureUse two-level hashing and employ multiple CA-RAM banks……CA-RAM reconfig. opportunitiesReconfigurable match logic allows:Adapting key size to apps•Same hardware to support multiple apps or standards……Adapting key sizeKeyi1Reconfigurable match logicKeyi2Keyj2Keyj1Keyi3Keyj3Match informationKeyi1Keyi2Keyj2Keyj1 Adapting key size is straightforward Will benefit supporting multiple apps/ standardsSelect key bitsfor matchingCA-RAM reconfig. opportunitiesReconfigurable match logic allows:Adapting key size to apps•Same hardware to support multiple apps or standardsBinary and ternary matching•Some apps require ternary matching, some don’t……Supporting binary/ternary matchingReconfigurable match logicMatch informationKeyi1Keyi2Keyj2Keyj1Search keyMaskj1Maski1 Developed configurable compara-tor T-matching requires 2 bits / 1 symbol Supporting different types of matching in different bit positions feasibleConsider maskbits or notCA-RAM reconfig. opportunitiesReconfigurable match logic allows:Adapting key size to apps•Same hardware to support multiple apps or standardsBinary and ternary matching•Some apps require ternary matching, some don’tStoring data and keys in a CA-RAM module•Cuts # of memory accesses for IP lookup by half……Simult. key matching & data accessReconfigurable match logicMatch informationKeyi1Keyi2Keyj2Keyj1Search keyDataj1Datai1 Data access follows TCAM lookup CA-RAM supports data embedding Cuts memory traffic & latency by halfMatch information & DataMatch key &bypass dataCA-RAM reconfig. opportunitiesReconfigurable match logic allows:Adapting key size to apps•Same hardware to support multiple apps or standardsBinary and ternary matching•Some apps require ternary matching, some don’tStoring data and keys in a CA-RAM module•Cuts # of memory accesses for IP lookup by halfProviding range checking capabilities•Beneficial for rule-based packet filtering……Supporting range checkingReconfigurable match logicMatch informationKeyi1Rangei1Rangej1Keyj1Search key (Range checking causes troubles) (Entries must be
View Full Document