New version page

DDECS2002_final

This preview shows page 1-2-3-4 out of 11 pages.

View Full Document
View Full Document

End of preview. Want to read all 11 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document
Unformatted text preview:

FAST IP ADDRESS LOOKUP ENGINE FOR SOC INTEGRATIONAbstract. IP packet forwarding is a key operation in internet routers and thelimiting factor is the IP address lookup. Since almost a whole router is integra-ted on a chip it is important to design the functional blocks with SoC integrationin mind. Here a hardware architecture for IP address lookup based on the k-multibit trie search algorithm is presented. The designed and simulated architec-ture is based on one SRAM access in series with two multiplexers and is claimedto be optimally fast. Based on the performance of embedded SRAMs an approxi-mate throughput of 500 MPackets/s is achieved when implemented in 0.18 µmtechnology. Memory requirements, multiplexing, scaling, and updating issues ofthe architecture as well as 2-dimensional search are described.1 IntroductionTo make the continued growth of internet applications possible the routers must keep impro-ving in performance much faster than Moore’s law predicts. Due to space, power consump-tion and performance requirements it is desired to integrate as much as possible of a routeron one single chip. One way of doing that is to design the router as several heterogeneosdomain specific processors. An example of such a processor can be seen in [1].The most critical component of today’s internet routers is the IP address lookup, which isneeded for forwarding packets to the right output port of the router.During the last few years some algorithms have been presented for IP address lookup. Agood survey is provided in [2]. However, most algorithms are for software implementationsand therefore inherently not optimal from a performance point of view and not suitable whena domain specific processor can be used. New software algorithms, that make use of theaccess pattern to improve the average performance have been suggested more recently [3]and [4], but they suffer from similar limitations. Another approach is to use hash functions[5], but the memory requirement seems to be too high to allow an efficient SoC hardwareimplementation.Some approaches have been introduced for hardware schemes [8], [9], [10], and [11].These are more interesting, when looking for a fast solution and we discuss them more tho-roughly in section 7.In this paper we present a novel, regular, and scalable hardware architecture for perfor-ming IP address lookup optimally fast. We start with IPv4 addresses, for which we providean extensive performance analysis of our architecture. Then, we enhance the architecturewith three multiplexing schemes, which add significant flexibility to the architecture andthus we achieve a domain specific processor for IP address lookup, which is suitable for SoCrouter integration. We also extend the analysis to 128 bit long IPv6 addresses and forwardingbased both on destination and source addresses.Tomas HenrikssonDepartment of Electrical EngineeringLinköpings universitetSE-581 83 Linkö[email protected] VerbauwhedeUCLA EE Dept7440B Boelter HallLos Angeles CA [email protected] ability to update a forwarding table is essential to an implementation and we des-cribe two different approaches, which both support instantaneous updates for our architec-ture.In section 2 the general problems with longest prefix matching are briefly explained, thena discussion on embedded memories, which are essential and performance limiting compo-nents for IP address lookup, is provided in section 3. Thereafter, in section 4, our novelapproach is outlined and in section 5 our simulation results are presented. Updating issuesand further extensions are discussed in section 6. Our results are compared to related work insection 7 and finally conclusions are drawn and future work is outlined in section 8.2 IP Address Lookup with Longest Prefix MatchThe IP address lookup with longest prefix match problem has been described in many papers(e.g. [2]). The forwarding of packets is based on matching the destination address of thepacket to prefixes in a routing table. Each prefix consist of a tuple <routing prefix, prefixlength>. The number of valid bits in the prefix is determined by the prefix length. In theInternet of today the classless interdomain routing (CIDR) is used, which means routing pre-fixes of any length can be used. Still, however, many routing prefixes are of length 16 or 24,because of the original internet class-based routing scheme.The task is conceptually simple, the routing prefix with the longest length, that match thedestination address is the best match. To each prefix an action pointer is associated. Theaction pointer associated with the best match points to the action that should be taken for thepacket. For simplicity we can think of this action pointer as the output port identifier, butmore information is normally provided, such as next hop IP address.Although the task is simple to understand, it has proven hard to implement in a goodway, especially with ever growing routing tables [7]. There are examples of more than 100krules in some core routers. To avoid a complicated search algorithm, direct memory lookupcan be applied. One way is to let the destination address be the address to the memory andaccess the result directly. Already with IP addresses of 32 bits 4 billion entries would beneeded and for IPv6 addresses of 128 bits the approach is totally infeasible. Instead a combi-nation of search algorithm and memory lookup can be used.3 Memories for IP Address LookupIt can be assumed that the memory and the processing element will be implemented on thesame silicon die. Although it may be possible to use embedded DRAM with a somewhatmore complicated manufacturing process, it is not desirable since the speed-up techniquesused for cache-based memory hierarchies cannot be used in the IP address lookup algo-rithms due to the non sequential access scheme. Therefore embedded SRAM with true ran-dom access is better suited. On the other hand, the problem with SRAM is the limitedstorage capacity and bigger silicon die area requirement. As will be shown later, the perfor-mance of the lookup is highly dependent on the memory access time. Also it must be noti-ced, that the memory access time is dependent on the size of the SRAM, both on the numberof words and the word length [12].So it is crucial to keep the size of the memory as small as possible and definitely below thelimit for embedded SRAMs which is some hundred kilobytes. If the memory can be splitinto several, preferably equally


Loading Unlocking...
Login

Join to view DDECS2002_final and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view DDECS2002_final and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?