CS 162 Computer Architecture
Lecture 8: Introduction to Network Processors (II)
Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan/cs162
2003 ©UCR

Outline
° Introduction to NP Systems
° Relevant Applications
° Design Issues and Challenges
° Relevant Software and Benchmarks
° A case study: Intel IXP network processors

What are Network Processors
° Any device that executes programs to handle packets in a data network
° Examples
  • Processors on router line cards
  • Processors in network access equipment

Why Network Processors
° Current Situation
  • Data rates are increasing
  • Protocols are becoming more dynamic and sophisticated
  • Protocols are being introduced more rapidly
° Processing Elements
  • GP (General-purpose Processor)
    - Programmable, but not optimized for networking applications
  • ASIC (Application Specific Integrated Circuit)
    - High processing capacity, but long time to develop and lacks flexibility
  • NP (Network Processor)
    - Achieves high processing performance
    - Programming flexibility
    - Cheaper than GP

Organizing Processor Resources
° Design decisions:
  • High-level organization
  • ISA and micro architecture
  • Memory and I/O integration
° Today’s commercial NPs:
  • Chip multiprocessors
  • Most are multithreaded
  • Exploit little ILP (Cisco does)
  • No cache
  • Micro-programmed

Architectural Comparisons
° High-level organizations
  • Aggressive superscalar (SS)
  • Fine-grained multithreaded (FGMT)
  • Chip multiprocessor (CMP)
  • Simultaneous multithreaded (SMT)
Ref: [NPRD]

Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading. Legend: Thread 1 to Thread 5, idle slot.]

Tasks and Services
Three benchmarks used in the experiment
Ref: [NPRD]

Performance Evaluation
[Figure: performance of SS, FGMT, CMP, and SMT on three workloads. Forwarding: IP Forward; Authentication: MD5; Encryption: 3DES.]
° Systems must support some form of concurrent packet-level parallelism
° SMT and CMP are nearly equivalent, with SMT always coming out ahead
  • Workloads have little ILP
  • Need to exploit packet-level parallelism
  • CMP and SMT do just that
Ref: [NPRD]

Example Toaster System: Cisco 10000
° Almost all data plane operations execute on the programmable XMC
° Pipeline stages are assigned tasks, e.g. classification, routing, firewall, MPLS
  • Classic SW load balancing problem
° External SDRAM shared by common pipe stages
Ref: [NPT]

IBM PowerNP
° 16 pico-processors and 1 PowerPC
° Each pico-processor
  • Supports 2 hardware threads
  • 3-stage pipeline: fetch/decode/execute
° Dyadic Processing Unit
  • Two pico-processors
  • 2KB shared memory
  • Tree search engine
° Focus is layers 2-4
° PowerPC 405 for control plane operations
  • 16K I and D caches
° Target is OC-48
Ref: [NPT]

C-Port C-5 Chip Architecture
[Figure: C-5 block diagram. Recoverable labels: clusters of CPs with PHY interfaces (CP-0 to CP-3 ... CP-12 to CP-15), Executive Processor, Fabric Processor, Table Lookup Unit, Queue Mngt Unit, Buffer Mngt Unit, 60 Gbps busses, SRAM (x3), Switch Fabric, PROM, PCI, CONTROL.]
Ref: [NPT]

Some Challenges
° Intelligent Design
  • Given a selection of programs and a target network link speed, find the ‘best’ design for the processor
    - Least area
    - Least power
    - Most performance
° Write efficient multithreaded programs
  • NPs have
    - Heterogeneous computer resources
    - Non-uniform memory
    - Multiple interacting threads of execution
    - Real-time constraints
  • Make use of resources
    - How to use special instructions and hardware assists
      – Compilers
      – Hand-coded
  • Multithreaded programs
    - Manage access to shared state
    - Synchronization between threads
Ref: [NPRD]

NP Software
° Teja
  • NPU vendor-neutral software tools
  • Key is a GUI-based state-machine tool
° CLICK router
  • From MIT, supports a specialized development model
° Zebra
  • Open source routing environment
  • Supports most of the key IP routing protocols in SW
  • IP Infusion Inc. is providing commercial support
° LVL7
  • Closed source (i.e. traditional commercial), complete IP solutions
Ref: [NPT]

Benchmarks for Network Processors
• NetBench
  - 10 applications
  - http://cares.icsl.ucla.edu/NetBench
• CommBench
  - 8 networking and communications applications
  - http://ccrc.wustl.edu/~wolf/cb/
• EEMBC
  - http://www.eembc.org/benchmark
• MediaBench
  - Transcoders
  - Some communications applications
Ref: [NPT]

IXP1200 Block Diagram
° StrongARM processing core
° Microengines introduce new ISA
° I/O
  • PCI
  • SDRAM
  • SRAM
  • IX: PCI-like packet bus
° On chip FIFOs
  • 16 entries, 64B each
Ref: [NPT]

IXP1200 Microengine
° 4 hardware contexts
  • Single issue processor
  • Explicit optional context switch on SRAM access
° Registers
  • All are single ported
  • Separate GPR
  • 256*6 = 1536 registers total
° 32-bit ALU
  • Can access GPR or XFER registers
° Shared hash unit
  • 1/2/3 values, 48b/64b
  • For IP routing hashing
° Standard 5 stage pipeline
° 4KB SRAM instruction store (not a cache!)
° Barrel shifter
Ref: [NPT]

IXP 2400 Block Diagram
° XScale core replaces StrongARM
° Microengines
  • Faster
  • More: 2 clusters of 4 microengines each
° Local memory
° Next neighbor routes added between microengines
° Hardware to accelerate CRC operations and random number generation
° 16 entry CAM
[Figure: IXP 2400 block diagram. Recoverable labels: ME0 to ME7, Scratch/Hash/CSR, MSF Unit, DDR DRAM.]
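
Worked example (not from the original slides): the IXP1200 Microengine slide lists 4 hardware contexts on a single-issue processor with a context switch on SRAM access. The sketch below is a rough back-of-the-envelope model of why that helps hide memory latency. It assumes each context alternates a fixed burst of compute cycles with a fixed memory wait and that switching contexts costs nothing. The function name and the cycle counts are hypothetical, chosen only for illustration; they are not IXP1200 measurements.

```python
def microengine_utilization(contexts: int, compute_cycles: int, memory_latency: int) -> float:
    """Estimate the busy fraction of a single-issue engine with several hardware contexts.

    Simplified model (an assumption, not a description of real hardware):
    every context runs `compute_cycles`, then waits `memory_latency` cycles
    for a memory reference, and the engine switches to another ready context
    at zero cost while the first one waits.
    """
    if contexts < 1 or compute_cycles < 1 or memory_latency < 0:
        raise ValueError("contexts and compute_cycles must be >= 1, memory_latency >= 0")
    # One context keeps the pipeline busy for compute_cycles out of every
    # (compute_cycles + memory_latency) cycles; extra contexts fill the gap
    # until the pipeline saturates at 100%.
    period = compute_cycles + memory_latency
    return min(1.0, contexts * compute_cycles / period)


if __name__ == "__main__":
    # Hypothetical workload: 10 compute cycles per 30-cycle memory access.
    for n in (1, 2, 4):
        print(f"{n} context(s): {microengine_utilization(n, 10, 30):.0%} busy")
```

Under these assumed numbers a single context leaves the pipeline idle most of the time, while four contexts are just enough to cover the memory wait. Real behavior also depends on switch overhead, memory contention, and uneven packet workloads, which this model ignores.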