
EE 392 I
Field Programmable Technology for Mainstream Processing
Gordon Brebner, Xilinx Research Labs
27 April 2010

What this talk is about
- Overview of the Field Programmable Gate Array (FPGA)
- Processing opportunities for FPGAs
- The relationship between FPGAs and CPUs
- Programming models for FPGAs

Copyright 2009 Xilinx

Field programmable technology
- Field programmable means programmable in the field
- In other words, it is a soft hardware technology
- Lingering historical stereotype: the Programmable Logic Array (PLA)
- More advanced stereotype: the sea of programmable logic gates
- Today's focus: the Field Programmable Gate Array (FPGA)

A contemporary FPGA: the Xilinx Virtex-6
- Example: XC6VHX565T
  - 566,784 programmable logic cells
  - 32.832 Mb block memory
  - 864 DSP blocks
  - 720 parallel input/outputs
  - 24 GTH transceivers (to 11.2G)
  - 48 GTX transceivers (to 6.6G)
  - 4 PCI Express blocks
  - 4 Tri-mode MAC blocks
- Importantly, the components are interconnected by a very large amount of programmable wiring
[Figure: device diagram. Labels: Programmable logic, Block memory, DSP block, High-performance clocking, Parallel input/output, High-speed transceiver, PCI Express interface, Tri-mode Ethernet MAC, System Monitor]

Original FPGA use situations
- Prototyping ASICs
  - Classic style of chip design flow
  - Generate FPGA programming information instead of mask data
- Providing glue logic for systems
  - Implement random logic to connect together other chips
  - Typically small-scale functions
- Implementing input/output interfaces
  - Do heavy lifting for getting bits in and out quickly
  - Exploit the support for numerous physical I/O standards

Positioning FPGAs for mainstream processing
- Moving the FPGA to the heart of the system
  - As a first-order processing component
- Background: FPGAs have the raw processing capacity
  - Approaching millions of programmable logic cells
  - Other hardened processing components
  - Embedded memory blocks
- Question: how do you organize this low-level capacity?
  - What soft processing architectures might be configured?
  - How can a high-level programming experience be provided?

FPGAs for Digital Signal Processing (DSP)
- Best-known FPGA processing domain: streaming data flow
- Low level: filters, DFT, FFT, coders, decoders, etc.
- High level: wireless (e.g. 802.16, LTE), video (e.g. H.264, MPEG)
- Example of benefit
[Figure: not preserved in this extract]

FPGAs for High Performance Computing (HPC)
- Increasing area of application
  - Enabled by the increasing capabilities of FPGAs
  - Scale down supercomputing techniques to FPGA-style parallelism
- Example areas
  - Finance (automated stock trading, etc.)
  - Bioinformatics (gene sequencing, etc.)
  - Cryptography
  - Meteorology
- Lack of native support for floating-point arithmetic has slowed adoption in certain areas
[Press clipping: SAN JOSE, Calif., April 6, 2010. "Xilinx Helps University of Regensburg Launch the World's Most Power-Efficient Supercomputer". QPACE: a bespoke supercomputer developed to unlock the mysteries of Quantum Chromodynamics]

Network processing contexts
- NIC card
- Standalone switch/router
- Telecom line card

The NetFPGA platform (netfpga.org)
- Stanford-Xilinx collaboration
- Worldwide teaching and research ecosystem
- Four gigabit Ethernet ports
- PCI interface to host PC
- Four x 10G version imminent

NetFPGA projects
[Figure: not preserved in this extract]

Processing on a telecom line card
[Figure: line card block diagram. Blocks: Optics, PHY, Framer/MAC/Mapper, Packet Processor, Traffic Manager, Bridging/Backplane Interface, Backplane]
- Frame data in physical blocks
- Compute error check codes
- Classify packets into flows
- Parse and edit packets
- Police and shape flow rates
- Queue and schedule packets

FPGAs for Network Processing
- Physical line interface and backplane interface
  - Standard input/output interfacing use case for FPGAs
- Framing and error checking
  - Moving bits to the right places
  - Computing standard coding functions
- Packet processing
  - Classification: computing hashing functions, lookup in binary/ternary CAM
  - Inspection and surgery: programmable parsing and editing of packets
- Traffic management
  - Moving packets to the right places, and storing them
  - Computing policing and scheduling functions
- Functions can be carried out at line rates of 100 Gb/s and above

Example of FPGA in virtual networking node
- Virtual node 1: high-speed e-science network (high-throughput data plane)
- Virtual node 2: high-reliability sensor network (triply redundant datapath)
- Virtual node 3: highly secure network (security algorithm acceleration)
- Dynamically managed physical hardware

Relationship between FPGAs and CPUs
- FPGA processing and CPU processing are complementary
  - FPGA better for bit-level, high-speed, streaming, regular settings
  - CPU better for word-level, random-access, irregular settings
- Processors can be embedded into FPGAs
  - Hardened: in the silicon
    - Example: PowerPC included in several Xilinx Virtex generations
    - Example: ARM, just announced by Xilinx today
  - Soft: built out of programmable logic
    - Can now fit in multi-core architectures with 100s of processors
- In principle, programmable logic can be embedded into CPUs

Cohabitation model 1: Instruction acceleration
- CPU instruction set augmented by custom instructions
- When a custom instruction is executed:
  1. Instruction operands are presented over a CPU data bus
  2. Become inputs to a processing block on the FPGA
  3. Processing operation is carried out
  4. Outputs directed from processing block to data bus
  5. Instruction result(s) are obtained from CPU data bus
- Model applicable whether CPU is internal or external to the FPGA
  - Latency is higher in the latter case, of course

Cohabitation model 2: Function acceleration
- More loosely coupled hardware acceleration
- CPU requests function computation by FPGA
  - Library function call, or
  - Operating system function, or
  - Input/output operation
- FPGA receives function request with function arguments
- FPGA returns function result(s)
- Latency may be hidden by multi-threading or multi-tasking

Cohabitation model 3: Peer processing
- Example: Xilinx/Intel
- Use Intel Front Side Bus (FSB)
  - Cache-to-cache data transfer
  - Bypass slow system memory
- Heterogeneous multi-core
  - x86 cores
  - FPGA processing elements
- Performance
  - 8.5 GB/s bandwidth
  - 105 ns latency for 64 bytes
- Programming model
  - Global shared memory space
  - Always consistent (HW coherency)

Cohabitation model 4: Software deceleration
- Inverse of conventional hardware acceleration model
- Mainstream high speed processing functions
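
The five steps listed under cohabitation model 1 (instruction acceleration) can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not Xilinx code or a real toolchain API: the names DataBus, FpgaBlock, Cpu, and the HAMDIST opcode are hypothetical, and an ordinary Python callable stands in for the processing block that would be configured in programmable logic. The example operation (a Hamming distance) was chosen as a typical bit-level task; the slides do not name one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Words = Tuple[int, ...]


@dataclass
class DataBus:
    """Hypothetical stand-in for the CPU data bus shared with the FPGA."""
    log: List[Tuple[str, Words]] = field(default_factory=list)

    def transfer(self, direction: str, values: Words) -> Words:
        # Record every transfer so the sequence of steps can be inspected.
        self.log.append((direction, values))
        return values


@dataclass
class FpgaBlock:
    """Hypothetical processing block; a callable models the configured logic."""
    operation: Callable[..., Words]

    def process(self, inputs: Words) -> Words:
        return self.operation(*inputs)


class Cpu:
    """Toy CPU whose instruction set can be extended with custom opcodes."""

    def __init__(self, bus: DataBus) -> None:
        self.bus = bus
        self.custom: Dict[str, FpgaBlock] = {}

    def register_custom_instruction(self, opcode: str, block: FpgaBlock) -> None:
        self.custom[opcode] = block

    def execute(self, opcode: str, *operands: int) -> Words:
        block = self.custom[opcode]
        # Steps 1-2: operands go over the data bus and become block inputs.
        inputs = self.bus.transfer("cpu->fpga", operands)
        # Step 3: the processing operation is carried out.
        outputs = block.process(inputs)
        # Steps 4-5: outputs return over the data bus as instruction results.
        return self.bus.transfer("fpga->cpu", outputs)


def hamming_distance(a: int, b: int) -> Words:
    # Count the bit positions in which a and b differ.
    return (bin(a ^ b).count("1"),)


cpu = Cpu(DataBus())
cpu.register_custom_instruction("HAMDIST", FpgaBlock(hamming_distance))
result = cpu.execute("HAMDIST", 0b1011, 0b0010)
print(result)
```

The sketch captures only the ordering of the steps. It says nothing about latency, which the slides note is higher when the CPU is external to the FPGA.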



Stanford EE 392 - Field programmable technology
