GMU SHA Core Interface & Hash Function Performance Metrics
11/16/10

Interface

Why Interface Matters?
• Pin limit: total number of I/O ports ≤ total number of FPGA I/O pins
• Support for the maximum throughput: time to load the next message block ≤ time to process the current block

Interface: Two possible solutions
Length of the message communicated at the beginning
+ easy to implement passive source circuit
− area overhead for the counter of message bits
Dedicated end-of-message port
− more intelligent source circuit required
+ no need for internal message bit counter
[Figure: inputs to the SHA core; labels: msg_bitlen, zero_word, message, end_of_msg]

SHA Core: Interface & Typical Configuration
• SHA core is an active component; surrounding FIFOs are passive and widely available
• Input interface is separate from the output interface
• Processing the current block, reading the next block, and storing the result for the previous message can all be done in parallel
[Figure: block diagram, Input FIFO → SHA core → Output FIFO, with w-bit data buses and common clk and rst. FIFO ports: din, dout, full, empty, write, read. SHA core ports: din, dout, src_ready, src_read, dst_ready, dst_write. Signal labels: ext_idata, fifoin_full, fifoin_write, idata, fifoin_empty, fifoin_read, odata, fifoout_full, fifoout_write, ext_odata, fifoout_empty, fifoout_read]

SHA Core Interface
[Figure: SHA core with ports din (w bits), dout (w bits), src_ready, src_read, dst_ready, dst_write, clk, rst]

SHA Core Interface + Surrounding FIFOs
[Figure: same block diagram as in "SHA Core: Interface & Typical Configuration"]

Operation of FIFO
[Figure]

Communication Protocol for Unpadded Messages
[Figure: streams of w-bit words; labels:
a) msg_bitlen, zero_word, message
b) seg_0_bitlen, zero_word, seg_0, seg_1_bitlen, seg_1, ..., seg_n-1_bitlen, seg_n-1]

SHA Core Interface with Additional Faster I/O Clock
[Figure: SHA core with ports din (w bits), dout (w bits), src_ready, src_read, dst_ready, dst_write, clk, rst, io_clk]

SHA Core Interface with Two Clocks + Surrounding FIFOs
[Figure: Input FIFO → SHA core → Output FIFO block diagram with the same signal labels as above; clock labels: clk and io_clk]

Communication Protocol for Padded Messages Without Message Splitting
[Figure: stream of w-bit words; labels: msg_len_ap | last = 1, message, msg_len_bp]
msg_len_ap: message length after padding [bits]
msg_len_bp: message length before padding [bits]

Communication Protocol for Padded Messages With Message Splitting
[Figure: stream of w-bit words; labels: seg_0_len_ap | last=0, seg_0, seg_1_len_ap | last=0, seg_1, ..., seg_n-1_len_ap | last=1, seg_n-1, seg_n-1_len_bp]
seg_i_len_ap: segment i length after padding* [bits]
seg_i_len_bp: segment i length before padding [bits]
* For all i < n-1, segment i length after padding is assumed to be a multiple of the message block size, b (characteristic of each function), and thus also of the word size, w.
The last segment cannot consist of only padding bits. It must include at least one message bit.

Performance Metrics

Performance Metrics - Speed
• Throughput for Long Messages [Mbit/s]
• Throughput for Short Messages [Mbit/s]
• Execution Time for Short Messages [ns]
Allows for easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), and ASICs (various libraries).

Performance Metrics - Speed
Time to hash N blocks of message [cycles] = Htime(N)
The exact formula comes from analysis of a block diagram, confirmed by functional simulation.
Minimum Clock Period [ns] = T
From a place & route and/or static timing analysis report file.

Time to Hash N Blocks of the Message [clock cycles]
[Table]

Performance Metrics - Speed
Minimum time to hash N blocks of message [ns] = Htime(N) ⋅ T
Maximum Throughput (for long messages):
$$\text{Throughput} = \frac{\text{block\_size}}{T \cdot (\text{Htime}(N+1) - \text{Htime}(N))} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$
Effective maximum throughput for short messages: [formula not preserved]

Performance Metrics - Speed
Maximum Throughput (for long messages):
$$\text{Throughput} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$
• block_size: from specification
• T: from place & route report and/or static timing analysis report
• block_processing_time: from analysis of block diagram and/or functional simulation

Performance Metrics - Area
For the basic, folded, and unrolled architectures, we force these vectors to look as follows through the synthesis and implementation options:
[Figure: vector with entries Area, 0, 0, 0, 0]

Choice of Optimization Target
Primary Optimization Target: Throughput to Area Ratio
Features:
• practical: good balance between speed and cost
• very reliable guide through the entire design process, facilitating
  the choice of high-level architecture
  implementation of basic components
  choice of tool options
• leads to high-speed, close-to-maximum-throughput designs

Our Design Flow
[Figure: design flow diagram; labels: Specification, Interface, Datapath Block diagram, Controller ASM Chart, Controller Template, Library of Basic Components, VHDL Code, Formulas for Throughput & Hash time, Max. Clock Freq., Resource Utilization, Throughput, Area, Throughput/Area, Hash Time for Short Messages]

How to compare hardware speed vs. software speed?
EBASH reports (http://bench.cr.yp.to/results-hash.html)
In graphs: Time(n) = time in clock cycles vs. message size in bytes for n-byte messages, with n = 0, 1, 2, 3, …, 2048, 4096
In tables: performance in cycles/byte for n = 8, 64, 576, 1536, 4096, long msg
$$\text{Performance for long message} = \frac{\text{Time}(4096) - \text{Time}(2048)}{2048}$$

How to compare hardware speed vs. software speed?
$$\text{Throughput [Gbit/s]} = \frac{8\ \text{bits/byte} \cdot \text{clock frequency [GHz]}}{\text{Performance for long message [cycles/byte]}}$$
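The speed formulas above can be put side by side in a short Python sketch. It is only an illustration under stated assumptions: the function names, the example formula Htime(N) = 3 + 65·N, the 512-bit block, the 5 ns clock period, and the software cycle counts are all hypothetical values chosen for the arithmetic, not measurements from the slides or from EBASH.

```python
from typing import Callable


def hw_throughput_mbit_s(block_size_bits: int, t_ns: float,
                         htime: Callable[[int], int], n: int = 1) -> float:
    """Maximum hardware throughput for long messages, in Mbit/s.

    block_processing_time = Htime(N+1) - Htime(N)   [cycles]
    throughput = block_size / (T * block_processing_time)
    Bits per nanosecond equal Gbit/s, so multiply by 1000 for Mbit/s.
    """
    block_processing_time = htime(n + 1) - htime(n)
    return block_size_bits / (t_ns * block_processing_time) * 1000.0


def sw_cycles_per_byte_long(time_4096: float, time_2048: float) -> float:
    """Long-message performance: (Time(4096) - Time(2048)) / 2048."""
    return (time_4096 - time_2048) / 2048


def sw_throughput_gbit_s(cycles_per_byte: float, clock_ghz: float) -> float:
    """Software throughput: 8 bits/byte * f [GHz] / (cycles/byte)."""
    return 8 * clock_ghz / cycles_per_byte


def example_htime(n: int) -> int:
    """Hypothetical Htime(N): 3 cycles of overhead plus 65 cycles per block."""
    return 3 + 65 * n


if __name__ == "__main__":
    # Hypothetical hardware design: 512-bit block, T = 5 ns.
    hw = hw_throughput_mbit_s(512, 5.0, example_htime)
    print(f"hardware: {hw:.2f} Mbit/s")

    # Hypothetical software timings in clock cycles, on a 3 GHz CPU.
    cpb = sw_cycles_per_byte_long(time_4096=50000, time_2048=25424)
    sw = sw_throughput_gbit_s(cpb, clock_ghz=3.0)
    print(f"software: {cpb:.1f} cycles/byte, {sw:.2f} Gbit/s")
```

Taking the difference Htime(N+1) − Htime(N), like Time(4096) − Time(2048) on the software side, cancels the fixed per-message overhead, which is why both quantities describe long-message behaviour.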
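The interface requirement stated earlier (time to load the next message block ≤ time to process the current block) can be checked with a few lines of Python. This is a rough sketch, assuming one w-bit word is transferred per I/O clock cycle and ignoring protocol header words and handshake stalls; the function names, the `io_clk_ratio` parameter, and the example numbers are hypothetical.

```python
import math


def load_cycles(block_size_bits: int, w: int) -> int:
    """I/O clock cycles needed to load one block over a w-bit bus,
    assuming one word per cycle."""
    return math.ceil(block_size_bits / w)


def can_sustain(block_size_bits: int, w: int,
                block_processing_cycles: int,
                io_clk_ratio: float = 1.0) -> bool:
    """True if loading the next block is no slower than processing
    the current one.

    io_clk_ratio = io_clk frequency / clk frequency (1.0 means a single
    shared clock). Load time is converted to core clock cycles before
    comparison.
    """
    load_in_core_cycles = load_cycles(block_size_bits, w) / io_clk_ratio
    return load_in_core_cycles <= block_processing_cycles


if __name__ == "__main__":
    # Hypothetical core: 1024-bit block, 32-bit bus, 20 cycles per block.
    print(load_cycles(1024, 32))
    print(can_sustain(1024, 32, 20))
    print(can_sustain(1024, 32, 20, io_clk_ratio=2.0))
```

In this hypothetical case the single-clock interface would starve the core, while an I/O clock running twice as fast would not, which is the situation the additional faster I/O clock variant of the interface appears intended to address.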