ECE 545 - GMU SHA Core Interface & Hash Function Performance Metrics


School name George Mason University

Course ECE 545 - Digital System Design with VHDL

Pages 12


Slides dated 11/16/10

Interface

Why the Interface Matters
• Pin limit: total number of I/O ports ≤ total number of FPGA I/O pins
• Support for the maximum throughput: time to load the next message block ≤ time to process the current block

Interface: Two possible solutions
• Length of the message communicated at the beginning
   + easy to implement passive source circuit
   − area overhead for the counter of message bits
• Dedicated end-of-message port
   − more intelligent source circuit required
   + no need for internal message bit counter
[Figure: SHA core input for the two solutions. Labels: msg_bitlen, zero_word, message, end_of_msg.]

SHA Core: Interface & Typical Configuration
• SHA core is an active component; surrounding FIFOs are passive and widely available
• Input interface is separate from the output interface
• Processing the current block, reading the next block, and storing the result for the previous message can all be done in parallel
[Figure: Input FIFO, SHA core, and Output FIFO connected in series; data buses are w bits wide.
   Input FIFO and Output FIFO ports: din, dout, full, empty, write, read, clk, rst
   SHA core ports: din, dout, src_ready, src_read, dst_ready, dst_write, clk, rst
   Signal names: ext_idata, fifoin_full, fifoin_write, fifoin_empty, fifoin_read, idata, odata, fifoout_full, fifoout_write, fifoout_empty, fifoout_read, ext_odata]

SHA Core Interface
[Figure: SHA core with ports din and dout (w bits each), src_ready, src_read, dst_ready, dst_write, clk, rst.]

SHA Core Interface + Surrounding FIFOs
[Figure: the same Input FIFO, SHA core, Output FIFO configuration and signal names as in the typical configuration.]

Operation of FIFO
[Figure: not present in the extracted text.]

Communication Protocol for Unpadded Messages
[Figure: streams of w-bit words.
   a) msg_bitlen, message
   b) seg_0_bitlen, seg_0, seg_1_bitlen, seg_1, ..., seg_n-1_bitlen, seg_n-1
   Both diagrams also contain a zero_word.]

SHA Core Interface with Additional Faster I/O Clock
[Figure: SHA core with ports din and dout (w bits each), src_ready, src_read, dst_ready, dst_write, clk, rst, io_clk.]

SHA Core Interface with Two Clocks + Surrounding FIFOs
[Figure: the Input FIFO, SHA core, Output FIFO configuration with the same signal names, with io_clk added alongside clk.]

Communication Protocol for Padded Messages Without Message Splitting
[Figure: stream of w-bit words with fields msg_len_ap | last = 1, message, msg_len_bp.]
msg_len_ap – message length after padding [bits]
msg_len_bp – message length before padding [bits]

Communication Protocol for Padded Messages With Message Splitting
[Figure: stream of w-bit words with fields seg_0_len_ap | last=0, seg_0, seg_1_len_ap | last=0, seg_1, ..., seg_n-1_len_ap | last=1, seg_n-1, seg_n-1_len_bp.]
seg_i_len_ap – segment i length after padding* [bits]
seg_i_len_bp – segment i length before padding [bits]
* For all i < n-1, the length of segment i after padding is assumed to be a multiple of the message block size, b (characteristic of each function), and thus also of the word size, w. The last segment cannot consist of only padding bits; it must include at least one message bit.

Performance Metrics

Performance Metrics - Speed
• Throughput for Long Messages [Mbit/s]
• Throughput for Short Messages [Mbit/s]
• Execution Time for Short Messages [ns]
Allows for easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), and ASICs (various libraries).

Performance Metrics - Speed
• Time to hash N blocks of message [cycles] = Htime(N)
   The exact formula comes from analysis of a block diagram, confirmed by functional simulation.
• Minimum Clock Period [ns] = T
   From a place & route and/or static timing analysis report file.

Time to Hash N Blocks of the Message [clock cycles]
[Figure: not present in the extracted text.]

Performance Metrics - Speed
Minimum time to hash N blocks of message [ns] = Htime(N) ⋅ T
Maximum Throughput (for long messages):
$$\text{Throughput} = \frac{\text{block\_size}}{T \cdot (Htime(N+1) - Htime(N))} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$
Effective maximum throughput for short messages: [formula not present in the extracted text]

Performance Metrics - Speed
Maximum Throughput (for long messages):
$$\text{Throughput} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$
• block_size: from specification
• T: from place & route report and/or static timing analysis report
• block_processing_time: from analysis of block diagram and/or functional simulation

Performance Metrics - Area
For the basic, folded, and unrolled architectures, we force these vectors to look as follows through the synthesis and implementation options:
[Figure: resource vectors; recoverable entries are 0, 0, 0, 0 and Area.]

Choice of Optimization Target
Primary Optimization Target: Throughput to Area Ratio
Features:
• practical: good balance between speed and cost
• very reliable guide through the entire design process, facilitating the choice of
   - high-level architecture
   - implementation of basic components
   - choice of tool options
• leads to high-speed, close-to-maximum-throughput designs

Our Design Flow
[Figure: design-flow diagram. Labels: Specification; Interface; Datapath Block diagram; Controller ASM Chart; VHDL Code; Formulas for Throughput & Hash time; Max. Clock Freq.; Resource Utilization; Throughput, Area, Throughput/Area, Hash Time for Short Messages; Controller Template; Library of Basic Components.]

How to compare hardware speed vs. software speed?
eBASH reports (http://bench.cr.yp.to/results-hash.html)
• In graphs: Time(n) = time in clock cycles vs. message size in bytes for n-byte messages, with n = 0, 1, 2, 3, … 2048, 4096
• In tables: performance in cycles/byte for n = 8, 64, 576, 1536, 4096, long msg
$$\text{Performance for long message} = \frac{Time(4096) - Time(2048)}{2048}$$

How to compare hardware speed vs. software speed?
$$\text{Throughput [Gbit/s]} = \frac{8\ \text{bits/byte} \cdot \text{clock frequency [GHz]}}{\text{Performance for long message [cycles/byte]}}$$
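The software-side conversion can be sketched in a few lines of Python. This is only an illustration of the two formulas on the eBASH slides: the function names are made up for this sketch, and the timings and clock frequency in the example are hypothetical, not real eBASH measurements.

```python
def long_message_cycles_per_byte(time_4096: float, time_2048: float) -> float:
    """Performance for long messages [cycles/byte].

    Uses the slope between the 2048-byte and 4096-byte timings:
    (Time(4096) - Time(2048)) / 2048.
    """
    return (time_4096 - time_2048) / 2048


def software_throughput_gbps(cycles_per_byte: float, clock_ghz: float) -> float:
    """Throughput [Gbit/s] = 8 bits/byte * clock frequency [GHz] / (cycles/byte)."""
    return 8 * clock_ghz / cycles_per_byte


if __name__ == "__main__":
    # Hypothetical cycle counts, chosen only to give round numbers.
    cpb = long_message_cycles_per_byte(time_4096=45_000, time_2048=24_520)
    print(cpb)                                            # 10.0
    print(software_throughput_gbps(cpb, clock_ghz=2.5))   # 2.0
```

Taking the difference of two timings cancels the fixed per-message overhead, which is why the slope rather than Time(4096)/4096 is used as the long-message figure.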

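The hardware-side formulas from the Performance Metrics slides (Htime(N), T, block_size, throughput to area ratio) can be exercised with a small Python sketch. Everything numeric here is an assumption for illustration: the Htime formula describes a hypothetical core, not any particular SHA design, the area figure is in unspecified units, and the load-time check assumes one w-bit word is transferred per clock cycle.

```python
def htime(n_blocks: int) -> int:
    """Cycles to hash n_blocks blocks for a HYPOTHETICAL core:
    3 setup cycles + 20 cycles per block + 2 output cycles."""
    return 3 + 20 * n_blocks + 2


def hash_time_ns(n_blocks: int, t_ns: float) -> float:
    """Minimum time to hash N blocks [ns] = Htime(N) * T."""
    return htime(n_blocks) * t_ns


def long_message_throughput_mbps(block_size_bits: int, t_ns: float, n: int = 1) -> float:
    """block_size / (T * (Htime(N+1) - Htime(N))), converted from bits/ns to Mbit/s."""
    block_processing_time = htime(n + 1) - htime(n)  # cycles per extra block
    return 1000 * block_size_bits / (t_ns * block_processing_time)


def load_keeps_up(block_size_bits: int, w_bits: int, n: int = 1) -> bool:
    """Interface condition: time to load the next block <= time to process
    the current block (assumes one w-bit word per clock cycle)."""
    load_cycles = block_size_bits // w_bits
    return load_cycles <= htime(n + 1) - htime(n)


if __name__ == "__main__":
    # Hypothetical values: 512-bit block, T = 5 ns, w = 64, area = 1024 units.
    tp = long_message_throughput_mbps(block_size_bits=512, t_ns=5.0)
    print(hash_time_ns(1, t_ns=5.0))   # 125.0
    print(tp)                          # 5120.0
    print(tp / 1024)                   # throughput to area ratio: 5.0
    print(load_keeps_up(512, 64))      # True
```

Because Htime(N+1) - Htime(N) is constant for this linear Htime, the long-message throughput does not depend on the choice of N.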