HI-1: XTEA
Hardware Implementation of XTEA
Steven Aumack
Michael Koontz

Project Outline
  • Algorithm details about TEA & XTEA
  • Software Implementation
  • Hardware Implementation
      • Block Diagram
      • Optimize for speed, not size
  • Simulation and Timing Analysis

Background Information
  • Crypto System Parameters
      • Smaller code size
      • Less complexity
  • TEA
      • Block cipher (64-bit block size)
      • Feistel structure (64 rounds or 32 cycles)
      • 128-bit key
  • XTEA
      • Extension of TEA (Tiny Encryption Algorithm)
      • Corrects weaknesses of TEA
  • Algorithm Designers (Cambridge Computer Laboratory)
      • David Wheeler
      • Roger Needham

TEA vs XTEA
  [Figure: side-by-side round diagrams for TEA and XTEA. Labels in the figure: << 4, >> 5, Delta, Subkey A, Subkey B, K[0], K[1], K[2], K[3].]

XTEA Software Implementation

    /* N > 0 encrypts with N cycles; N < 0 decrypts with -N cycles.
       Note: the arithmetic relies on long being 32 bits wide. */
    void tean(long *v, long *k, long N) {
        unsigned long y = v[0], z = v[1], DELTA = 0x9e3779b9;
        if (N > 0) {                        /* coding: XTEA encryption */
            unsigned long limit = DELTA * N, sum = 0;
            while (sum != limit) {
                y += (z << 4 ^ z >> 5) + z ^ sum + k[sum & 3];
                sum += DELTA;
                z += (y << 4 ^ y >> 5) + y ^ sum + k[sum >> 11 & 3];
            }
        } else {                            /* decoding: XTEA decryption */
            unsigned long sum = DELTA * (-N);
            while (sum) {
                z -= (y << 4 ^ y >> 5) + y ^ sum + k[sum >> 11 & 3];
                sum -= DELTA;
                y -= (z << 4 ^ z >> 5) + z ^ sum + k[sum & 3];
            }
        }
        v[0] = y, v[1] = z;
    }

XTEA Hardware Implementation
  • FPGA
      • Development Environment, Simulator, and Timing Analysis: Active-HDL ver 7.1
      • Synthesis: Synplicity Synplify Pro 8.5
      • Implementation: Xilinx ISE ver 8.1
  • ASIC
      • Synopsys Design Analyzer ver X-2005.09

XTEA Speed Optimization
  1. Optimize Adder
      • Ripple Carry Adder: default adder implemented by design tools
      • Kogge-Stone Adder: parallel prefix adder
  2. Sequential vs Pipeline
      • Original Implementation: ROM & look-up tables
      • Final Implementation: pipelined

Hardware Block Diagram
  Ports of the XTEA block:

    Port        Width
    KEY         128 bit
    CLOCK       1 bit
    DATA IN     64 bit
    RESET       1 bit
    ENC_DEC     1 bit
    LOAD_DATA   1 bit
    LOAD_KEY    1 bit
    READY       1 bit
    OUTPUT      64 bit

XTEA HW Block Diagram - ENCRYPTION
  [Figure: first half of the encryption datapath. Inputs Y(V0), Z(V1), Delta, Sum, and SubKey; results are captured in pipeline registers PL_1.]
  Values computed in this stage:
      (z<<4 ^ z>>5) + z
      sum + k[sum & 3]
      y + ( ((z<<4 ^ z>>5) + z) ^ (sum + k[sum & 3]) )
      sum = sum + delta

XTEA HW Block Diagram – ENCRYPTION (cont.)
  [Figure: second half of the encryption datapath. Inputs Z(V1) and SubKey; pipeline registers PL_2 and PL_3; outputs labeled New y and New sum.]
  Values computed in this stage:
      (y<<4 ^ y>>5) + y
      sum + k[sum>>11 & 3]
      z + ( ((y<<4 ^ y>>5) + y) ^ (sum + k[sum>>11 & 3]) )

XTEA HW Block Diagram - DECRYPTION
  [Figure: first half of the decryption datapath. Inputs Y(V0), Z(V1), Delta, Sum, and SubKey; pipeline registers PL_1; the adders used for the subtractions have Cin = '1'.]
  Values computed in this stage:
      (y<<4 ^ y>>5) + y
      sum + k[sum>>11 & 3]
      z – ( ((y<<4 ^ y>>5) + y) ^ (sum + k[sum>>11 & 3]) )
      sum = sum - delta

XTEA HW Block Diagram – DECRYPTION (cont.)
  [Figure: second half of the decryption datapath. Inputs Y(V0) and SubKey; pipeline registers PL_2 and PL_3; an adder with Cin = '1'; outputs labeled New sum and New z.]
  Values computed in this stage:
      sum + k[sum & 3]
      (z<<4 ^ z>>5) + z
      ((z<<4 ^ z>>5) + z) ^ (sum + k[sum & 3])
      y – ( ((z<<4 ^ z>>5) + z) ^ (sum + k[sum & 3]) )

XTEA Kogge-Stone Adder
  [Figure: Kogge-Stone adder.]
  Picture taken from "Lecture 5 – Conditional-sum adder, hybrid adders, parallel prefix network adders", George Mason University, Prof. Gaj.
  http://teal.gmu.edu/courses/ECE645/viewgraphs_S06/lecture5_ppn_adders_2.pdf p. 13.

XTEA HW Block Diagram – STATE MACHINE
  [Figure: controller state diagram with states S1 through S6, entered on Reset; transitions are labeled 0 or 1.]
  Conditions tested: load_key?, load_data?, enc_data?, done?
  Signals shown: rst_sum, rst_count, ready, en_data, load_sum, en_sum, en_count, en_out, enc_L1, enc_L2, enc_L3

Results
  • Target FPGA
      • Xilinx Virtex IV 4SX25FF668
      • Speed Grade 12
      • Chosen to eliminate area from consideration
  • Target ASIC
      • 90 nm TCBN90G TSMC Library

Results

    Device   Area (Slice Flip-Flops)   Area (LUTs)   Clock Period (ns)   Clock Frequency (MHz)   Area (Total Equivalent Gate Count)
    FPGA     1,081                     4,608         6.403               156.177                 20,411

    Device   Area            Clock Period (ns)   Clock Frequency (MHz)
    ASIC     49,378.292969   1.5                 666.67

    Device   Latency (ns)*   Throughput (Mbps)
    FPGA     832.39          73.33
    ASIC     195             313

  * Latency assumes the key and key schedule constant are already loaded.

XTEA Timing Analysis for FPGA
  [Figures: timing waveforms for encryption and for decryption.]

XTEA Future Work Considerations
  • Increase depth of pipeline
      • Pipeline each stage of KS adders
      • Reduce critical path
  • Add FIFO memory for larger files
  • Use multiple XTEA processing units
      • Process multiple blocks at a
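
The software listing in this deck relies on `long` being 32 bits wide, which does not hold on many current platforms. The following is a minimal sketch of the same cycle structure using fixed-width types. The names `xtea_encrypt`, `xtea_decrypt`, and `XTEA_DELTA` are illustrative assumptions and do not come from the slides; the precedence of the original expressions is spelled out with explicit parentheses.

```c
#include <stdint.h>

/* Key schedule constant, same value as DELTA in the slide listing. */
#define XTEA_DELTA 0x9E3779B9u

/* Encrypt one 64-bit block v[0..1] in place with the 128-bit key k[0..3].
   'cycles' is the number of cycles (two Feistel rounds each). */
void xtea_encrypt(uint32_t v[2], const uint32_t k[4], unsigned cycles)
{
    uint32_t y = v[0], z = v[1], sum = 0;
    for (unsigned i = 0; i < cycles; i++) {
        y += (((z << 4) ^ (z >> 5)) + z) ^ (sum + k[sum & 3]);
        sum += XTEA_DELTA;
        z += (((y << 4) ^ (y >> 5)) + y) ^ (sum + k[(sum >> 11) & 3]);
    }
    v[0] = y;
    v[1] = z;
}

/* Decrypt one block in place by running the same steps in reverse order. */
void xtea_decrypt(uint32_t v[2], const uint32_t k[4], unsigned cycles)
{
    /* Start from the final value of sum reached by encryption (mod 2^32). */
    uint32_t y = v[0], z = v[1], sum = XTEA_DELTA * cycles;
    for (unsigned i = 0; i < cycles; i++) {
        z -= (((y << 4) ^ (y >> 5)) + y) ^ (sum + k[(sum >> 11) & 3]);
        sum -= XTEA_DELTA;
        y -= (((z << 4) ^ (z >> 5)) + z) ^ (sum + k[sum & 3]);
    }
    v[0] = y;
    v[1] = z;
}
```

Calling `xtea_encrypt(block, key, 32)` followed by `xtea_decrypt(block, key, 32)` should restore the original block; 32 cycles corresponds to the 64 rounds listed under Background Information.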
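
The latency and throughput figures in the Results can be cross-checked with simple arithmetic. The sketch below rests on two assumptions inferred from the reported numbers and not stated in the slides: one 64-bit block takes 130 clock cycles, and "Mbps" means 2^20 bits per second. The helper names are hypothetical.

```c
/* Latency of one block in nanoseconds.
   Assumption: 'cycles' = 130 is inferred from the reported numbers. */
double block_latency_ns(unsigned cycles, double clock_period_ns)
{
    return cycles * clock_period_ns;
}

/* Throughput in units of 2^20 bits per second.
   Assumption: this "Mbps" convention is inferred, not stated in the slides. */
double throughput_mbps(unsigned block_bits, double latency_ns)
{
    double bits_per_second = block_bits / (latency_ns * 1e-9);
    return bits_per_second / 1048576.0;
}
```

Under these assumptions, `block_latency_ns(130, 6.403)` gives about 832.39 ns and `throughput_mbps(64, 832.39)` about 73.33 for the FPGA, while `block_latency_ns(130, 1.5)` gives 195 ns and `throughput_mbps(64, 195.0)` about 313 for the ASIC, which agrees with the reported values.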