CryptoManiac Slides borrowed with permission from Todd Austin and Lisa Wu University of Michigan Advanced Computer Architecture Laboratory Architecture s Diminishing Return Staples of value we strive for High Speed Low Power Low Cost Tricks of the trade Faster clock rates via pipelining Higher instruction throughput via ILP extraction Strong evidence of diminishing return PIII vs P4 22 less P4 inst throughput 0 35 vs 0 45 SPECInt MHz Less return less value 1 A Powerful Solution Eschew Generality Speed Efficiency H W designs Flexibility Programmability Application Specific Processor General Purpose Processors ISA Extensions General Purpose Processors Specialization limits the scope of a device s operation Produces stronger properties and invariants Results in higher return optimizations Programmability preserves the flexibility regarded by GPP s A natural fit for embedded designs Where application domains are more likely restrictive Where cost and power are 1st order concerns Cryptography Definitions encryption vs decryption public key cipher vs secret key cipher Public secret key ciphers are the most commonly used plain text f x ciph ertext Public K ey plain text g x Private K ey g x plain text Private K ey ciph ertext g x plain text Private K ey 2 SSL Session Breakdown Focus Secret Key Ciphers client server SSL Characterization by Session Length authenticate 100 private key 90 Relative Contribution to Run Time public https get https recv private 80 70 60 Public Other Private 50 40 30 20 10 0 1k close 2k 4k 8k 16k 32k SSL Session Length bytes average size of a single web object 21k Benchmark Suite Cipher Key Size Blk Size Rnds Blk Author Application 3DES Blowfish IDEA Mars RC4 RC6 Rijndael Twofish 112 128 128 128 128 128 128 128 SSL SSH Norton Utilities PGP SSH AES Candidate SSL AES Candidate AES Standard AES Candidate 64 64 64 128 8 128 128 128 48 16 8 16 1 18 10 16 CryptSoft CryptSoft Ascom IBM CryptSoft RSA Security Rijmen Counterpane 3 Cipher Throughput Analysis 350 00 Alpha 21264 4W DF 300 00 250 00 200 00 150 00 100 00 50 00 0 00 Blowfish 3DES IDEA Mars RC4 RC6 Rijndael Twofish Alpha 21264 vs 4W All except Mars and Twofish were within 10 of the actual machine tests Mars 11 Twofish 15 Alpha 21264 vs DF Blowfish IDEA and RC6 are running within 20 of DF performance Mars 29 Twofish 76 RC4 and Rijndael are outliers Characteristics of Cipher Kernels Diffusion goal of cryptography Goal is to randomly impress upon each group of output bits some information from each of the input bits Process needs to be reversible Should result in a random perturbation of each output bit with a probability 50 Cipher kernel loops run about 16 times on each block of data mixing the data more an more reach round Cipher kernels have very little to no parallelism Usually a very long recurrence 4 Breakdown of Cipher Operations Rotates Rotate the bits in a register Modular Addition Modular Multiplication 2 N 1 prime modulus operations Substitutions Table based substitutions SBOX a table of values indexed with plaintext a byte that produces the result of the key parameterized function General Permutations XBOX map N bits onto N buts with any arbitrary exchange of individual bits Blowfish Cipher Kernel for ii 0 ii BF ROUNDS ii register BF LONG tmp r p ii 1 r s int l 24L sbox 0x0100 int l 16L 0xff sbox 0x0200 int l 8L 0xff sbox 0x0300 int l 0xff 0xffffffffL tmp r r l l tmp r p BF ROUNDS 1 5 Cipher Bottleneck Analysis Analysis of Bottlenecks in Cipher Kernels 1 0 9 0 8 0 7 Alias Branch Issue Mem Res Window All 0 6 0 5 0 4 0 3 0 2 0 1 Alias impact of stalling loads in the pipeline until all ealier store addresses have been resolved Branch effects of mispredictions Issue impact of reducing issue width Mem impact of introducing a realistic memory system Res impact of limited functional unit resources Window impact of a limited size instruction window 0 3DES Mars RC4 Rijndael Twofish Cipher Relative Run Time Cost Focus Kernel Loop 100 90 Blowfish 80 3DES IDEA 70 60 Mars RC4 RC6 50 40 Rijndael Twofish 30 20 10 0 16 64 256 1k 4k 16k Session Length in bytes 64k 256k 1M 3DES and IDEA are small even for 16 byte sessions Mars RC4 RC6 Rijndael and Twofish drop well below 10 for 4k byte sessions Blowfish is outlier drops below 10 only for 64k byte sessions 6 Cipher Kernel Characterization Characterization of Cipher Kernel Operations 100 90 80 70 60 50 40 30 20 10 Branch Mov Ld St Xbox Sbox Mult Rotates Logical Arith SBOX substitutions XBOX permutations IDEA Mars RC4 and RC6 rely on arithmetic computations benefit from more resources multiplies and from faster operations rotates Blowfish 3DES Rijndael and Twofish rely on substitutions benefit from increased memory bandwidth and accesses 0 Architectural Extensions All instructions are limited to two register input operands and one register output ROL and ROR rotates for 64 and 32 bit data types ROLX and RORX support a constant rotate of a register input followed by an XOR with another register input MULMOD computes the modular multiplication of two register values modulo the value 0x10001 SBOX speeds the accessing of substitution tables with 256entry tables and 32 bit contents SBOXSYNC synchronize the SBOX table with memory XBOX implements a portion of a full 64 bit permutation 7 Crypto Specific ISA frequent SBOX substitutions X sbox y c 0xff 31 X sbox m j c 1 SBOX instruction eliminates address generation All SBOX tables are aligned to a 1k byte boundary Address generation becomes zero latency bit concatenation Stores to SBOX storage are not visible by later SBOX s until An SBOXSYNC is executed An alias bit is set SBOX instruction Incorporates byte extract Speeds address generation Original 4 cycle operation becomes a 1 cycle CryptoManiac instruction Table 10 Index 0 24 16 8 0 opcode 00 SBOX Table Crypto Specific ISA cont Ciphers often mix logical arithmetic operation Excellent diffusion properties plus resistance to attacks ISA supports instruction combining Logical ALU op ALU op Logical Eliminates dangling XOR s Reduces kernel loop critical paths by nearly 25 Small 5 increase in clock cycle time 8 Performance of ISA Extensions 4 5 Orig 4W Opt 4W Opt 4W Opt 8W Opt DF 4 3 5 3 2 5 2 1 5 1 0 5 0 Blowfish 3DES IDEA Mars RC4 RC6 Rijndael Twofish CryptoManic ISA bundle inst inst inst inst inst operation pair dest operand 1 operand 2 operand 3 operation pair short tiny tiny short tiny tiny long nop tiny xor and inc signext nop short add sub rot sbox nop long mul mulmod
View Full Document
Unlocking...