##
This **preview** shows page *1-2-3-4*
out of 11 **pages**.

*View Full Document*

End of preview. Want to read all 11 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document**Unformatted text preview:**

Bit Permutation Instructions for Accelerating Software CryptographyZhijie Shi, Ruby B. LeeDepartment of Electrical Engineering, Princeton University{zshi, rblee}@ee.princeton.eduAbstractPermutation is widely used in cryptographic algorithms. However, it is not well-supportedin existing instruction sets. In this paper, two instructions, PPERM3R and GRP, are proposedfor efficient software implementation of arbitrary permutations. The PPERM3R instructioncan be used for dynamically specified permutations; the GRP instruction can be used to doarbitrary n-bit permutations with up to lg(n) instructions. In addition, a systematic method fordetermining the instruction sequence for performing an arbitrary permutation is described.1. IntroductionSecure information processing using cryptography is becoming increasingly important.Confusion and diffusion are two basic techniques used in cryptography algorithms [1].Diffusion spreads the redundant information in the plain text over the cipher text. As aprimary method to achieve diffusion, permutation is widely used in cryptographic algorithms[1]. For example, there are six different permutations in DES [1], two permutations in Twofish[8], and two permutations in Serpent [11]. Although it is not a problem for hardware toimplement a pre-defined permutation [3], data-dependent permutations are not easy.Furthermore, there is no efficient way to do arbitrary permutations in software on mostexisting processor architectures.Currently, several methods are used to perform permutation operations in software. Themost straightforward method moves bits one by one [1]. We fetch one bit from the source, andput it into the correct position in the destination. We need 4n operations for an n-bitpermutation with this method on most existing microprocessors. The first two operationsgenerate the mask and extract a bit with an AND instruction, the third moves the bit to thecorrect position with a SHIFT instruction, and the fourth sets the correct bit in the destinationregister with an OR instruction. This process can be accelerated on systems havinginstructions like EXTRACT and DEPOSIT [5][6], but still require 2n operations. This methoddoes not require an extra functional unit, and uses a small amount of memory, but it isexpensive in execution time.Another widely used technique is table lookup [4]. In this method, the source is divided intoseveral sections. Then, the bits in each section are permuted simultaneously by looking up atable. Finally, we combine the result of each section to obtain the result of the permutation.The number of instructions in this method is dependent on how many sections we divide thesource into. Fewer sections require fewer instructions. However, fewer sections also lead tomore memory. For example, suppose we want to do a 64-bit permutation, we could do it with0-7695-0716-6/00 $10.00 ã 2000 IEEEone table lookup by putting all possible results in a large table of 264*8 bytes, which is toolarge to be feasible. Alternatively, we can divide the source into 8 sections, and build 8 tables.Each table has 2K bytes (256 entries and 8 bytes in each entry). Only 8 bits are permuted inone table lookup. Each table entry has 64 bits, and bits not permuted with that table are set to0. Including the instructions for extracting indexes, we need 8 EXTRACTs [5][6], 8 tablelookups and 7 ORs for one 64-bit permutation. This method is faster but consumes too muchmemory. The comparison of these two methods is shown inTable 1.Table 1: Traditional methods to do a 64-bit permutation on 64-bit systemsNumber of instructions Memory neededAND,SHIFT,OR256 (4n)0Table Lookup23 (8 EXTRs, 8 LOADs and 7 ORs,) 16Kbytes (8 2K-byte tables)Smart methods exist for some regular permutations. For example, the initial permutation inDES is similar to a transpose of bit matrices. It can be done in an efficient way with 30operations on a 32-bit platform [1], rather than 4n = 256 instructions as in the first methoddescribed above. However, not all permutations are regular. Even for regular permutations,we do not have a systematic strategy to find the shortest sequence of instructions for anarbitrary permutation, which implies that we have to attack each permutation separately andcan not guarantee finding the fastest solution.Due to the difficulty in the software implementation of permutations, complex permutationoperations are avoided when designing new algorithms. In this paper, we propose two novelpermutation instructions that can perform arbitrary permutations of n bits efficiently. Therehas been limited past work on general-purpose permutation instructions. In [7], Lee proposedthe first instruction-level support for general-purpose permutations in microprocessors: theMIX and PERMUTE instructions. One PERMUTE instruction can generate in a single cycleany permutation of four 16-bit subwords, including permutations with repetition of elements.The MIX instruction can combine the contents of two registers. If we extend it to handle 1-bitsubwords, it can perform many regular permutations efficiently, such as the initial permutationin DES. In [2], Hansen proposed the group-8-mux and group-transpose-8-mux instructions.However, their instructions do 64-bit permutations on 128-bit systems, and require more thantwo operands. By proposing two general-purpose permutation instructions in this paper forarbitrary bit-level permutations, we hope to give cryptography algorithm designers moreflexibility in designing faster and more secure encryption algorithms. Our permutationtechniques can be implemented in any programmable processor, whether they are general-purpose microprocessor or application-specific cryptography processors. This paper isorganized as follows: Section 2 provides some mathematical properties of permutations.Section3andsection4introducetwonewpermutationinstructions,PPERM3RandGRP,respectively. Section 5 compares the performance of these two permutation methods with thetable lookup method. Section 6 concludes the paper.2. Mathematical properties of permutationsBefore we get into the details of the instructions, let us take a look at the mathematicalproperties of permutations. The number of n-bit permutations without repetitions is!n .According to Stirling’s approximation [9][10],0-7695-0716-6/00 $10.00 ã 2000 IEEE)10,0(2!1221<<>=+−+θπθnennnnn(1)When doing an n-bit permutation, we need to choose one result out of n! possibilities. Atleast, we should be able to distinguish

View Full Document