Mathematical BackgroundMathematical Background (cont.)Mathematical Background (cont.)ApplicationElliptic Curve Digital Signature Algorithm (ECDSA)System OverviewMultiply ArchitectureModular Inverse ArchitecturePoint Adding Algorithmwith Dual Multiply ArchitecturePoint Doubling Algorithmwith Dual Multiply ArchitectureProjective Coordinate Conversion-1with Dual Multiply ArchitectureComparison of ArchitecturesFunctional Block TestingA General Approach to Elliptic Curve Cryptosystems Over GF(2m) in Polynomial BasisbyN. Nguyen, P. Sarvadevabhatla, S. Katikaneni11/20/2002Elliptic Curve Cryptosystems - ECCAdvantages• first true alternative for RSA• several times shorter keys• fast and compact implementations, in particular in hardware• a family of cryptosystems, instead of a single cryptosystemThree Classes of Elliptic CurvesElliptic curves built overK = GF(p)K = GF(2m)Polynomial basisrepresentationNormal basisrepresentationArithmetic operationspresentin many librariesFast and Compact in hardwareBasic operations of Elliptic Curve Cryptosystems (1)Operations in the Galois Field GF(2m) in the normal basis representation:m=112..512• addition and subtraction: x+y, x-y• multiplication: x ⋅ y• squaring: x2• inversion: x-1Basic operations on points of an Elliptic Curveover the Galois Field GF(2m):• addition of points: P + Q• doubling a point: 2 Pwhere P = (xP, yP), Q = (xQ, yQ)Mathematical BackgroundElliptic Curve Over GF(2m)Under trinomial non-supersingular elliptic curve, there is a set of solution (x,y) to the equation:y2+ xy = x3+ a2x2+ a6 wherex, y ∈ GF(2m)a2 ∈ {0,1}a6∈ GF(2m)* A special point called the point at infinity OPoints addition on non-supersingular elliptic curve over GF (2m)P = (x1,y1) Q = (x2,y2)R = P + Q = (x3,y3)Case 1:P + O = O + P = PCase 2: if x2 = x1, y2 = y1+x1P + Q = OQ = -PMathematical Background (cont.)Case 3a:if P ≠ Q [P+Q]x3= λ2+ λ + x1+ x2+ a2y3= λ(x1-x3) - y1whereλ= (y1 + y2)(x1 + x2)-1Number of field operations:1 inversion2 multiplication1 squaringCase 3b:if P = Q [P+P=2P]x3= a6(x1-1)2+ x12y3= x12+ (x1+ y1x1-1)x3+ x3Number of field operations:1 inversion3 multiplication2 squaringMathematical Background (cont.)Equation: Q = P;for ( i=m-1 downto 0 )Q = 2Q; if( ki= 1 )Q = Q + P;end for;R = Q;Scalar Point MultiplyInput:P ∈ a point on curve GF(2m)k ∈ GF(2m)Output:R = k*PWe want to keepP = (x1,y1,z1) [projective coordinate]Q =(x2,y2,1) [affine coordinate]ApplicationElliptic Curve Digital Signature Algorithm (ECDSA)Signature VerificationSignature GenerationSecure Hash Algorithm-1 Secure Hash Algorithm-1DSA Sign OperationDSA Verify OperationMessage MessageMessage Digest Message DigestPrivate KeyPublic KeyDigital SignatureSignature verifiedFail/PassDigital SignatureSystem Overview• Utilize a central ALU architecture• ALU has 2 multiply and 1 inversion blocks• 1 control unit that acts as scheduler and control the bus• Utilize set of hardwired instructions• Parallelism Point PK2*mmResult2*mProjective Coordinate ConversionProjective Coordinate Conversion-1Point AdderPoint DoublingALU2x Multiply 1x InversionIrreducible PolynomialConstant PControl Unit & SchedulerMultiply ArchitectureInput: A(x), B(x) ∈GF(2m)Output: C <= A*B mod P1. C <= 02. for i = m-1 to 0 do3. C <= C*x + A*bi4. C <= C + cm*P5. end for6. return CModular Inverse ArchitectureModified Almost Inverse AlgorithmInput: A(x) ∈ GF(2m)Output: C <= A-1mod P1. Y<=A, D<=P, B<=0, Z<=12. loop3. while y0= 0 do4. Y<=Y*x-1X<=(X + z0*P)*x-15. end while6. if (Y=1)7. return Z8. if (D>Y) then9. D<=>Y, B<=>Z10. Y<=Y+D, Z<=Z+B11. end loopPoint Adding Algorithmwith Dual Multiply ArchitectureInput: P (x1,y1,z1), Q (x2,y2)Output: Result (x3,y3,z3) = P + Q If P,Q ≠ O, P ≠ Q, P ≠ -Qx3= ADy3= CD + A2(Bx1+ Ay1)z3= A3z1whereA = x2z1+ x1B = y2z1+ y1C = A + BD = A2(A + a2z1) + z1BCA2* (Ay1 + Bx1)7A2(Bx1+Ay1) +CD => y3B * x1A * y16Reg[B] = C * DA * D => x35A2(A+a2z1) + z1BC = D(BC) * z1A2 * (A + a2z1)4B * CA2 *(Az1) => z33A * z1Reg[A2] = A * A2Reg[B] = y2 * z1+ y1Reg[A] = x2 * z1+x11Point Doubling Algorithmwith Dual Multiply Architecture( + ) => y3A* B => x37B * (x12+y1z1+A)x14 *A6Reg[B] = x14+ a6z14z14* a6x12* x125z12* z12x1 * x14z1 * z1A2* (Az1) => z33A * z1A * A2Reg[Bx]=x12+A+y1z1y1 * z1Reg[A] = x1 * z11Input: P (x1,y1,z1)Output: Result (x3,y3,z3) = 2*Px3= ABy3= x14A + B(x12+ y1z1+ A)z3= A3z1whereA = x1z1B = a6z14+ x14Projective Coordinate Conversion-1with Dual Multiply ArchitectureInput: P (x1,y1,z1)Output: Result (x3,y3) = P (x1 /z1,y1 /z1,1)x3= x1 (z1)-1y3= y1 (z1)-11z1-12x1* (z1)-1y1 * (z1)-1Comparison of ArchitecturesGF(2m) Dual Multiply Single MultiplyPoint Adding 7 multiply rounds3 intermediate registersTotal clocks: 7*(m+1)Speed up: 1.8613 multiply rounds4 intermediate registersTotal clocks: 13*(m+1)Straight forward approachPoint Doubling 7 rounds3 intermediate registersTotal clocks: 7*(m+1)Speed up: 1.8613 multiply rounds4 intermediate registersTotal clocks: 13*(m+1)Projective Coordinate Conversion-11 multiply round1 modular inverse round0 intermediate registersTotal clocks: m+inv+12 multiply rounds1 modular inverse round1 intermediate registersTotal clocks: 2*m+inv+2*Other architectures: 3 multiply = 2.6 (1.4), 4 multiply = 3.25 (1.75)VIVA™ and HAL-15• VIVA is a software that offers a solution to hardware design• Acts as all-in-one complier, simulation and synthesis tool• HAL-15 is a hyper computer that VIVA runs on• Has 9 FPGA boards, XCV-4062Functional Block Testing• VIVA results are taken under HAL-15 machines that host a total of 9 FPGA boards, XCV-4062. It is compared to regular VHDL coding approach. • Area analysis takes in the accounts of additional input and output registers. Timing analysis is the real timing between registers and block under test. • Area Analysis takes in the account of registers outside of testing blockFPGA BoardBlock under testInput A Input BResultIrreducible Polynomialm32/64mmmInput register_select register_ena2depends on mavailable1ABPResultTiming AnalysisResult Analysis: MultiplierVHDL codem=128VIVA codem=128Pre-optimized version Synthesis tool: Synplify ProPackage: Xilinx 4062Area: 362 CLB (15%)Speed: 56 MHz (17.8 ns)Synthesis tool: VIVA 2.1 Package: Xilinx 4062Area: 590 CLB (26%)Speed:Timing analysis 23 MHz (43.4 ns)Hardware runs passed @ 27 MHz,
View Full Document