Hardware Implementations of RSA Using Fast Montgomery MultiplicationsOverviewIntroductionIntroductionFunctional SpecificationFunctional SpecificationFunctional SpecificationFunctional SpecificationFunctional SpecificationFunctional SpecificationImplemented DesignImplemented DesignImplemented DesignOptimizationsToolsTestingTestingResultsResultsResultsResultsConclusionsConclusionsQuestionsHardware Implementations of RSA Hardware Implementations of RSA Using Fast Montgomery MultiplicationsUsing Fast Montgomery MultiplicationsECE 645ECE 645Prof. GajProf. GajMike Koontz and Ryon SumnerMike Koontz and Ryon SumnerOverviewOverview¾¾IntroductionIntroduction¾¾Functional SpecificationsFunctional Specifications¾¾Implemented Design and OptimizationsImplemented Design and Optimizations¾¾ToolsTools¾¾TestingTesting¾¾ResultsResults¾¾ConclusionsConclusionsIntroductionIntroduction¾¾RSA Encryption / DecryptionRSA Encryption / Decryption••Worldwide use in securing data transmissionWorldwide use in securing data transmission••Public / private key basedPublic / private key based••Large (512Large (512--bit +) keys required for protection of databit +) keys required for protection of data••Large keys = Slower decryption timesLarge keys = Slower decryption timesRSAEncryptionMCypherDataRSADecryptionCypherDataPublic Key MPrivate KeyMHackerXAlice BobIntroductionIntroductionPicture taken from “Lecture 11 – Exponentiation, Multi-Precision Arithmetic in Software”. George Mason University. Prof Gaj.http://teal.gmu.edu/courses/ECE645/viewgraphs_S06/Lecture11_EXP_SW_2.pdf pp 2.Functional SpecificationFunctional Specification¾¾RSA Encryption / Decryption AlgorithmRSA Encryption / Decryption AlgorithmzzTo calculate Y = XTo calculate Y = XEEmod N:mod N:S = XS = XY = 1Y = 1for (i = 0 to kfor (i = 0 to k--1)1){{if (Eif (Ei i = 1)= 1)Y = Y * S mod NY = Y * S mod NS = S * S mod NS = S * S mod N}}Functional SpecificationFunctional SpecificationPicture taken from “Lecture 11 – Exponentiation, Multi-Precision Arithmetic in Software”. George Mason University. Prof Gaj.http://teal.gmu.edu/courses/ECE645/viewgraphs_S06/Lecture11_EXP_SW_2.pdf pp 21.Functional SpecificationFunctional SpecificationPicture taken from “Lecture 11 – Exponentiation, Multi-Precision Arithmetic in Software”. George Mason University. Prof Gaj.http://teal.gmu.edu/courses/ECE645/viewgraphs_S06/Lecture11_EXP_SW_2.pdf pp 22.Functional SpecificationFunctional Specification¾¾Montgomery MultiplicationMontgomery MultiplicationzzMP (A, B, N)MP (A, B, N)zzAlgorithm:Algorithm:S[0] = 0;S[0] = 0;for i in 0 to kfor i in 0 to k--1 loop1 loopqqi i = (S[i]= (S[i]00+ A+ Aii* B* B00) mod 2;) mod 2;S[i + 1] = (S[i] + AS[i + 1] = (S[i] + Ai i * B + q* B + qi i * N) / 2;* N) / 2;end loop;end loop;return S[k];return S[k];Functional SpecificationFunctional Specification¾¾55--2 Montgomery Multiplication2 Montgomery MultiplicationzzMP52 (A1, A2, B1, B2, N)MP52 (A1, A2, B1, B2, N)zzAlgorithm:Algorithm:S1[0] = 0; S2[0] = 0;S1[0] = 0; S2[0] = 0;for i in 0 to kfor i in 0 to k--1 loop1 loopqqii= (S1[i]= (S1[i]00+ S2[i]+ S2[i]00) + (A) + (Aii* ( B1* ( B100+ B2+ B20 0 ) ) mod 2;) ) mod 2;S1[i+1],S2[i+1] = CSR(S1[i] + S2[i] + AS1[i+1],S2[i+1] = CSR(S1[i] + S2[i] + Aii* (B1 + B2) + q* (B1 + B2) + qii* N) / 2;* N) / 2;end loop;end loop;return S1[k], S2[k];return S1[k], S2[k];Functional SpecificationFunctional Specification¾¾RSA Encryption / Decryption with 5RSA Encryption / Decryption with 5--2 MP2 MPzzRSA (C, d, N)RSA (C, d, N)K = 2K = 22k2kmod N;mod N;P1, P2 = 5to2_MontMult( K, 0, C, 0, N );P1, P2 = 5to2_MontMult( K, 0, C, 0, N );R1, R2 = 5to2_MontMult( K, 0, 1, 0, N );R1, R2 = 5to2_MontMult( K, 0, 1, 0, N );for i in 0 to dfor i in 0 to dkklooploopif d[i] = 1if d[i] = 1R1, R2 = 5to2_MontMult( R1, R2, P1, P2, N );R1, R2 = 5to2_MontMult( R1, R2, P1, P2, N );P1, P2 = 5to2_MontMult( P1, P2, P1, P2, N );P1, P2 = 5to2_MontMult( P1, P2, P1, P2, N );end loop;end loop;M1, M2 = 5to2_MontMult( 1, 0, R1, R2, N );M1, M2 = 5to2_MontMult( 1, 0, R1, R2, N );return M1 + M2;return M1 + M2;Functional SpecificationFunctional Specification¾¾Addition ChainsAddition ChainszzSequence of additions to produce a large numberSequence of additions to produce a large numberzzEach sequence step is the sum of two numbers Each sequence step is the sum of two numbers previously in the chainpreviously in the chainzze.g.e.g.••27 = 1, 2, 3, 6, 12, 24, 2727 = 1, 2, 3, 6, 12, 24, 27zzExpanded to sequence of multiplicationsExpanded to sequence of multiplications••XX2727= X= X11, X, X22, X, X33, X, X66, X, X1212, X, X2424, X, X2727Functional SpecificationFunctional Specification¾¾Addition ChainsAddition ChainszzUse memory (registers) to store intermediate resultsUse memory (registers) to store intermediate resultszzUse memory to store and serve addition chain Use memory to store and serve addition chain commands to multiplier circuitcommands to multiplier circuitzzCommand structure:Command structure:22LogLog22RRLogLog22RRLogLog22RRCCDestinationDestinationOperand2Operand2Operand1Operand1Functional SpecificationFunctional Specification¾¾RSA Shell Unit:RSA Shell Unit:RSA with Montgomery Multiplicationclock resetdata_avail data_in key_instart fulldata_read writedata_outFunctional SpecificationFunctional Specification¾¾RSA Chain Shell Unit:RSA Chain Shell Unit:Addition Chain RSA with Montgomery Multiplicationclock resetdata_avail data_instart fulldata_read writedata_outcommand_incommand_availcommand_read readyImplemented DesignImplemented Design¾¾Criteria:Criteria:zzMaximize ThroughputMaximize ThroughputzzMinimize Clock PeriodMinimize Clock PeriodzzMinimize Area (on selected chips)Minimize Area (on selected chips)¾¾RSA considerations:RSA considerations:zzEncryption is trivialEncryption is trivialzzDecryption is bottleneck for RSA processDecryption is bottleneck for RSA processzzHigh throughput allows for more decryptions in High throughput allows for more decryptions in shorter amount of timeshorter amount of timeImplemented DesignImplemented Design¾¾Design architectures:Design architectures:zzSequential multipliersSequential multipliers••Small area, incremental results, small latency / roundSmall area, incremental results, small latency / roundzzTree and Array multipliersTree and Array multipliers••Large area, one result / clock cycle, pipelineLarge area, one result / clock cycle, pipeline--readyready¾¾ChoiceChoice: Sequential
View Full Document