Shrivathsa Bhargav, Larry Chen, Abhinandan Majumdar, Shiva Ramudit
May 10, 2008
Spring 2008, Columbia University

System architecture
[Block diagram. Components: Nios II processor, Avalon Bus, SD-card controller (SPI), SD-card, AES decrypto, VGA controller, VGA monitor, SRAM controller, SRAM chip, SDRAM controller, SDRAM chip, PS/2 controller, Keyboard, LCD controller, 16x2 LCD.]

SD-Card SPI Interface
- The SD-Card SPI interface communicates with the MMC/SD card via the SPI protocol.
- It interacts with the card through a sequence of commands such as reset, initialize, set block length, and data read request.
- This interface was difficult to simulate and debug since the MMC/SD card protocol is proprietary.
- Modified Professor Edwards' SPI interface implementation from APPLE2FPGA.
- Reduced duplicate reads
  - Issuing 512-byte block reads causes buffer spill for consecutive frames.
  - A single frame is 77888 bytes, which is not a multiple of the 512-byte block size.
  - A check in software tracks the frame number and offsets the read by 64*(frame % 8) to get the correct data contents.
  - The spill is always a multiple of 64 bytes, and it takes 512 bytes / 64 bytes = 8 frames to return to a 0-byte spill.

SD-Card SPI Interface
- Increased compatibility
  - Applied a patch to send additional pulses to the SD card to wake it up.
  - Increased wait clock cycles to successfully read consecutive blocks of data.
- Increased performance
  - Set the block length to 512 bytes, with a correspondingly sized buffer, to avoid issuing unnecessary data read requests.

AES Decryption
- AES (Advanced Encryption Standard) decryption is a symmetric-key cryptographic algorithm that accepts the cipher text and the key as input and generates the original text as output.
[Diagram: KEY and CIPHER TEXT enter the AES Decrypto block, which outputs PLAIN TEXT.]

AES Decryption Algorithm
- Key Expansion: generates the intermediate keys required for each iteration.
- Inv Add Round Key: XORs the generated key for that particular iteration with the cipher text.
[Flow diagram of the decryption steps: INV ADD ROUND KEY, INV SHIFT ROW, INV SUB BYTES, INV MIX COLUMN.]
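Returning to the block-read spill on the SD-Card SPI Interface slide, the offset arithmetic can be checked with a short sketch. It assumes frames are stored back to back starting on a block boundary, which the slides do not state explicitly. The names `FRAME_BYTES`, `BLOCK_BYTES`, `spill_offset`, `first_block`, and `blocks_to_read` are illustrative and not taken from the project's code.

```python
FRAME_BYTES = 77888   # size of one frame, from the slide
BLOCK_BYTES = 512     # SD-card block length, from the slide


def spill_offset(frame: int) -> int:
    """Byte offset of a frame's first byte inside its first 512-byte block."""
    return (frame * FRAME_BYTES) % BLOCK_BYTES


def first_block(frame: int) -> int:
    """Index of the 512-byte block that holds the frame's first byte."""
    return (frame * FRAME_BYTES) // BLOCK_BYTES


def blocks_to_read(frame: int) -> int:
    """Number of whole blocks that must be read to cover the frame."""
    end = spill_offset(frame) + FRAME_BYTES
    return -(-end // BLOCK_BYTES)  # ceiling division


if __name__ == "__main__":
    # Print block index, spill offset and block count for the first 9 frames.
    for frame in range(9):
        print(frame, first_block(frame), spill_offset(frame), blocks_to_read(frame))
```

Because 77888 mod 512 = 64, the offset advances by 64 bytes per frame and wraps back to 0 every 8 frames, which agrees with the 64*(frame % 8) rule on the slide.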
AES Decryption Algorithm
[Flow diagram: the cipher and key feed KEY EXPANSION and an initial INV ADD ROUND KEY; INV SHIFT ROW, INV SUB BYTES, INV ADD ROUND KEY, and INV MIX COLUMN repeat 9 times; a final INV SHIFT ROW, INV SUB BYTES, and INV ADD ROUND KEY produce the Plain Text.]
- Inverse Shift Row: shifts the ith row by i elements to the right.
- Inv Sub-bytes: replaces each element with the corresponding entry from the inverse S-box.
- Inv Add Round Key: XORs the generated values with the intermediate key for that iteration.
- Inv Mix Column: performs modular multiplication with the MDS matrix in Rijndael's finite field.

AES Decryption Algorithm
- Repeats these four steps for 9 iterations.
- As a last iteration, it does inverse shift row, inverse sub-bytes, and inverse add round key.
- The final output is the plain text.

AES Key Expansion – RTL Design
- Key expansion is required to generate the round keys needed for each round of encryption.
- The generate-roundkey module contains all the combinational logic that performs the key expansion algorithm.
- Takes 11 clock cycles to generate the 10 round keys.
[Block diagram: Key Controller, GENERATE ROUNDKEY, MUX, REGISTER, and Write Controller blocks; signals clk, start, key, Expansion keys, Write address, Count, Round Key, and eoc; 128-bit and 4-bit buses.]

AES Decrypto – RTL Design
- Takes 10 clock cycles to generate the plain text.
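The decryption steps named on the AES Decryption Algorithm slides can be sketched in software as follows. This is a minimal illustration of three of the four steps, assuming the state is held as a list of four rows; it is not the project's RTL, and the function names are illustrative. Inv Sub-bytes is left out because it is a plain table lookup in the inverse S-box.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the Rijndael polynomial
        b >>= 1
    return product


def inv_shift_rows(state):
    """Rotate row i of a 4x4 state (a list of 4 rows) right by i positions."""
    return [row[-i:] + row[:-i] if i else row[:] for i, row in enumerate(state)]


def inv_add_round_key(state, round_key):
    """XOR every state byte with the matching round-key byte."""
    return [[s ^ k for s, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]


def inv_mix_column(col):
    """Multiply one 4-byte column by the inverse MixColumns matrix."""
    a0, a1, a2, a3 = col
    return [
        gmul(a0, 14) ^ gmul(a1, 11) ^ gmul(a2, 13) ^ gmul(a3, 9),
        gmul(a0, 9) ^ gmul(a1, 14) ^ gmul(a2, 11) ^ gmul(a3, 13),
        gmul(a0, 13) ^ gmul(a1, 9) ^ gmul(a2, 14) ^ gmul(a3, 11),
        gmul(a0, 11) ^ gmul(a1, 13) ^ gmul(a2, 9) ^ gmul(a3, 14),
    ]


if __name__ == "__main__":
    state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(inv_shift_rows(state))
    print([hex(b) for b in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
```

Since the add-round-key step is a plain XOR, applying it twice with the same key restores the original state, which is why the same operation serves both encryption and decryption.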
- Runs at 88.31 MHz and occupies 17% of the FPGA logic elements.
[Block diagram: Input Buffer (128), INV SHIFT ROW / SUB BYTES, INV S-BOX, INV MIX COLUMN, INV ADD ROUND KEY, Key Table, Key Expansion, REGISTER, MUX, DMUX, and Output Buffer (128) blocks; signals clk, start, Cipher/key (32), Plain data (32), and eoc.]
[Timing diagram: Timing of Input Data Buffering. Signals: start, cipher 128-bit, clk, cipher 128-bit latched, cipher 32 bit.]
[Timing diagram: Timing of Final Data Traversal. Signals: clk, 128-bit original data, eoc, 32 bit data, plain 128-bit latched data.]

AES Key Expansion Algorithm
The algorithm for generating the 10 round keys is as follows:
- The 4th column of the (i-1)th key is rotated such that each element moves up one row.
- This result goes through the forward S-box, which replaces each 8-bit value of the column with a corresponding 8-bit value.

AES Key Expansion Algorithm
- To generate the first column of the ith key, this result is XORed with the first column of the (i-1)th key and with a round constant (Rcon) that depends on i.
[Table: Rcon values]
- The second column is generated by XORing the first column of the ith key with the second column of the (i-1)th key.

AES Key Expansion Algorithm
- This continues iteratively for the other two columns to generate the entire ith key.
- The entire process then repeats to generate all 10 keys.
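The key expansion steps described above can be sketched in software as follows. This is an illustrative model, not the project's generate-roundkey module: it assumes a 128-bit key stored as 16 bytes in column order, and it computes the forward S-box from its definition (field inverse plus affine map) instead of using a stored table. The names `build_sbox`, `next_round_key`, and `expand_key` are hypothetical.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product


def _rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Forward AES S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(256) if gmul(x, y) == 1), 0)  # 0 maps to 0
        sbox.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2)
                    ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def next_round_key(prev_key: bytes, rcon: int) -> bytes:
    """Derive the ith round key from the (i-1)th key and the round constant."""
    cols = [list(prev_key[4 * c:4 * c + 4]) for c in range(4)]
    temp = cols[3][1:] + cols[3][:1]   # rotate the 4th column up one row
    temp = [SBOX[b] for b in temp]     # forward S-box substitution
    temp[0] ^= rcon                    # Rcon only changes the top byte
    new_cols = []
    for col in cols:                   # each new column = previous result XOR old column
        temp = [t ^ c for t, c in zip(temp, col)]
        new_cols.append(temp)
    return bytes(b for col in new_cols for b in col)


def expand_key(key: bytes):
    """Return the original key followed by the 10 generated round keys."""
    keys = [key]
    rcon = 1
    for _ in range(10):
        keys.append(next_round_key(keys[-1], rcon))
        rcon = gmul(rcon, 2)           # Rcon doubles in GF(2^8) every round
    return keys


if __name__ == "__main__":
    for k in expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")):
        print(k.hex())
```

The key used in the example run is the sample cipher key from the FIPS-197 appendix, so the printed round keys can be compared against the published key schedule.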
- All of these keys are stored once they have been computed, because the ith key generated is required for the (10-i)th round of decryption.

SRAM controller
- Single-ported SRAM poses a problem.
- Had to devise a GO/NO switch (mux).
[Two block diagrams of the SRAM chip, VGA controller, VGA monitor, SRAM controller, and Nios II processor, labeled VGA_GO! and VGA_NO!]

VGA controller
- Bitmap specs: 1078-byte header, 8-bit depth, flip row order
- Forcing grayscale (R=G=B=data)
- Address calculation

VGA controller
- Reading the VGA draw location constantly in software
- Writing into SRAM only when outside the "rectangle"
- Reduced fps from 8.5 to 6!

Summary
- Results
  - 32% LE, 14% memory, 3.74 Mbps throughput
- Lessons learned
  - Technical knowledge
  - Hardware behaviors are difficult to visualize without simulations
  - Code reuse saves time and effort in design and debugging
  - Start early; work on modularized tasks in parallel
  - Original goals superseded by video
- Future work
  - Color video (there's enough memory)
  - Higher frame rate (overclock system)
  - Double-buffering to remove scan