CS 152 Computer Architecture and Engineering
Lecture 19: Real Processor Walkthru II
2004-11-04
Dave Patterson (www.cs.berkeley.edu/~patterson)
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L19: Real Processor Walkthru II, UC Regents Fall 2004, UCB

Last Time: Leon, an open-source SPARC
0.35 µm, 65 MHz, 40 mm²
Removal of the FPU would reduce area, power, and cycle time.

Today: Focus on Leon's multiplier
Configurable: Leon offers 5 multiplier design options.
Mapping on FPGAs: uses built-in multiplier and fast adder resources.
Final Project: All groups must add a multiplier; one design option is a fast multiplier.

SPARC: Unsigned Multiply (UMUL)
General-purpose 32-bit registers; 13-bit inline constant.
UMUL reg[rs1], reg[rs2] or immed, reg[rd]
reg[rs1] (32 bits) x reg[rs2] (32 bits) = 64-bit product
The 32 LSBs go to reg[rd]; the 32 MSBs go to the Y register.
Q: Why use a GP register for the LSB destination?

Recall: Unsigned Multiply Algorithm
Paper-and-pencil example (unsigned):
  Multiplicand       1000
  Multiplier       x 1001
                     1000
                    0000
                   0000
                  1000
  Product        01001000
m bits x n bits = (m + n)-bit product.
Binary makes it easy:
  0 => place 0 (0 x multiplicand)
  1 => place a copy (1 x multiplicand)
Facts to remember: the product of 2 n-bit numbers is a 2n-bit number, formed as the sum of n n-bit partial products.
4 versions of multiply hardware and algorithm: successive refinement.

Design 1: Spatially compute A x B = P
1-bit signals: x, y, z, s, Cin, Cout
If z = 1: {Cout, s} <= x + y + Cin
If z = 0: {Cout, s} <= y + Cin

Array to compute A x B = P
[Figure: Unsigned Combinational Multiplier, an array of cells with inputs x, y, z]
Q: Number of clock cycles?
Q: Clock cycle time? Critical path?
Q: Can we pipeline? What do we give up?
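The cell equations of Design 1 and the row-per-multiplier-bit structure of the array can be modeled at the bit level. The following is a minimal Python sketch, not the course's hardware description. It assumes x is a bit of A, y is the incoming partial-sum bit, and z is the bit of B that selects the row; the function names `cell` and `array_multiply` are hypothetical.

```python
def cell(x, y, z, cin):
    """One array cell: adds x into the running sum y only when z == 1."""
    total = (x & z) + y + cin      # z = 1: x + y + cin;  z = 0: y + cin
    return total >> 1, total & 1   # (cout, s)


def array_multiply(a, b, n=4):
    """Bit-level model of an n x n unsigned array multiplier (P = A x B)."""
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    acc = [0] * n            # running sum bits of weight i .. i+n-1, LSB first
    product_bits = []
    for i in range(n):       # one row of cells per multiplier bit B_i
        carry = 0
        row = []
        for j in range(n):   # the carry ripples from cell j to cell j + 1
            carry, s = cell(a_bits[j], acc[j], b_bits[i], carry)
            row.append(s)
        product_bits.append(row[0])   # P_i is final once row i is done
        acc = row[1:] + [carry]       # remaining bits feed the next row
    product_bits.extend(acc)          # upper bits P_n .. P_(2n-1)
    return sum(bit << k for k, bit in enumerate(product_bits))


if __name__ == "__main__":
    # The paper-and-pencil example: 1000 x 1001 = 01001000
    print(format(array_multiply(0b1000, 0b1001), "08b"))
```

In this model every row is evaluated in the same "clock", so the cost of the spatial design shows up as n x n cells and a carry path that crosses every row.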
Array inputs are A3..A0 and B0..B3; outputs are P7..P0 (figure from CS152 Kubiatowicz, UCB Spring 2004, Lec6).
Stage i accumulates A x 2^i if Bi = 1.
Q: What is the downside to a spatial design?
Q: How much hardware for a 32-bit multiplier? Critical path?

Administrivia: Lab 4 and Homework 4
11/5 (this Friday): Lab 4 milestone demo in section.
HW 4 due Weds 11/10, 5 PM, 283 Soda. Problem list is now complete (no new problems were added).
11/12 (next Friday): Lab 4 final demo in section.
11/15 (following Monday): Lab 4 final report due, 11:59 PM.

Administrivia: Mid-term and Field Trip
Mid-Term II Review Session: Sunday 11/21, 7-9 PM, 306 Soda.
Mid-Term II: Tuesday 11/23, 5:30 to 8:30 PM, 101 Morgan.
Thanksgiving Holiday.
Xilinx field trip date: 11/30. Details on bus transport from Soda Hall soon.

Design 2: Sequentially compute A x B = P
[Figure: Unsigned shift-add multiplier (version 1 of 4): 64-bit Multiplicand register (shift left), 64-bit ALU, 64-bit Product register (write), 32-bit Multiplier register (shift right), Control]
[Figure: tree of 3:2 carry-save adders with inputs I1, I2, I3 and outputs S0, S1]
Recall Mini-Lab 2, section 3 (Multiplication): The type of multiplier you will be debugging is a 32-cycle multiplier which uses a shared product/multiplier register, like the multiplier seen in Figure 3.7 of COD.
[Figure: multiplier datapath control. Idle State: load multiplicand and multiplier]
Q: Why is area so small?
Q: Clock cycle time?
Q: Does pipelining this design make sense?

Observations on Multiply Version 1
1 clock per cycle => about 100 clocks per multiply.
Ratio of multiply to add: 5:1 to 100:1.
1/2 of the bits in the multiplicand are always 0 => the 64-bit adder is wasted.
0s are inserted in the right of the multiplicand as it is shifted => the least significant bits of the product never change once formed.
Q: What is the downside of this design?
FPGAs support a continuum: spatial to sequential.

Multiply algorithm, version 1 (flowchart):
Start
1. Test Multiplier0.
   Multiplier0 = 1: 1a. Add multiplicand to product and place the result in the Product register.
   Multiplier0 = 0: skip the add.
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): Done.

[Figure: mini-lab multiplier state machine. Busy State: if Multiplier[0] = 1, add multiplicand and shift product register; if Multiplier[0] = 0, shift product register]

Mini-Lab 2, section 4 (Lab): In this lab you will be debugging a completed multiplier in simulation. The code contains a total of 4 bugs. After using the test benches we give you to find the first two, you will write a test bench to discover the last two. Start by copying M:\mini-lab2 to your home directory and opening the Project Navigator project file.
4.1 Unit Tests
1. Find and then fix the bug in the adder. Adder_tb.v is the associated test bench.
2. Find and then fix the bug in the product register. Productreg_tb.v is the associated test bench.

Recall: Leon Configuration GUI
Five options for multiplier latency:
The 1-cycle option is fully spatial.
The 35-cycle option is mini-Lab 2.
2, 4, 5 cycles.

Trick: State machine + small spatial array
Multiplier, Multiplicand
Specialized ALU with small array multiplier
State machine control
Product accumulator
Q: With this general architecture, what needs to be configured in Leon VHDL to trade off space and latency?

8 windows should be used. The multiplier option selects how the multiply instructions are
im From Leon …
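The "state machine + small spatial array" trick and the space/latency continuum can be illustrated with a behavioral model. The Python sketch below is an illustration under stated assumptions, not Leon's VHDL: the names `iterative_multiply` and `umul` and the parameter `bits_per_cycle` are hypothetical, and no claim is made about which array sizes Leon's 2-, 4-, and 5-cycle options actually use. Each loop iteration stands for one clock in which a small array multiplies the multiplicand by a few multiplier bits and the result is added into the product accumulator.

```python
def iterative_multiply(a, b, bits_per_cycle=1, width=32):
    """Unsigned multiply that consumes `bits_per_cycle` multiplier bits per clock.

    Returns (product, cycles). bits_per_cycle=1 behaves like the shift-add
    design; bits_per_cycle=width behaves like the fully spatial design.
    """
    if width % bits_per_cycle != 0:
        raise ValueError("bits_per_cycle must divide width")
    word_mask = (1 << width) - 1
    digit_mask = (1 << bits_per_cycle) - 1
    multiplicand = a & word_mask
    multiplier = b & word_mask
    product = 0
    cycles = 0
    for step in range(width // bits_per_cycle):
        digit = multiplier & digit_mask          # low bits of the multiplier
        partial = multiplicand * digit           # stands in for the small array
        product += partial << (step * bits_per_cycle)  # product accumulator
        multiplier >>= bits_per_cycle            # shift the multiplier right
        cycles += 1
    return product, cycles


def umul(rs1, rs2, bits_per_cycle=1):
    """SPARC UMUL result split: 32 LSBs to reg[rd], 32 MSBs to the Y register."""
    product, _ = iterative_multiply(rs1, rs2, bits_per_cycle)
    return product & 0xFFFFFFFF, product >> 32   # (rd, y)


if __name__ == "__main__":
    for k in (1, 8, 32):
        print(k, iterative_multiply(0b1000, 0b1001, bits_per_cycle=k))
```

Widening `bits_per_cycle` lowers the cycle count while making the per-cycle array larger, which is the trade-off the configuration question on the slide asks about.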