2/18/03 ©UCB Spring 2004   CS152 / Kubiatowicz   Lec7.1

CS152
Computer Architecture and Engineering
Lecture 7
Designing a Single Cycle Datapath

February 18, 2004
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/

Review: Sequential Logic + (Non-)blocking assignments

    module FF (CLK,Q,D);
      input D, CLK;
      output Q; reg Q;
      always @ (posedge CLK) Q <= D;
    endmodule // FF

Good: Doesn’t output until “after edge”

Must be careful mixing zero-time blocking assignments and edge-triggering: Probably won’t do what you expect when connecting it to other things!

    module FF (CLK,Q,D);
      input D, CLK;
      output Q; reg Q;
      always @ (posedge CLK) Q = #5 D;
    endmodule // FF

Good: Outputs 5 units “after edge”

    module FF (CLK,Q,D);
      input D, CLK;
      output Q; reg Q;
      always @ (posedge CLK) #5 Q = D;
    endmodule // FF

Probably Not what you Expect:
• Hold time of 5 units
• glitches < 5 units ignored

Review: MULTIPLY HARDWARE Version 3
° 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg (shift right), (0-bit Multiplier reg)
[Figure: datapath with labels Multiplicand (32 bits), 32-bit ALU, Product (Multiplier) (64 bits, “HI” and “LO” halves), Shift Right, Write, Control]

Divide can use almost same hardware (From Book)
° 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)
[Figure: datapath with labels Divisor (32 bits), 32-bit ALU, Remainder (Quotient) (64 bits, “HI” and “LO” halves), Shift Left, Write, Control]
° Multiplication and Division can use same hardware!

Review: Booth’s Algorithm ⇒ Alternate representation

Current Bit   Bit to the Right   Explanation           Example      Op
1             0                  Begins run of 1s      0001111000   sub (−1)
1             1                  Middle of run of 1s   0001111000   none (0)
0             1                  End of run of 1s      0001111000   add (+1)
0             0                  Middle of run of 0s   0001111000   none (0)

[Figure: bit string 0 1 1 1 1 0 annotated with “end of run”, “middle of run”, “beginning of run”]

Examples (8 bits):
  23 = 00010111 ⇒ 0 0 +1 −1 +1 0 0 −1 = 32 − 16 + 8 − 1
  14 = 00001110 ⇒ 0 0 0 +1 0 0 −1 0 = 16 − 2
  −3 = 11111101 ⇒ 0 0 0 0 0 −1 +1 −1 = −4 + 2 − 1
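The Booth table above can be read as "digit = (bit to the right) − (current bit)", scanning from the least significant bit with an implicit 0 to the right of bit 0. The following is a minimal Python sketch of that reading, not code from the lecture; the function names `booth_recode` and `booth_value` are made up for illustration.

```python
def booth_recode(value, bits=8):
    """Return the Booth digits (+1, 0, -1), most significant first, for the
    `bits`-wide two's-complement pattern of `value`."""
    pattern = value & ((1 << bits) - 1)   # keep only the low `bits` bits
    digits = []
    right = 0                             # implicit bit to the right of bit 0
    for i in range(bits):
        current = (pattern >> i) & 1
        # (current, right) = (1, 0) -> -1 (sub), (0, 1) -> +1 (add), else 0
        digits.append(right - current)
        right = current
    return digits[::-1]


def booth_value(digits):
    """Evaluate a most-significant-first digit list as a weighted sum."""
    total = 0
    for d in digits:
        total = 2 * total + d
    return total
```

For instance, `booth_recode(23)` gives `[0, 0, 1, -1, 1, 0, 0, -1]`, and `booth_value` of that list gives back 23, matching the 32 − 16 + 8 − 1 reading of the pattern 00010111.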
MIPS logical instructions

Instruction            Example          Meaning            Comment
and                    and $1,$2,$3     $1 = $2 & $3       3 reg. operands; Logical AND
or                     or $1,$2,$3      $1 = $2 | $3       3 reg. operands; Logical OR
xor                    xor $1,$2,$3     $1 = $2 ⊕ $3       3 reg. operands; Logical XOR
nor                    nor $1,$2,$3     $1 = ~($2 | $3)    3 reg. operands; Logical NOR
and immediate          andi $1,$2,10    $1 = $2 & 10       Logical AND reg, constant
or immediate           ori $1,$2,10     $1 = $2 | 10       Logical OR reg, constant
xor immediate          xori $1,$2,10    $1 = $2 ⊕ 10       Logical XOR reg, constant
shift left logical     sll $1,$2,10     $1 = $2 << 10      Shift left by constant
shift right logical    srl $1,$2,10     $1 = $2 >> 10      Shift right by constant
shift right arithm.    sra $1,$2,10     $1 = $2 >> 10      Shift right (sign extend)
shift left logical     sllv $1,$2,$3    $1 = $2 << $3      Shift left by variable
shift right logical    srlv $1,$2,$3    $1 = $2 >> $3      Shift right by variable
shift right arithm.    srav $1,$2,$3    $1 = $2 >> $3      Shift right arith. by variable

Shifters
Two kinds:
  logical (RIGHT OR LEFT) -- value shifted in is always "0"
  arithmetic -- (RIGHT ONLY), sign extend
[Figure: single-bit shift diagrams running from msb to lsb, with "0" shifted in]
Note: these are single bit shifts. A given instruction might request 0 to 31 bits to be shifted (the shift amount field is 5 bits wide)!

Barrel Shifter
Technology-dependent solutions: transistor per switch
[Figure: switch array with signals labeled D3-D0, A6-A0, SR0-SR3]

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
° Today’s Topic: Design a Single Cycle Processor
[Figure: Processor (Control, Datapath), Memory, Input, Output]
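Looking back at the Shifters slide: the difference between a logical and an arithmetic right shift is only in what gets shifted in at the msb (always "0" versus a copy of the sign bit). The sketch below illustrates this on 32-bit values in Python. It is a behavioral model only, not the transistor-level barrel shifter from the slide, and the helper names `sll`, `srl`, `sra` are simply borrowed from the instruction mnemonics.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as unsigned 32-bit integers


def sll(x, shamt):
    """Shift left logical: zeros enter at the lsb, bits past bit 31 are lost."""
    return (x << shamt) & MASK32


def srl(x, shamt):
    """Shift right logical: zeros enter at the msb."""
    return (x & MASK32) >> shamt


def sra(x, shamt):
    """Shift right arithmetic: copies of the sign bit enter at the msb."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32              # reinterpret the pattern as a negative number
    return (x >> shamt) & MASK32  # Python's >> on negative ints sign-extends
```

With the pattern `0x80000000` and a shift amount of 4, `srl` yields `0x08000000` while `sra` yields `0xF8000000`; for a pattern whose msb is 0 the two agree.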
[Figure labels for “The Big Picture: Where are We Now?”: inst. set design (L1-2), technology (L3), machine design, Arithmetic (L4-6)]

The Big Picture: The Performance Perspective
° Performance of a machine is determined by:
  • Instruction count
  • Clock cycle time
  • Clock cycles per instruction
° Processor design (datapath and control) will determine:
  • Clock cycle time
  • Clock cycles per instruction
° Today:
  • Single cycle processor:
    - Advantage: One clock cycle per instruction
    - Disadvantage: long cycle time
[Figure: diagram labeled CPI, Inst. Count, Cycle Time]

How to Design a Processor: step-by-step
° 1. Analyze instruction set => datapath requirements
  • the meaning of each instruction is given by the register transfers
  • datapath must include storage element for ISA registers
    - possibly more
  • datapath must support each register transfer
° 2. Select set of datapath components and establish clocking methodology
° 3. Assemble datapath meeting the requirements
° 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
° 5. Assemble the control logic

The MIPS Instruction Formats
° All MIPS instructions are 32 bits long.
The three instruction formats:
  • R-type
  • I-type
  • J-type
° The different fields are:
  • op: operation of the instruction
  • rs, rt, rd: the source and destination register specifiers
  • shamt: shift amount
  • funct: selects the variant of the operation in the “op” field
  • address / immediate: address offset or immediate value
  • target address: target address of the jump instruction

Format   Fields, bit 31 down to bit 0 (width in bits)
R-type   op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)    field boundaries at bits 31, 26, 21, 16, 11, 6, 0
I-type   op (6) | rs (5) | rt (5) | immediate (16)                    field boundaries at bits 31, 26, 21, 16, 0
J-type   op (6) | target address (26)                                 field boundaries at bits 31, 26, 0

Step 1a: The MIPS-lite Subset for today
° ADD and SUB   (R-type: op | rs | rt | rd | shamt | funct)
  • addU rd, rs, rt
  • subU rd, rs, rt
° OR Immediate:   (I-type: op | rs | rt | immediate)
  • ori rt, rs, imm16
° LOAD and STORE Word   (I-type: op | rs | rt | immediate)
  • lw rt, rs, imm16
  • sw rt, rs, imm16
° BRANCH:   (I-type: op | rs | rt | immediate)
  • beq rs, rt, imm16

Logical Register Transfers
° RTL gives the meaning of the instructions
° All start by fetching the instruction
    op | rs | rt | rd | shamt | funct = MEM[ PC ]
    op | rs | rt | Imm16 = MEM[ PC ]

inst    Register Transfers
ADDU    R[rd] <– R[rs] + R[rt];  PC <– PC + 4
SUBU    R[rd] <– R[rs] – R[rt];  PC <– PC + 4
ORi
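As a rough illustration of the fetch step and the ADDU and SUBU register transfers listed above, here is a small Python sketch. It assumes the field positions from the format slide (op in bits 31-26 down to funct in bits 5-0) and the standard MIPS funct codes 0x21 (ADDU) and 0x23 (SUBU), which are not given in the slides. The names `decode` and `step`, and the use of a dict for memory, are choices made for this example only.

```python
MASK32 = 0xFFFFFFFF
FUNCT_ADDU = 0x21  # assumption: standard MIPS encoding, not from the slides
FUNCT_SUBU = 0x23  # assumption: standard MIPS encoding, not from the slides


def decode(word):
    """Split a 32-bit instruction word into the fields of all three formats."""
    return {
        "op":     (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "shamt":  (word >> 6) & 0x1F,
        "funct":  word & 0x3F,
        "imm16":  word & 0xFFFF,
        "target": word & 0x3FFFFFF,
    }


def step(mem, regs, pc):
    """Fetch MEM[PC], apply one register transfer, and return the next PC.

    `mem` maps byte addresses to 32-bit words; `regs` is a 32-entry list.
    Only ADDU and SUBU are modeled here.
    """
    f = decode(mem[pc])
    if f["op"] == 0 and f["funct"] == FUNCT_ADDU:
        regs[f["rd"]] = (regs[f["rs"]] + regs[f["rt"]]) & MASK32
    elif f["op"] == 0 and f["funct"] == FUNCT_SUBU:
        regs[f["rd"]] = (regs[f["rs"]] - regs[f["rt"]]) & MASK32
    else:
        raise NotImplementedError("instruction not modeled in this sketch")
    return pc + 4  # PC <- PC + 4
```

For example, the word `0x00221821` decodes to op = 0, rs = 1, rt = 2, rd = 3, funct = 0x21, so `step` performs R[3] <– R[1] + R[2] and returns PC + 4. Results are masked to 32 bits because the unsigned variants wrap around rather than trap on overflow.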