EECS150 - Digital Design
Lecture 10 - CPU Microarchitecture
Feb 18, 2010
John Wawrzynek
Spring 2010, EECS150 - Lec10-cpu

Processor Microarchitecture Introduction
Microarchitecture: how to implement an architecture in hardware.
• Good examples of how to put principles of digital design into practice.
• Introduction to the final project.

MIPS Processor Architecture
• For now we consider a subset of MIPS instructions:
  – R-type instructions: and, or, add, sub, slt
  – Memory instructions: lw, sw
  – Branch instructions: beq
• Later we’ll add addi and j.

MIPS Microarchitecture Organization
Datapath + Controller + External Memory
[Figure: block diagram of the datapath, controller, and external memory]

How to Design a Processor: step-by-step
1. Analyze the instruction set architecture (ISA) ⇒ datapath requirements
   – the meaning of each instruction is given by its data transfers (register transfers)
   – the datapath must include storage elements for the ISA registers
   – the datapath must support each data transfer
2. Select a set of datapath components and establish a clocking methodology.
3. Assemble a datapath that meets the requirements.
4. Analyze the implementation of each instruction to determine the setting of the control points that effects the data transfer.
5. Assemble the control logic.

Review: The MIPS Instruction
Three formats: R-type, I-type, J-type.

R-type   op        rs        rt        rd        shamt     funct
bits     [31:26]   [25:21]   [20:16]   [15:11]   [10:6]    [5:0]
width    6 bits    5 bits    5 bits    5 bits    5 bits    6 bits

I-type   op        rs        rt        address/immediate
bits     [31:26]   [25:21]   [20:16]   [15:0]
width    6 bits    5 bits    5 bits    16 bits

J-type   op        target address
bits     [31:26]   [25:0]
width    6 bits    26 bits

The different fields are:
• op: operation (“opcode”) of the instruction
• rs, rt, rd: the source and destination register specifiers
• shamt: shift amount
• funct: selects the variant of the operation in the “op” field
• address / immediate: address offset or immediate value
• target address: target address of the jump instruction

Subset for Lecture
add, sub, or, slt (R-type format: op, rs, rt, rd, shamt, funct)
• addu rd,rs,rt
• subu rd,rs,rt
lw, sw (I-type format: op, rs, rt, immediate)
• lw rt,rs,imm16
• sw rt,rs,imm16
beq (I-type format: op, rs, rt, immediate)
• beq rs,rt,imm16

Register Transfer Descriptions
All start with instruction fetch:
  {op, rs, rt, rd, shamt, funct} ← IMEM[ PC ]    OR
  {op, rs, rt, Imm16} ← IMEM[ PC ]               THEN

inst   Register Transfers
add    R[rd] ← R[rs] + R[rt];  PC ← PC + 4
sub    R[rd] ← R[rs] – R[rt];  PC ← PC + 4
or     R[rd] ← R[rs] | R[rt];  PC ← PC + 4
slt    R[rd] ← (R[rs] < R[rt]) ? 1 : 0;  PC ← PC + 4
lw     R[rt] ← DMEM[ R[rs] + sign_ext(Imm16) ];  PC ← PC + 4
sw     DMEM[ R[rs] + sign_ext(Imm16) ] ← R[rt];  PC ← PC + 4
beq    if ( R[rs] == R[rt] ) then PC ← PC + 4 + {sign_ext(Imm16), 00}
       else PC ← PC + 4

Microarchitecture
Multiple implementations for a single architecture:
– Single-cycle
  • Each instruction executes in a single clock cycle.
– Multicycle
  • Each instruction is broken up into a series of shorter steps with one step per clock cycle.
– Pipelined
  • Each instruction is broken up into a series of steps with one step per clock cycle.
  • Multiple instructions execute at once.

CPU clocking (1/2)
• Single-cycle CPU: all stages of an instruction are completed within one long clock cycle.
  – The clock cycle is made sufficiently long to allow each instruction to complete all stages without interruption and within one cycle.
Stages: 1. Instruction Fetch, 2. Decode / Register Read, 3. Execute, 4. Memory, 5. Reg. Write

CPU clocking (2/2)
• Multiple-cycle CPU: only one stage of an instruction per clock cycle.
  – The clock is made as long as the slowest stage.
Several significant advantages over single-cycle execution: unused stages in a particular instruction can be skipped, OR instructions can be pipelined (overlapped).
Stages: 1. Instruction Fetch, 2. Decode / Register Read, 3. Execute, 4. Memory, 5. Reg. Write

MIPS State Elements
• The state elements determine everything about the execution status of a processor:
  – PC register
  – 32 registers
  – Memory
Note: for these state elements, the clock is used for write but not for read (asynchronous read, synchronous write).

Single-Cycle Datapath: lw fetch
• First consider executing lw
• STEP 1: Fetch instruction
R[rt] ← DMEM[ R[rs] + sign_ext(Imm16) ]

Single-Cycle Datapath: lw register read
• STEP 2: Read source operands from register file
R[rt] ← DMEM[ R[rs] + sign_ext(Imm16) ]

Single-Cycle Datapath: lw immediate
• STEP 3: Sign-extend the immediate
R[rt] ← DMEM[ R[rs] + sign_ext(Imm16) ]

Single-Cycle Datapath: lw address
• STEP 4: Compute the memory address
R[rt] ← DMEM[ R[rs] + sign_ext(Imm16) ]

Single-Cycle Datapath: lw memory read
• STEP 5: Read data from memory and write it back to register file
R[rt] ← DMEM[ R[rs] + sign_ext(Imm16) ]

Single-Cycle Datapath: lw PC increment
• STEP 6: Determine the address of the next instruction
PC ← PC + 4

Single-Cycle Datapath: sw
• Write data in rt to memory
DMEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]

Single-Cycle Datapath: R-type instructions
• Read from rs and rt
• Write ALUResult to register file
• Write to rd (instead of rt)
R[rd] ← R[rs] op R[rt]

Single-Cycle Datapath: beq
• Determine whether values in rs and rt are equal
• Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4)
if ( R[rs] == R[rt] ) then PC ← PC + 4 + {sign_ext(Imm16), 00}

Complete Single-Cycle Processor
[Figure: complete single-cycle processor]

Control Unit
[Figure: control unit]

Review: ALU
F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT

Control Unit: ALU Decoder
ALUOp1:0   Meaning
00         Add
01         Subtract
10         Look at Funct
11         Not Used

ALUOp1:0   Funct          ALUControl2:0
00         X              010 (Add)
X1         X              110 (Subtract)
1X         100000 (add)   010 (Add)
1X         100010 (sub)   110 (Subtract)
1X         100100 (and)   000 (And)
1X         100101
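
The ALU function table, the ALU decoder table, and the register transfers listed earlier can be tied together in a small behavioral model. The following is a minimal Python sketch, not the course's implementation. The names `alu`, `alu_decoder`, `step`, `r_type`, and `i_type` are hypothetical. The opcodes for lw, sw, and beq and the funct codes for or and slt are the standard MIPS encodings and are not spelled out in these notes; the ALUControl values for or and slt are assumed from the ALU function table. Memories are modeled as dictionaries keyed by byte address, and SLT is modeled as a signed comparison.

```python
MASK = 0xFFFFFFFF  # keep every value to 32 bits

def to_signed(x):
    """Interpret a 32-bit value as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def sign_ext16(imm):
    """Sign-extend a 16-bit immediate to 32 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def alu(f, a, b):
    """ALU following the F2:0 function table."""
    if f == 0b000: r = a & b
    elif f == 0b001: r = a | b
    elif f == 0b010: r = a + b
    elif f == 0b100: r = a & ~b
    elif f == 0b101: r = a | ~b
    elif f == 0b110: r = a - b
    elif f == 0b111: r = 1 if to_signed(a) < to_signed(b) else 0
    else: raise ValueError("F = 011 is not used")
    return r & MASK

# Funct -> ALUControl for the "look at Funct" case (ALUOp = 1X).
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or  (assumed from the ALU function table)
    0b101010: 0b111,  # slt (assumed from the ALU function table)
}

def alu_decoder(alu_op, funct):
    """ALU decoder following the ALUOp / Funct table."""
    if alu_op == 0b00:
        return 0b010                    # add (used for lw/sw addresses)
    if alu_op & 0b01:
        return 0b110                    # X1: subtract (used by beq)
    return FUNCT_TO_ALUCONTROL[funct]   # 1X: look at Funct

def step(state, imem, dmem):
    """Execute one instruction by applying its register transfers."""
    pc, regs = state["pc"], state["regs"]
    inst = imem[pc]                                  # instruction fetch
    op, funct = inst >> 26, inst & 0x3F
    rs, rt, rd = (inst >> 21) & 31, (inst >> 16) & 31, (inst >> 11) & 31
    imm = sign_ext16(inst & 0xFFFF)
    next_pc = (pc + 4) & MASK
    if op == 0b000000:                               # R-type
        regs[rd] = alu(alu_decoder(0b10, funct), regs[rs], regs[rt])
    elif op == 0b100011:                             # lw
        regs[rt] = dmem.get(alu(alu_decoder(0b00, 0), regs[rs], imm), 0)
    elif op == 0b101011:                             # sw
        dmem[alu(alu_decoder(0b00, 0), regs[rs], imm)] = regs[rt]
    elif op == 0b000100:                             # beq
        if alu(alu_decoder(0b01, 0), regs[rs], regs[rt]) == 0:
            next_pc = (pc + 4 + (imm << 2)) & MASK   # {sign_ext(Imm16), 00}
    else:
        raise ValueError("instruction outside the lecture subset")
    state["pc"] = next_pc

def r_type(rs, rt, rd, funct):
    """Encode an R-type instruction (op = 0, shamt = 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def i_type(op, rs, rt, imm):
    """Encode an I-type instruction."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# Usage: load two words, add them, store the sum.
imem = {
    0:  i_type(0b100011, 0, 8, 0),     # lw  $8, 0($0)
    4:  i_type(0b100011, 0, 9, 4),     # lw  $9, 4($0)
    8:  r_type(8, 9, 10, 0b100000),    # add $10, $8, $9
    12: i_type(0b101011, 0, 10, 8),    # sw  $10, 8($0)
}
dmem = {0: 5, 4: 7}
state = {"pc": 0, "regs": [0] * 32}
for _ in range(4):
    step(state, imem, dmem)
print(state["pc"], dmem[8])
```

Routing every instruction through `alu_decoder` mirrors the two-level control structure in the tables: the main controller only picks ALUOp, and the decoder turns ALUOp plus Funct into the three ALU control bits.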