CS 152 Computer Architecture and Engineering
Lecture 9: Designing a Multicycle Processor
February 26, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/
2/26/03 ©UCB Spring 2003 CS152 / Kubiatowicz Lec9.1

Outline
Recap: Processor Design is a Process
Recap: A Single Cycle Datapath
Recap: The “Truth Table” for the Main Control
Recap: PLA Implementation of the Main Control
Recap: Systematic Generation of Control
The Big Picture: Where are We Now?
Abstract View of our single cycle processor
What’s wrong with our CPI=1 processor?
Memory Access Time
Reducing Cycle Time
Worst Case Timing (Load)
Basic Limits on Cycle Time
Partitioning the CPI=1 Datapath
Example Multicycle Datapath
Administrative Issues
Recall: Step-by-step Processor Design
Step 4: R-type (add, sub, . . .)
Step 4: Logical immed
Step 4: Load
Step 4: Store
Step 4: Branch
Alternative datapath (book): Multiple Cycle Datapath
Our Control Model
Step 4 Control Specification for multicycle proc
Traditional FSM Controller
Step 5 (datapath + state diagram control)
Mapping RTs to Control Points
Assigning States
(Mostly) Detailed Control Specification (missing0)
Performance Evaluation
Controller Design
Example: Jump-Counter
Using a Jump Counter
Our Microsequencer
Microprogram Control Specification
Adding the Dispatch ROM
Example: Controlling Memory
Controller handles non-ideal memory
Really Simple Time-State Control
Time-state Control Path
Overview of Control
Summary
Summary (cont’d)
Where to get more information?

Recap: Processor Design is a Process
° Bottom-up
  • assemble components in target technology to establish critical timing
° Top-down
  • specify component behavior from high-level requirements
° Iterative refinement
  • establish partial solution, expand and improve
[Figure: design hierarchy. Labels: Instruction Set Architecture; processor; datapath, control; Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer; Cells; Gates]

Recap: A Single Cycle Datapath
[Figure: single-cycle datapath. Blocks: Instruction Fetch Unit, 32 32-bit Registers (Rw, Ra, Rb; busA, busB, busW), Extender, ALU, Data Memory (Adr, Data In, WrEn), muxes. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, Equal. Instruction<31:0> fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>]

Recap: The “Truth Table” for the Main Control
                   R-type     ori        lw         sw         beq        jump
op                 00 0000    00 1101    10 0011    10 1011    00 0100    00 0010
RegDst             1          0          0          x          x          x
ALUSrc             0          1          1          1          0          x
MemtoReg           0          0          1          x          x          x
RegWrite           1          1          1          0          0          0
MemWrite           0          0          0          1          0          0
Branch             0          0          0          0          1          0
Jump               0          0          0          0          0          1
ExtOp              x          0          1          1          x          x
ALUop (Symbolic)   “R-type”   Or         Add        Add        Subtract   xxx
ALUop<2>           1          0          0          0          0          x
ALUop<1>           0          1          0          0          0          x
ALUop<0>           0          0          0          0          1          x
[Figure: Main Control takes op (6 bits) and produces RegDst, ALUSrc, ... and ALUop (3 bits); ALU Control (Local) takes ALUop and func (6 bits) and produces ALUctr (3 bits)]

Recap: PLA Implementation of the Main Control
[Figure: PLA. Inputs: op<5>..op<0>. Product terms: R-type, ori, lw, sw, beq, jump. Outputs: RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]

Recap: Systematic Generation of Control
° In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”
  • in general, the controller is a finite state machine
  • microinstruction can also control sequencing (see later)
[Figure labels: Instruction, OPcode, Decode, Control Logic / Store (PLA, ROM), microinstruction, Control Points, Conditions, Datapath]

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
° Today’s Topic: Designing the Multiple Clock Cycle Datapath
[Figure labels: Processor (Control, Datapath), Memory, Input, Output]

Abstract View of our single cycle processor
° looks like a FSM with PC as state
[Figure labels: PC, Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access, Data Mem, Reg. Wrt, Result Store; Main Control (op), ALU control (fun); signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, Equal, nPC_sel, RegWr, MemWr, MemRd]

What’s wrong with our CPI=1 processor?
° Long Cycle Time
° All instructions take as much time as the slowest
° Real memory is not as nice as our idealized memory
  • cannot always get the job done in one (short) cycle
[Figure: path through the datapath for each instruction class]
Arithmetic & Logical:  PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
Load (Critical Path):  PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup
Store:                 PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
Branch:                PC -> Inst Memory -> Reg File -> cmp -> mux

Memory Access Time
° Physics => fast memories are small (large memories are slow)
  • question: register file vs. memory
° => Use a hierarchy of memories
[Figure: Storage Array. Labels: address, address decoder, selected word line, storage cell, bit line, sense amps]
[Figure: memory hierarchy. Processor and Cache (1 time-period), proc. bus, L2 Cache (2-3 time-periods), mem. bus, memory (20 - 50 time-periods)]

Reducing Cycle Time
° Cut combinational dependency graph and insert register / latch
° Do same work in two fast cycles, rather than one slow one
° May be able to short-circuit path and remove some components for some instructions!
[Figure: storage element -> Acyclic Combinational Logic -> storage element, cut into storage element -> Acyclic Combinational Logic (A) -> storage element -> Acyclic Combinational Logic (B) -> storage element]

Worst Case Timing (Load)
[Figure: timing diagram against Clk. Signals, each changing from Old Value to New Value: PC; Rs, Rt, Rd, Op, Func; ALUctr; RegWr; ExtOp; ALUSrc; MemtoReg; busA; busB; Address; busW. Delays annotated: Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time. Final event: Register Write Occurs]

Basic Limits on Cycle Time
° Next address logic
  • PC <= branch ? PC + offset : PC + 4
° Instruction Fetch
  • InstructionReg <= Mem[PC]
° Register Access
  • A <= R[rs]
° ALU operation
  • R <= A + B
[Figure labels: PC, Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Data Mem, Reg. File, Result Store; Control; signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, nPC_sel, RegWr, MemWr, MemRd]

Partitioning the CPI=1 Datapath
° Add registers between smallest steps
° Place enables on all registers
[Figure labels: PC, Next PC, Operand Fetch, Exec, Reg. File, Mem Access, Data Mem, Instruction Fetch, Result]
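
The argument on the slides above (in a CPI=1 design every instruction takes as long as the slowest, and cutting the combinational path with registers allows a shorter clock) can be sketched numerically. The Python below is a rough illustration, not part of the lecture: all delay values and the names DELAY_PS, STEPS, single_cycle_time, multicycle_clock and multicycle_time are assumptions made for this example, and mux, setup and clk-to-Q delays are ignored.

```python
# Hypothetical component delays in picoseconds. These numbers are assumed
# for illustration only; the slides give no numeric delays.
DELAY_PS = {
    "inst_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "cmp": 100,
    "data_mem": 200,
    "reg_write": 100,
}

# Components each instruction class passes through, loosely following the
# per-class paths on the "What's wrong with our CPI=1 processor?" slide
# (muxes and setup time are left out to keep the sketch small).
STEPS = {
    "arith": ["inst_mem", "reg_read", "alu", "reg_write"],
    "load": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["inst_mem", "reg_read", "alu", "data_mem"],
    "branch": ["inst_mem", "reg_read", "cmp"],
}


def single_cycle_time(steps, delay):
    """Clock period when every instruction must finish in one cycle:
    the longest end-to-end path over all instruction classes."""
    return max(sum(delay[s] for s in path) for path in steps.values())


def multicycle_clock(steps, delay):
    """Clock period when a register is placed after every step:
    the slowest single step (register overhead ignored)."""
    return max(delay[s] for path in steps.values() for s in path)


def multicycle_time(instr, steps, delay):
    """Time for one instruction when each step takes one short cycle."""
    return len(steps[instr]) * multicycle_clock(steps, delay)


if __name__ == "__main__":
    print("single-cycle clock:", single_cycle_time(STEPS, DELAY_PS), "ps")
    print("multicycle clock:  ", multicycle_clock(STEPS, DELAY_PS), "ps")
    for name in STEPS:
        print(name, multicycle_time(name, STEPS, DELAY_PS), "ps")
```

With these assumed delays the single-cycle clock is 800 ps, set by the load path, while the multicycle clock is 200 ps. A branch then finishes in 600 ps, but a load takes 1000 ps because the 100 ps steps still occupy a full 200 ps cycle, which is one reason the choice of where to cut the datapath matters.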