CS152 Computer Architecture and Engineering
Lecture 3: Performance, Technology & Delay Modeling

Slide titles:
Outline of Today’s Lecture
Summary: Instruction set design (MIPS)
Summary: Salient features of MIPS I
Details of the MIPS instruction set
Calls: Why Are Stacks So Great?
Memory Stacks
Call-Return Linkage: Stack Frames
MIPS: Software conventions for Registers
MIPS / GCC Calling Conventions
Delayed Branches
Branch & Pipelines
Performance
Two notions of “performance”
Definitions
Example
Basis of Evaluation
SPEC95
Metrics of performance
Aspects of CPU Performance
Slide 21
CPI
Amdahl's Law
Example (RISC processor)
Evaluating Instruction Sets?
Administrative Matters
Performance and Technology Trends
Range of Design Styles
Basic Technology: CMOS
Basic Components: CMOS Inverter
Basic Components: CMOS Logic Gates
Gate Comparison
Ideal versus Reality
Fluid Timing Model
Series Connection
Review: Calculating Delays
Review: General C/L Cell Delay Model
Characterize a Gate
A Specific Example: 2 to 1 MUX
2 to 1 MUX: Internal Delay Calculation
2 to 1 MUX: Internal Delay Calculation (continue)
Abstraction: 2 to 1 MUX
Break (5 Minutes)
CS152 Logic Elements
Storage Element’s Timing Model
Clocking Methodology
Critical Path & Cycle Time
Clock Skew’s Effect on Cycle Time
Tricks to Reduce Cycle Time
How to Avoid Hold Time Violation?
Clock Skew’s Effect on Hold Time
Summary
To Get More Information

1/27/99 ©UCB Spring 1999 CS152 / Kubiatowicz Lec3.1
Jan 27, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

Outline of Today’s Lecture
° Review: Finish ISA/MIPS details (10 minutes)
° Performance and Technology (15 minutes)
° Administrative Matters and Questions (2 minutes)
° Delay Modeling and Gate Characterization (20 minutes)
° Questions and Break (5 minutes)
° Clocking Methodologies and Timing Considerations (25 minutes)

Summary: Instruction set design (MIPS)
° Use general purpose registers with a load-store architecture: YES
° Provide at least 16 general purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
° Support basic addressing modes: displacement (with address offset of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES: 16-bit immediate, displacement (disp=0 gives register deferred)
° All addressing modes apply to all data transfer instructions: YES
° Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size: Fixed
° Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers: YES
° Support most common instructions, since they will dominate: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES, 16b relative address
° Aim for a minimalist instruction set: YES

Summary: Salient features of MIPS I
• 32-bit fixed format inst (3 formats)
• 32 32-bit GPR (R0 contains zero) and 32 FP registers (+ HI LO)
  – partitioned by software convention
• 3-address, reg-reg arithmetic instr.
• Single address mode for load/store: base+displacement
  – no indirection or scaled addressing
• 16-bit immediate plus LUI
• Simple branch conditions
  – compare against zero or two registers for =, ≠
  – no integer condition codes
• Support for 8-bit, 16-bit, and 32-bit integers
• Support for 32-bit and 64-bit floating point

Details of the MIPS instruction set
° Register zero always has the value zero (even if you try to write it)
° Branch/jump and link put the return address (PC+4) into the link register (R31)
° All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, …)
° Immediate arithmetic and logical instructions are extended as follows:
  • logical immediate ops are zero extended to 32 bits
  • arithmetic immediate ops are sign extended to 32 bits (including addiu)
° The data loaded by the byte and halfword load instructions is extended as follows:
  • lbu, lhu are zero extended
  • lb, lh are sign extended
° Overflow can occur in these arithmetic and logical instructions:
  • add, sub, addi
  • it cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu

Calls: Why Are Stacks So Great?
Stacking of Subroutine Calls & Returns and Environments:
[Figure: A executes CALL B, B executes CALL C, C executes RET, then B executes RET. Stack contents over time: A; A B; A B C; A B; A]
Some machines provide a memory stack as part of the architecture (e.g., VAX)
Sometimes stacks are implemented via software convention (e.g., MIPS)

Memory Stacks
Useful for stacked environments/subroutine call & return even if operand stack not part of architecture
Stacks that Grow Up vs. Stacks that Grow Down:
[Figure: two memory diagrams holding items a, b, c, with memory addresses running from 0 (little) to inf. (big); one stack grows up, the other grows down. SP points at either the next empty or the last full location.]
How is empty stack represented?
Consider case of stack growing down (MIPS):
Last Full
  POP: Read from Mem(SP), then Increment SP
  PUSH: Decrement SP, then Write to Mem(SP)
Next Empty
  POP: Increment SP, then Read from Mem(SP)
  PUSH: Write to Mem(SP), then Decrement SP

Call-Return Linkage: Stack Frames
[Figure: stack frame drawn from High Mem to Low Mem, containing ARGS, Callee Save Registers (old FP, RA), and Local Variables, delimited by FP and SP. Annotations: reference args and local variables at fixed (positive) offset from FP; the region at SP grows and shrinks during expression evaluation.]
° Many variations on stacks possible (up/down, last pushed / next)
° Compilers normally keep scalar variables in registers, not memory!

MIPS: Software conventions for Registers
Reg     Name   Use
0       zero   constant 0
1       at     reserved for assembler
2       v0     expression evaluation &
3       v1     function results
4       a0     arguments
5       a1
6       a2
7       a3
8       t0     temporary: caller saves
. . .          (callee can clobber)
15      t7
16      s0     callee saves
. . .          (callee must save)
23      s7
24      t8     temporary (cont’d)
25      t9
26      k0     reserved for OS kernel
27      k1
28      gp     Pointer to global area
29      sp     Stack pointer
30      fp     frame pointer
31      ra     Return Address (HW)

MIPS / GCC Calling Conventions
[Figure: stack frame with FP and SP marked]
fact:
    addiu $sp, $sp, -32
    sw $ra, 20($sp)
    sw $fp,
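
The extension rules listed under "Details of the MIPS instruction set" can be checked with a small model. The following is a minimal Python sketch, not a full MIPS simulator: the helper names (`zero_extend16`, `sign_extend16`, `ori`, `addiu`) are chosen here for illustration, registers are modeled as plain integers masked to 32 bits, and only the immediate-extension behavior is represented.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit unsigned values


def zero_extend16(imm):
    """Zero-extend a 16-bit immediate to 32 bits (logical immediates)."""
    return imm & 0xFFFF


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to 32 bits (arithmetic immediates)."""
    imm &= 0xFFFF
    # If bit 15 is set, copy it into the upper 16 bits.
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


def ori(rs, imm):
    """Logical immediate: the immediate is zero extended."""
    return (rs | zero_extend16(imm)) & MASK32


def addiu(rs, imm):
    """Arithmetic immediate: sign extended; the result wraps, no overflow trap."""
    return (rs + sign_extend16(imm)) & MASK32
```

With the immediate field 0xFFFF, `addiu(5, 0xFFFF)` adds -1 and yields 4, while `ori(0, 0xFFFF)` yields 0x0000FFFF. The same pair of helpers, applied to 8-bit and 16-bit memory values instead of immediates, would describe the difference between lb/lh and lbu/lhu.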
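
The two stack-pointer conventions from the Memory Stacks slide (SP pointing at the last full slot versus the next empty slot, for a stack that grows down) differ only in the order of the memory access and the SP update. The sketch below models this in Python under simplifying assumptions that are not in the slides: memory is a word-addressed list, so SP moves by 1 rather than by 4 bytes as on MIPS, and the class name `DownwardStack` and its `last_full` flag are hypothetical. There is no overflow or underflow checking.

```python
class DownwardStack:
    """Word-addressed memory with a stack growing toward lower addresses."""

    def __init__(self, size, last_full=True):
        self.mem = [None] * size
        self.last_full = last_full
        # Empty stack: under "last full", SP sits one past the top of memory;
        # under "next empty", SP points at the highest usable slot.
        self.sp = size if last_full else size - 1

    def push(self, value):
        if self.last_full:
            self.sp -= 1                 # Decrement SP
            self.mem[self.sp] = value    # Write to Mem(SP)
        else:
            self.mem[self.sp] = value    # Write to Mem(SP)
            self.sp -= 1                 # Decrement SP

    def pop(self):
        if self.last_full:
            value = self.mem[self.sp]    # Read from Mem(SP)
            self.sp += 1                 # Increment SP
        else:
            self.sp += 1                 # Increment SP
            value = self.mem[self.sp]    # Read from Mem(SP)
        return value
```

Both variants return values in last-in, first-out order; what changes is where SP rests between operations, which is also what determines how an empty stack is represented.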