CS152 Computer Architecture and Engineering
Lecture 3: Performance, Technology & Delay Modeling
Jan 27, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
1/27/99, UCB Spring 1999, CS152 / Kubiatowicz


Outline of Today's Lecture

- Review: Finish ISA/MIPS details (10 minutes)
- Performance and Technology (15 minutes)
- Administrative Matters and Questions (2 minutes)
- Delay Modeling and Gate Characterization (20 minutes)
- Questions and Break (5 minutes)
- Clocking Methodologies and Timing Considerations (25 minutes)


Summary: Instruction set design (MIPS)

- Use general purpose registers with a load-store architecture: YES
- Provide at least 16 general purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
- Support basic addressing modes: displacement (with address offset of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES: 16-bit immediate, displacement (disp = 0 gives register deferred)
- All addressing modes apply to all data transfer instructions: YES
- Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size: Fixed
- Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers: YES
- Support most common instructions, since they will dominate: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES, 16b relative address
- Aim for a minimalist instruction set: YES


Summary: Salient features of MIPS I

- 32-bit fixed format instructions (3 formats)
- 32 32-bit GPRs (R0 contains zero) and 32 FP registers (and HI, LO), partitioned by software convention
- 3-address, reg-reg arithmetic instructions
- Single address mode for load/store: base + displacement (no indirect or scaled modes)
- 16-bit immediate plus LUI
- Simple branch conditions: compare against zero, or compare two registers for = and ≠; no integer condition codes
- Support for 8-bit, 16-bit, and 32-bit integers
- Support for 32-bit and 64-bit floating point


Details of the MIPS instruction set

- Register zero always has the value zero (even if you try to write it).
- Branch/jump and link put the return address (PC+4) into the link register (R31).
- All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of their sources (add, sub, and, or, ...).
- Immediate arithmetic and logical instructions are extended as follows:
    - logical immediate ops are zero extended to 32 bits
    - arithmetic immediate ops are sign extended to 32 bits (including addiu)
- The data loaded by the byte and halfword load instructions is extended as follows:
    - lbu, lhu are zero extended
    - lb, lh are sign extended
- Overflow can occur in these arithmetic and logical instructions: add, sub, addi.
  It cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu.


Calls: Why Are Stacks So Great?

Stacking of subroutine calls and returns, and of their environments:

    Code             Environment stack
    A:               A
        CALL B       A B
    B:
        CALL C       A B C
    C:
        RET          A B
        RET          A

Some machines provide a memory stack as part of the architecture (e.g., VAX).
Sometimes stacks are implemented via software convention (e.g., MIPS).


Memory Stacks

Useful for stacked environments and subroutine call/return even if an operand stack is not part of the architecture.

Stacks that Grow Up vs. Stacks that Grow Down:

[Figure: a stack holding a, b, c in memory, with addresses running from 0 (little) to inf. (big). SP points either at the next empty slot or at the last full one, and the stack either grows up toward big addresses or grows down toward 0.]

How is an empty stack represented?

Consider the case of a stack growing down (MIPS):

    SP points to Last Full:
        POP:   Read from Mem(SP), then increment SP
        PUSH:  Decrement SP, then write to Mem(SP)

    SP points to Next Empty:
        POP:   Increment SP, then read from Mem(SP)
        PUSH:  Write to Mem(SP), then decrement SP


Call-Return Linkage: Stack Frames

[Figure: one stack frame, drawn with high memory at the top and low memory at the bottom. From the top: ARGS, callee-save registers, old FP and RA, local variables. FP marks the frame; SP marks the low end.]

- Reference args and local variables at a fixed (positive) offset from FP.
- The region at SP grows and shrinks during expression evaluation.
- Many variations on stacks are possible (up/down, last pushed / next empty).
- Compilers normally keep scalar variables in registers, not memory.


MIPS: Software conventions for Registers

    Reg      Name     Use
    0        zero     constant 0
    1        at       reserved for assembler
    2        v0       expression evaluation &
    3        v1       function results
    4        a0       arguments
    5        a1
    6        a2
    7        a3
    8..15    t0..t7   temporary: caller saves (callee can clobber)
    16..23   s0..s7   callee saves (callee must save)
    24       t8       temporary (cont'd)
    25       t9
    26       k0       reserved for OS kernel
    27       k1
    28       gp       pointer to global area
    29       sp       stack pointer
    30       fp       frame pointer
    31       ra       return address (HW)


MIPS / GCC Calling Conventions

    fact:
        addiu  $sp, $sp, -32    # allocate a 32-byte frame
        sw     $ra, 20($sp)     # save return address
        sw     $fp, 16($sp)     # save old frame pointer
        addiu  $fp, $sp, 32     # new FP = top of this frame
        ...
        sw     $a0, 0($fp)      # store first argument
        ...
        lw     $31, 20($sp)     # restore return address
        lw     $fp, 16($sp)     # restore old frame pointer
        addiu  $sp, $sp, 32     # release the frame
        jr     $31              # return

[Figure: the stack at three points in the call (on entry, inside fact, after return), showing FP, SP, and the saved ra and old FP; low addresses at the bottom.]

First four arguments are passed in registers.


Delayed Branches

        li    r3, #7
        sub   r4, r4, 1
        bz    r4, LL
        addi  r5, r3, 1
        subi  r6, r6, 2
    LL: slt   r1, r3, r5

- In the "raw" MIPS, the instruction after the branch is executed even when the branch is taken.
- This is hidden by the assembler for the MIPS "virtual machine".
- It allows the compiler to make better use of the instruction pipeline.


Branch & Pipelines

[Figure: pipeline timeline. Each instruction has an ifetch stage followed by an execute stage, and the ifetch of one instruction overlaps the execute of the previous one. Sequence shown: li r3, #7; sub r4, r4, 1; bz r4, LL (branch); addi r5, r3, 1 (delay slot); LL: slt r1, r3, r5 (branch target).]

By the end of the branch instruction, the CPU knows whether or not the branch will take place. However, it will have fetched the next instruction by then, regardless of whether or not the branch is taken. Why not execute it?

Is this a violation of the ISA abstraction?


Performance

- Purchasing perspective: given a collection of machines, which has the
    - best performance?
    - least cost?
    - best performance / cost?
- Design perspective: faced with design options, which has the
    - best performance improvement?
    - least cost?
    - best performance / cost?
- Both require
    - a basis for comparison
    - a metric for evaluation
- Our goal is to understand the cost and performance implications of architectural choices.


Two notions of "performance"

    Plane               DC to Paris   Speed      Passengers   Throughput (pmph)
    Boeing 747          6.5 hours     610 mph    470          286,700
    BAC/Sud Concorde    3 hours       1350 mph   132          178,200

Which has higher performance?

- Time to do the task (Execution Time): execution time, response time, latency
- Tasks per day, hour, week, sec, ns
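The two notions of performance in the plane comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the function names `throughput_pmph` and `latency_ratio` are assumptions made for this example, while the figures are the ones quoted in the comparison. It treats "pmph" as passengers multiplied by speed, which reproduces the throughput values given.

```python
# Sketch: latency vs. throughput for the two planes in the comparison.
# Names and data layout are illustrative assumptions; the numbers are
# the ones quoted in the lecture text.

planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["mph"]


def latency_ratio(slow, fast):
    """How many times longer the slower trip takes than the faster one."""
    return slow["hours"] / fast["hours"]


b747 = planes["Boeing 747"]
concorde = planes["Concorde"]

print(throughput_pmph(b747))      # 286700
print(throughput_pmph(concorde))  # 178200

# Concorde wins on latency (time for one trip) ...
print(round(latency_ratio(b747, concorde), 2))  # 2.17
# ... while the 747 wins on throughput (work done per hour).
print(round(throughput_pmph(b747) / throughput_pmph(concorde), 2))  # 1.61
```

Which plane "has higher performance" therefore depends on the metric: by time to complete one trip the Concorde is roughly 2.2 times faster, but by passenger-miles per hour the 747 delivers roughly 1.6 times more.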
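Returning briefly to the MIPS details reviewed at the start of the lecture: the difference between zero extension (logical immediates, lbu, lhu) and sign extension (arithmetic immediates, lb, lh) can be made concrete with a small model. This is a minimal sketch, not MIPS itself; the helper names `zero_extend` and `sign_extend` are hypothetical, and registers are modeled as plain Python integers masked to 32 bits.

```python
# Sketch: zero extension vs. sign extension to 32 bits.
# Helper names are hypothetical; registers are modeled as Python ints.

MASK32 = 0xFFFFFFFF


def zero_extend(value, bits):
    """Keep the low `bits` bits; the upper bits of the result are 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Copy bit (bits - 1) into all upper bits of a 32-bit result."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # XOR then subtract turns the field into a signed number;
    # the final mask wraps it back into 32 bits.
    return ((value ^ sign_bit) - sign_bit) & MASK32


imm = 0x8001  # a 16-bit immediate with its top bit set

print(hex(zero_extend(imm, 16)))   # 0x8001      (logical immediate)
print(hex(sign_extend(imm, 16)))   # 0xffff8001  (arithmetic immediate)
print(hex(zero_extend(0x80, 8)))   # 0x80        (byte loaded by lbu)
print(hex(sign_extend(0x80, 8)))   # 0xffffff80  (byte loaded by lb)
```

The two treatments agree whenever the top bit of the field is 0 and differ only when it is 1, which is why the choice matters for negative immediates and for bytes at or above 0x80.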