CS152 Computer Architecture and Engineering
Lecture 3: Performance, Technology & Delay Modeling
September 5, 2001
John Kubiatowicz (http://cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
9/5/01  UCB Fall 2001  CS152 / Kubiatowicz


Review: Salient features of MIPS I

- 32-bit fixed-format instructions (3 formats)
- 32 32-bit GPRs (R0 contains zero) and 32 FP registers (plus HI and LO)
    - partitioned by software convention
- 3-address, reg-reg arithmetic instructions
- Single address mode for load/store: base + displacement
    - no indirection, no scaling
- 16-bit immediate plus LUI
- Simple branch conditions
    - compare against zero, or compare two registers for equality or inequality
    - no integer condition codes
- Support for 8-bit, 16-bit, and 32-bit integers
- Support for 32-bit and 64-bit floating point


Review: MIPS Addressing Modes / Instruction Formats

All instructions are 32 bits wide.

  Mode                Fields             Operand
  Register (direct)   op rs rt rd        register
  Immediate           op rs rt immed     immed
  Base + index        op rs rt immed     Memory[register + immed]
  PC-relative         op rs rt immed     Memory[PC + immed]


Review: When does MIPS sign extend?

When a value is sign extended, the upper bit is copied to fill the full value.
Examples of sign extending 8 bits to 16 bits:

  00001010 -> 00000000 00001010
  10001100 -> 11111111 10001100

When is an immediate value sign extended?
- Arithmetic instructions (add, sub, etc.) sign extend immediates, even for the unsigned versions of the instructions.
- Logical instructions do not sign extend.

  addi r2, r3, -1   has 0xFFFF in the immediate field and will extend to 0xFFFFFFFF before adding
  andi r2, r3, -1   has 0xFFFF in the immediate field and will extend to 0x0000FFFF before and'ing

Kinda weird to put negative numbers in logical instructions.


Review: Details of the MIPS instruction set

- Register zero always has the value zero (even if you try to write it).
- Branch/jump-and-link instructions put the return address (PC+4) into the link register (R31, also called "ra").
- All instructions change all 32 bits of the destination register (including lui, lb, lh), and all read all 32 bits of their sources (add, and, ...).
- The difference between signed and unsigned versions:
    - For add and subtract: the signed version causes an exception on overflow. There is no difference in sign-extension behavior.
    - For multiply and divide: it distinguishes the type of operation.
- Thus overflow can occur in these arithmetic and logical instructions: add, sub, addi.
  It cannot occur in: addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu.
- Immediate arithmetic and logical instructions are extended as follows:
    - logical immediate ops are zero extended to 32 bits
    - arithmetic immediate ops are sign extended to 32 bits (including addiu)
- The data loaded by the byte and halfword load instructions is extended as follows:
    - lbu, lhu are zero extended
    - lb, lh are sign extended


Calls: Why Are Stacks So Great?

Stacking of subroutine calls and returns, and of environments:

  Code            Stack after the step
  A:              A
     CALL B       A B
  B:
     CALL C       A B C
  C:
     RET          A B
     RET          A

Some machines provide a memory stack as part of the architecture (e.g., VAX).
Sometimes stacks are implemented via software convention (e.g., MIPS).


Memory Stacks

Useful for stacked environments and subroutine call/return even if an operand stack is not part of the architecture.

Stacks that grow up vs. stacks that grow down:

  [Figure: stack holding a, b, c along the memory-address axis from 0 (Little) to inf. (Big); one direction is labeled "grows up", the other "grows down"; SP points either at the Last Full or at the Next Empty location.]

How is an empty stack represented?

For a stack that grows from Big to Little addresses:

            Last Full                           Next Empty
  POP:      Read from Mem(SP); Increment SP     Increment SP; Read from Mem(SP)
  PUSH:     Decrement SP; Write to Mem(SP)      Write to Mem(SP); Decrement SP


Call-Return Linkage: Stack Frames

  [Figure: one stack frame between High Mem and Low Mem, with regions labeled ARGS, Callee Save Registers (old FP, RA), and Local Variables, and pointers FP and SP.]

- Reference args and local variables at a fixed (positive) offset from FP.
- The region at SP grows and shrinks during expression evaluation.
- Many variations on stacks are possible (up/down, last pushed/next).
- Compilers normally keep scalar variables in registers, not memory.


MIPS: Software conventions for Registers

  Reg     Name    Use
  0       zero    constant 0
  1       at      reserved for assembler
  2       v0      expression evaluation and
  3       v1      function results
  4       a0      arguments
  5       a1
  6       a2
  7       a3
  8       t0      temporary: caller saves
  ...             (callee can clobber)
  15      t7
  16      s0      callee saves
  ...             (callee must save)
  23      s7
  24      t8      temporary (cont'd)
  25      t9
  26      k0      reserved for OS kernel
  27      k1
  28      gp      pointer to global area
  29      sp      stack pointer
  30      fp      frame pointer
  31      ra      return address (HW)


MIPS / GCC Calling Conventions

  fact:
      addiu  $sp, $sp, -32
      sw     $ra, 20($sp)
      sw     $fp, 16($sp)
      addiu  $fp, $sp, 32
      ...
      sw     $a0, 0($fp)
      ...
      lw     $ra, 20($sp)
      lw     $fp, 16($sp)
      addiu  $sp, $sp, 32
      jr     $ra

  [Figure: the stack at three points in the call, showing FP, SP, the saved ra and old FP, with the low-address end marked.]

- First four arguments are passed in registers.
- Result is passed in $v0/$v1.


Delayed Branches

      li    r3, 7
      sub   r4, r4, 1
      bz    r4, LL
      addi  r5, r3, 1
      subi  r6, r6, 2
  LL: slt   r1, r3, r5

- In the "Raw" MIPS, the instruction after the branch is executed even when the branch is taken.
    - This is hidden by the assembler for the MIPS "virtual machine".
    - It allows the compiler to better utilize the instruction pipeline.


Branch & Pipelines

  Time ->
      li    r3, 7        execute
      sub   r4, r4, 1    ifetch   execute
      bz    r4, LL                ifetch   execute                      (Branch)
      addi  r5, r3, 1                      ifetch   execute             (Delay Slot)
  LL: slt   r1, r3, r5                              ifetch   execute    (Branch Target)

By the end of the branch instruction, the CPU knows whether or not the branch will take place. However, it will have fetched the next instruction by then, regardless of whether or not the branch is taken. Why not execute it?

Is this a violation of the ISA abstraction?


Performance

- Purchasing perspective: given a collection of machines, which has the
    - best performance?
    - least cost?
    - best performance / cost?
- Design perspective: faced with design options, which has the
    - best performance improvement?
    - least cost?
    - best performance / cost?
- Both require
    - a basis for comparison
    - a metric for evaluation

Our goal is to understand the cost and performance implications of architectural choices.


Two notions of "performance"

  Plane               DC to Paris    Speed       Passengers    Throughput (pmph)
  Boeing 747          6.5 hours      610 mph     470           286,700
  BAD/Sud Concorde    3 hours        1350 mph    132           178,200

Which has higher performance?
- Time to do the task (Execution Time)
    - execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ... (Performance)
    - throughput, bandwidth

Response time and throughput are often in opposition.


Definition(s)

- Performance is in units of things-per-second
    - bigger is better
- If we are primarily concerned with response time:
    performance(x) = 1 ...
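
The airplane table above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the lecture: the function and variable names are hypothetical, the throughput column is assumed to be passengers times speed (which reproduces the table's numbers), and the speedup helper assumes the conventional definition of performance as the reciprocal of execution time, which the truncated definition above appears to be introducing.

```python
# Figures copied from the "Two notions of performance" table.
# All names here are illustrative, not from the lecture.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "BAD/Sud Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def throughput_pmph(plane):
    """Passenger-miles per hour, assumed to be passengers * speed."""
    return plane["passengers"] * plane["mph"]


def lowest_latency(fleet):
    """Name of the plane with the shortest DC-to-Paris time."""
    return min(fleet, key=lambda name: fleet[name]["hours"])


def highest_throughput(fleet):
    """Name of the plane that moves the most passenger-miles per hour."""
    return max(fleet, key=lambda name: throughput_pmph(fleet[name]))


def speedup(slow_time, fast_time):
    """Ratio of performances, ASSUMING performance = 1 / execution time."""
    return slow_time / fast_time


if __name__ == "__main__":
    for name, plane in planes.items():
        print(name, throughput_pmph(plane), "pmph")
    print("lowest latency:", lowest_latency(planes))
    print("highest throughput:", highest_throughput(planes))
    print("Concorde speedup in latency:", round(speedup(6.5, 3.0), 2))
```

The two metrics pick different winners: the Concorde has the lower latency, while the 747 has the higher throughput, which is the opposition the slide points out.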

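
The sign-extension review earlier in the lecture (arithmetic immediates are sign extended, logical immediates are zero extended) can be illustrated with a small model. This is a hedged Python sketch with hypothetical helper names; it models only the 32-bit result of the immediate forms and deliberately ignores the overflow exception that the signed addi can raise.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Copy bit 15 of a 16-bit immediate into bits 16..31."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def zero_extend16(imm):
    """Fill bits 16..31 with zeros."""
    return imm & 0xFFFF


def sign_extend8_to_16(value):
    """Copy bit 7 of an 8-bit value into bits 8..15."""
    value &= 0xFF
    return (value | 0xFF00) if value & 0x80 else value


def addi(rs, imm):
    """Arithmetic immediate: sign extend, then add (overflow trap not modeled)."""
    return (rs + sign_extend16(imm)) & MASK32


def andi(rs, imm):
    """Logical immediate: zero extend, then AND."""
    return rs & zero_extend16(imm)


if __name__ == "__main__":
    print(f"{sign_extend8_to_16(0b10001100):016b}")
    print(hex(addi(5, 0xFFFF)))          # immediate field 0xFFFF acts as -1
    print(hex(andi(0x12345678, 0xFFFF)))  # immediate field 0xFFFF keeps the low half
```

With the immediate field set to 0xFFFF, `addi` behaves as "subtract one" while `andi` keeps only the low 16 bits, matching the two examples on the slide.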

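
The PUSH and POP orderings from the Memory Stacks slide can be sketched for a stack that grows toward lower addresses. This Python model is an illustration under stated assumptions: the class name and the list standing in for memory are hypothetical, there is no bounds checking, and the initial SP values are one possible answer to the slide's question of how an empty stack is represented, not something the lecture states.

```python
class DownwardStack:
    """Stack growing from Big to Little addresses, in either SP convention."""

    def __init__(self, size, last_full=True):
        self.mem = [None] * size
        self.last_full = last_full
        # Assumed empty representation:
        #   Last Full:  SP sits one past the highest address.
        #   Next Empty: SP sits at the highest address.
        self.sp = size if last_full else size - 1

    def push(self, value):
        if self.last_full:
            self.sp -= 1               # Decrement SP
            self.mem[self.sp] = value  # Write to Mem(SP)
        else:
            self.mem[self.sp] = value  # Write to Mem(SP)
            self.sp -= 1               # Decrement SP

    def pop(self):
        if self.last_full:
            value = self.mem[self.sp]  # Read from Mem(SP)
            self.sp += 1               # Increment SP
        else:
            self.sp += 1               # Increment SP
            value = self.mem[self.sp]  # Read from Mem(SP)
        return value


if __name__ == "__main__":
    for convention in (True, False):
        stack = DownwardStack(8, last_full=convention)
        for item in "abc":
            stack.push(item)
        print(convention, stack.sp, [stack.pop() for _ in range(3)])
```

Both conventions return items in last-in, first-out order; they differ only in where SP points between operations, which is why the order of the read/write and the SP update is swapped.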
Berkeley COMPSCI 152 - Lecture 3 Performance, Technology & Delay Modeling
