CS 152 Computer Architecture and Engineering
Lecture 21: Advanced Processors II

2004-11-16
Dave Patterson (www.cs.berkeley.edu/~patterson)
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
www-inst.eecs.berkeley.edu/~cs152/
Thanks to Krste Asanovic

CS 152 L21: Advanced Processors II
UC Regents Fall 2004, UCB


Last Time: Superpipelining, Superscalar

Q. Could adding pipeline stages hurt the CPI for an application?
A. Yes, due to these problems:

    CPI Problem            Possible Solution
    Extra branch delays    Branch prediction
    Extra load delays      Optimize code
    Structural hazards     Optimize code, add hardware

[Figure: ARM XScale pipeline, 8 stages]


Today: Dynamic Scheduling Overview

Goal: Enable out-of-order execution by breaking the pipeline in two: fetch and execution.
Example: IBM Power 5.
    I-fetch and decode: like static pipelines.
    Today's focus: execution.


Dynamic Scheduling: A mix of 3 ideas

Top-down idea: Registers that may be written only once (but may be read many times) eliminate WAW and WAR hazards.
Mid-level idea: An instruction waiting for an operand to execute may trigger on the (single) write to the associated register.
Bottom-up idea: To support "snooping" on register writes, attach all machine elements to a common bus.

Robert Tomasulo, IBM, 1967. FP unit for IBM 360/91.


A common bus = long wires = slow

Pipelines in theory: Wires are short, so clock periods can be short. Wiring by abutment.
Pipelines in practice: Long wires are the price we paid to avoid stalls.
Conjecture: If processor speed is limited by long wires, let's do a design that fully uses the …
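
The three ideas on the "A mix of 3 ideas" slide can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not Tomasulo's hardware: the class names (CommonBus, WriteOnceRegisters, WaitingInstruction) and the register names p1..p5 are hypothetical, and the "bus" is simply a list of listeners that all observe every register write.

```python
import operator


class CommonBus:
    """Bottom-up idea: every attached element sees every register write."""

    def __init__(self):
        self.listeners = []

    def attach(self, listener):
        self.listeners.append(listener)

    def broadcast(self, reg, value):
        # Iterate over a copy: a firing instruction may trigger nested writes.
        for listener in list(self.listeners):
            listener.snoop(reg, value)


class WriteOnceRegisters:
    """Top-down idea: a register is written once and may be read many times."""

    def __init__(self, bus):
        self.bus = bus
        self.values = {}

    def write(self, reg, value):
        if reg in self.values:
            raise ValueError(f"{reg} was already written")
        self.values[reg] = value
        self.bus.broadcast(reg, value)


class WaitingInstruction:
    """Mid-level idea: execute on the single write to the last missing source."""

    def __init__(self, name, op, srcs, dest, regs, bus, trace):
        self.name, self.op, self.srcs, self.dest = name, op, srcs, dest
        self.regs, self.trace = regs, trace
        # Pick up any sources that were written before this instruction issued.
        self.operands = {s: regs.values[s] for s in srcs if s in regs.values}
        self.done = False
        bus.attach(self)
        self._try_execute()

    def snoop(self, reg, value):
        if reg in self.srcs and not self.done:
            self.operands[reg] = value
            self._try_execute()

    def _try_execute(self):
        if self.done or len(self.operands) < len(set(self.srcs)):
            return
        self.done = True
        self.trace.append(self.name)
        result = self.op(*(self.operands[s] for s in self.srcs))
        self.regs.write(self.dest, result)


bus = CommonBus()
regs = WriteOnceRegisters(bus)
trace = []  # order in which instructions actually executed

# Both instructions issue in program order and wait for their operands.
WaitingInstruction("add", operator.add, ["p1", "p2"], "p3", regs, bus, trace)
WaitingInstruction("mul", operator.mul, ["p3", "p4"], "p5", regs, bus, trace)

regs.write("p4", 10)
regs.write("p1", 2)
regs.write("p2", 3)  # completes add's operands; add's result then wakes mul
```

In this sketch, `mul` holds `p4` early but cannot run until `add` broadcasts `p3`, so execution order follows data availability rather than the order in which operands happened to arrive.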


A bus-based multi-cycle computer

[Figure: From Memory -> Load Unit; Register File; ALU #1; ALU #2; Store Unit -> To Memory; all attached to a Common Data Bus]

Common Data Bus:
    1. Only one unit writes at a time (one source).
    2. All units may read the written values.

If we add too many functional units, one bus is too long, too slow.
Solutions: more buses, faster electrical signalling.


Administrivia: Final project begins

Thursday 11/18: Preliminary design document due, by 9 PM.
Friday 11/19: Review design document with TAs in lab section.
Sunday 11/21: Revised design document due in email, by 11:59 PM.
Friday 12/3: Demo deep pipelining to TAs in lab section.


Administrivia: Mid-term and Field Trip

Mid-Term II Review Session: Sunday 11/21, 7-9 PM, 306 Soda.
Mid-Term II: Tuesday 11/23, 5:30 to 8:30 PM, 101 Morgan. LaVal's at 9 PM.
Thanksgiving Holidays.
Xilinx field trip: Tuesday 11/30, bus leaves at 8:30 AM from 4th floor Soda. Send Doug RSVP; options: on bus, driving, not going.
Advice on Presentations: Thursday 12/2. Prepare for your final project talk.


Register Renaming


Consider this simple loop

[Loop listing not recovered; surviving fragment: F4, 0(R1)]

Every pass through the loop introduces the potential for WAW and/or WAR hazards for F0, F4 and R1.


Given an endless supply of registers

Rename "architected registers" (Ri, Fi) to new "physical registers" (PRi, PFi) on each write.

Original code (partially recovered): ADDI R1, R0, 64 … F4, 0(R1)
Mappings shown: R1 -> PR01, F0 -> PF00

Renamed code:

            ADDI PR01, PR00, 64
            LD   PF00, 0(PR01)
            ADDD PF04, PF00, PF02
            SD   PF04, 0(PR01)
            SUBI PR11, PR01, 8
            BEQZ PR11, ENDLOOP
    ITER2:  LD   PF10, 0(PR11)
            ADDD PF14, PF10, PF02
            SD   PF14, 0(PR11)
            SUBI PR21, PR11, 8
            BEQZ PR21, ENDLOOP
    ITER3:  LD   PF20, 0(PR21)

An instruction may execute once all of its source registers have been written.


Data-Driven Execution, Associative Control

Caveat: In comparison to static pipelines, there is great diversity in dynamic scheduling implementations. The presentation that follows is a composite, and …


Recall: IBM Power 5 block diagram

Interface between instruction fetch and execution.
    MP: Mapping from architected registers to physical registers (renaming).
    ISS: Instruction Issue.


Instructions placed in Reorder Buffer

Line fields: Inst | src1 | src1 val | src2 | src2 val | dest | dest val

Each line holds physical <src1, src2, dest> registers for an instruction, and controls when it executes.

[Figure: Reorder Buffer; From Memory -> Load Unit; ALU #1; ALU #2; Store Unit -> To Memory; the Common Data Bus carries <dest, dest val>]

Execution engine works on the physical …


Circular Reorder Buffer: A closer look

Tail of List: next instr to commit (complete).
Head of List: add next inst, in program order.

Line fields (as far as recoverable): Inst, Op, U, E, physical register numbers P1, P2, Pd, valid bits, and the P1, P2, Pd values.
    Op: instruction opcode.
    U (Use bit): 1 if line is in use.
    E (Execute bit): 0 if waiting.
    Physical register numbers for values.
    Valid bits for values.
    Copies of physical register …
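
The renaming rule from the "Given an endless supply of registers" slide can be reproduced with a small Python sketch. Several things here are assumptions rather than part of the lecture: the Renamer class and the instruction tuples are hypothetical, the un-renamed loop body is inferred by reversing the renamed listing on the slide, and the physical names follow the slide's apparent pattern (P, then R or F, then the write count, then the architected register number).

```python
from collections import defaultdict


class Renamer:
    """Gives every register write a fresh physical name (toy model)."""

    def __init__(self):
        self.writes = defaultdict(int)  # architected register -> writes so far
        self.allocated = []             # every physical name handed to a write

    def read(self, reg):
        # A read sees the most recent write; a never-written register reads
        # as version 0, matching PR00 and PF02 on the slide.
        version = max(self.writes[reg] - 1, 0)
        return f"P{reg[0]}{version}{reg[1:]}"

    def write(self, reg):
        version = self.writes[reg]
        self.writes[reg] += 1
        name = f"P{reg[0]}{version}{reg[1:]}"
        self.allocated.append(name)
        return name

    def rename(self, dest, srcs, template):
        # Rename sources before the destination, so that SUBI R1, R1, 8
        # reads the old R1 and writes a new one.
        s = [self.read(r) for r in srcs]
        d = self.write(dest) if dest else None
        return template.format(d=d, s=s)


# (destination, sources, text template); inferred from the renamed listing.
PROLOGUE = [("R1", ["R0"], "ADDI {d}, {s[0]}, 64")]
LOOP_BODY = [
    ("F0", ["R1"], "LD {d}, 0({s[0]})"),
    ("F4", ["F0", "F2"], "ADDD {d}, {s[0]}, {s[1]}"),
    (None, ["F4", "R1"], "SD {s[0]}, 0({s[1]})"),
    ("R1", ["R1"], "SUBI {d}, {s[0]}, 8"),
    (None, ["R1"], "BEQZ {s[0]}, ENDLOOP"),
]

renamer = Renamer()
stream = PROLOGUE + LOOP_BODY * 3  # dynamic instruction stream, 3 iterations
renamed = [renamer.rename(dest, srcs, text) for dest, srcs, text in stream]

for line in renamed:
    print(line)
```

Because each write allocates a name that is never reused, no two instructions in the renamed stream write the same physical register, which is why the WAW and WAR hazards on F0, F4 and R1 disappear. The concatenated name format is only unambiguous for single-digit counts and register numbers, as on the slide.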


Berkeley COMPSCI 152 - Lecture 21 – Advanced Processors II
