CS 152 Computer Architecture and Engineering
Lecture 17: Advanced Processors I
2005-10-27
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: David Marquardt and Udam Saini
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L17: Advanced Processors I. UC Regents Fall 2005, UCB.


Last Time: Error-Correcting Codes

  Bit position:   7 6 5 4 3 2 1
  Bit type:       D D D P D P P
  We write:       0 1 1 0 0 1 1
  Later we read:  0 1 0 0 0 1 1

Cosmic ray hit D1 (bit position 5). But how do we know that?

On readout we compute one check per parity bit, each of the form P xor D xor D xor D:

  positions 4, 5, 6, 7:  0 xor 0 xor 1 xor 0 = 1
  positions 2, 3, 6, 7:  1 xor 0 xor 1 xor 0 = 0
  positions 1, 3, 5, 7:  1 xor 0 xor 0 xor 0 = 1

Read together, the three check results are b101 = 5.

What does 5 mean? The position of the flipped bit. To repair, just flip it back.

Note: we number the least significant bit with 1, not 0. 0 is reserved for "no errors".


Today: Beyond the 5-stage pipeline

  Taxonomy: Introduction to advanced processor techniques.
  Superpipelining: Increasing the number of pipeline stages.
  Superscalar: Issuing several instructions in a single cycle.


5-Stage Pipeline: A point of departure

  Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

[Figure: graphical representation of the MIPS pipeline: IM, Reg, ALU, DM, Reg]

At best, the 5-stage pipeline executes one instruction per clock, with a clock period determined by the slowest stage. Reaching that best case assumes:

  Filling all delay slots (branch, load).
  Perfect caching.
  Application does not need multi-cycle instructions (multiply, divide, etc.).

The graphical representation can help with answering questions like:

  How many cycles does it take to execute this code?
  What is the ALU doing during cycle 4?
  Is there a hazard, why does it occur, and how can it be fixed?


Superpipelining: Add more stages (Today)

  Goal: Reduce critical path by adding more pipeline stages.
  Example: 8-stage ARM XScale (extra IF, ID, and data cache stages).
  Difficulties: Added penalties for load delays and branch misses.
  Ultimate limiter: As logic delay goes to 0, what remains is flip-flop clk-to-Q and setup time. Also: power.

[Figure: microprocessor pipeline organization, Fig. 2 from IEEE Journal of Solid-State Circuits, vol. 36, no. 11, November 2001]


Superscalar: Multiple issues per cycle (Today)

  Goal: Improve CPI by issuing several instructions per cycle.
  Example: CPU with floating-point ALUs: issue 1 FP + 1 integer instruction per cycle.
  Difficulties: Load and branch delays affect more instructions.
  Ultimate limiter: Programs may be a poor match to issue rules.


Out of Order: Going around stalls (Tuesday)

  Goal: Issue instructions out of program order.
  Example: MULTD is waiting on F4 to load, so let ADDD go first.
  Difficulties: Bookkeeping is highly complex. A poor fit for lockstep instruction scheduling.
  Ultimate limiter: The amount of instruction-level parallelism present in an application.


Dynamic Scheduling: End lockstep (Tuesday)

  Goal: Enable out-of-order execution by breaking the pipeline in two: fetch and execution.
  Example: IBM Power 5.
  Limiters: Design complexity, instruction-level parallelism.

[Figure 3: Power5 instruction pipeline, showing instruction fetch, group formation and instruction decode, out-of-order processing, the branch, load/store, fixed-point, and floating-point pipelines, branch redirects, and interrupts and flushes. IF = instruction fetch, IC = instruction cache, BP = branch predict, D0 = decode stage 0, Xfer = transfer, GD = group dispatch, MP = mapping, ISS = instruction issue, RF = register file read, EX = execute, EA = compute address, DC = data caches, F6 = six-cycle floating-point execution pipe, Fmt = data format, WB = write back, CP = group commit.]


Throughput and multiple threads (Next Thursday)

  Goal: Use multiple CPUs (real and virtual) to improve
    (1) throughput of machines that run many programs,
    (2) execution time of multithreaded programs.
  Example: Sun Niagara (8 SPARCs on one chip).
  Difficulties: Gaining full advantage requires rewriting applications, OS, and libraries.
  Ultimate limiter: Amdahl's law, memory system performance.


Friday: DRAM controller checkoff

  Run your test vector suite on the Calinx board; display results on LEDs.
  Write different addresses and values.
  Test ongoing transactions on both buses, with randomized start times.
  Load, store, and verify different data word patterns.

[Figure: test vectors drive the IM bus and DM bus into the DRAM controller, which connects to the DRAM]


Superpipelining

Add pipeline stages, reduce clock period. Example: ARM XScale, 8 stages.

Q. Could adding pipeline stages hurt the CPI for an application?
A. Yes, due to these problems:

  CPI Problem                            Possible Solution
  Taken branches cause longer stalls     Branch prediction, loop unrolling
  Cache misses take more clock cycles    Larger caches, add prefetch opcodes to ISA


Recall: Control hazards

[Figure: pipeline datapath with stages IF (Fetch), ID (Decode), EX (ALU), MEM, WB; each stage carries its own IR; the fetch stage shows the PC register, the 0x4 increment, and the I-Cache instruction memory]

We avoid stalling by (1) adding a branch …
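
Returning to the error-correcting-code recap at the start of these notes, the readout check can be sketched in a few lines of Python. This is only an illustration, assuming the standard Hamming(7,4) parity groups (positions 1,3,5,7; 2,3,6,7; 4,5,6,7), which reproduce the b101 = 5 result on the slide. The function names `from_msb_string`, `syndrome`, and `repair` are hypothetical and do not come from the lecture.

```python
def from_msb_string(s):
    """Map a 7-character string (positions 7..1, left to right) to {position: bit}."""
    return {7 - i: int(c) for i, c in enumerate(s)}


def syndrome(bits):
    """Return the 3-bit check value; 0 means no single-bit error was detected."""
    # Assumed parity groups: standard Hamming(7,4), bit positions numbered from 1.
    s1 = bits[1] ^ bits[3] ^ bits[5] ^ bits[7]
    s2 = bits[2] ^ bits[3] ^ bits[6] ^ bits[7]
    s4 = bits[4] ^ bits[5] ^ bits[6] ^ bits[7]
    return (s4 << 2) | (s2 << 1) | s1


def repair(bits):
    """Flip the bit named by the syndrome, if any; return (fixed_bits, position)."""
    pos = syndrome(bits)
    fixed = dict(bits)
    if pos:
        fixed[pos] ^= 1
    return fixed, pos


written = from_msb_string("0110011")    # the word written on the slide
read_back = from_msb_string("0100011")  # the word read after the cosmic ray hit
fixed, pos = repair(read_back)
print(syndrome(written), pos, fixed == written)  # prints: 0 5 True
```

The written word has syndrome 0, the corrupted word has syndrome 5, and flipping position 5 restores the original, matching the repair step described on the slide.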


Berkeley COMPSCI 152 - Advanced Processors I
