CS 152 Computer Architecture and Engineering
Lecture 17: Advanced Processors I
2005-10-27
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: David Marquardt and Udam Saini
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L17: Advanced Processors I. UC Regents Fall 2005, UCB.


Last Time: Error Correcting Codes

We write a 7-bit code word:

  Position:  7 6 5 4 3 2 1
  Bit type:  D D D P D P P
  Written:   0 1 1 0 0 1 1

A cosmic ray hits D1 (position 5). Later, we read:

  Read:      0 1 0 0 0 1 1

But how do we know that a bit flipped? On readout we compute three parity checks, each one a P bit XORed with three D bits:

  positions 4, 5, 6, 7:  0 xor 0 xor 1 xor 0 = 1
  positions 2, 3, 6, 7:  1 xor 0 xor 1 xor 0 = 0
  positions 1, 3, 5, 7:  1 xor 0 xor 0 xor 0 = 1

Read top to bottom, the three results form 0b101 = 5. What does 5 mean? The position of the flipped bit. To repair, just flip that bit back.

Note: we number the least significant bit with 1, not 0. 0 is reserved for "no errors".


Today: Beyond the 5-stage pipeline

Taxonomy: Introduction to advanced processor techniques.
Superpipelining: Increasing the number of pipeline stages.
Superscalar: Issuing several instructions in a single cycle.


5 Stage Pipeline: A point of departure

Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

At best, the 5-stage pipeline executes one instruction per clock, with a clock period determined by the slowest stage.

Annotations recovered from the diagram:
  Filling all delay slots (branch, load).
  Perfect caching.
  Application does not need multi-cycle instructions (multiply, divide).


Superpipelining: Add more stages (Today)

Goal: Reduce critical path by adding more pipeline stages.
Example: 8-stage ARM XScale: extra IF, ID, and data cache stages.
Difficulties: Added penalties for load delays and branch misses.
Ultimate Limiter: As logic delay goes to 0, what remains is FF clk-to-Q and setup time.
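The "Ultimate Limiter" for superpipelining can be made concrete with a small model. This is a minimal sketch, not taken from the lecture: it assumes the total logic delay splits evenly across stages (real designs are set by the slowest stage), and every number and name below (`clock_period_ns`, `TOTAL_LOGIC_NS`, `CLK_TO_Q_NS`, `SETUP_NS`) is hypothetical and chosen only for illustration.

```python
def clock_period_ns(total_logic_ns, stages, clk_to_q_ns, setup_ns):
    """Clock period when the logic is split evenly over `stages` stages.

    Assumption (not from the lecture): an ideal, perfectly balanced split.
    Each stage still pays the flip-flop clk-to-Q delay and setup time.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    return total_logic_ns / stages + clk_to_q_ns + setup_ns


# Hypothetical delays, for illustration only.
TOTAL_LOGIC_NS = 10.0   # logic delay of the whole unpipelined datapath
CLK_TO_Q_NS = 0.2       # flip-flop clock-to-Q delay
SETUP_NS = 0.1          # flip-flop setup time

for stages in (1, 5, 8, 16, 64):
    period = clock_period_ns(TOTAL_LOGIC_NS, stages, CLK_TO_Q_NS, SETUP_NS)
    print(f"{stages:3d} stages: period = {period:.3f} ns")
```

Under this model the period shrinks with every added stage but never drops below `CLK_TO_Q_NS + SETUP_NS`, which is the flip-flop overhead the slide names as the limit.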


Superscalar: Multiple issues per cycle (Today)

Goal: Improve CPI by issuing several instructions per cycle.
Example: CPU with floating point ALUs: issue 1 FP + 1 integer instruction per cycle.
Difficulties: Load and branch delays affect more instructions.
Ultimate Limiter: Programs may be a poor match to issue rules.


Out of Order: Going around stalls (Tuesday)

Goal: Issue instructions out of program order.
Example: MULTD is waiting on F4 to load, so let ADDD go first.
Difficulties: Bookkeeping is highly complex. A poor fit for lockstep instruction scheduling.
Ultimate Limiter: The amount of instruction level parallelism present in an application.


Dynamic Scheduling: End lockstep (Tuesday)

Goal: Enable out-of-order by breaking the pipeline in two: fetch and execution.
Example: IBM Power 5.
Limiters: Design complexity, instruction level parallelism.


Throughput and multiple threads (Next Thursday)

Goal: Use multiple CPUs (real and virtual) to improve (1) throughput of machines that run many programs, and (2) execution time of multithreaded programs.
Example: Sun Niagara (8 SPARCs on one chip).
Difficulties: Gaining full advantage requires rewriting applications, OS, libraries.
Ultimate Limiter: Amdahl's law, memory system performance.


Friday: DRAM controller checkoff

Run your test vector suite on the Calinx board, display results on LEDs.
Write different addresses, values.
Test ongoing transactions on both buses, with randomized start times.
Load, store, and verify different data word patterns.

[Diagram: Test Vectors drive the IM Bus and DM Bus into the DRAM Controller, which connects to the DRAM.]


Superpipelining


Add pipeline stages, reduce clock period

[Figure: ARM XScale, 8 stages.]

Q: Could adding pipeline stages hurt the CPI for an application?
A: Yes, due to these problems:

  CPI Problem                            Possible Solution
  Taken branches cause longer stalls     Branch prediction, loop unrolling
  Cache misses take more clock cycles    Larger caches, add prefetch opcodes to ISA


Recall: Control hazards

[Diagram: pipeline stages IF (Fetch), ID (Decode), EX (ALU), MEM, WB, with the I-Cache feeding an IR in each stage.]

Sample Program (ISA w/o branch delay slot):

  I1: BEQ R4,R3,25
  I2: AND R6,R5,R4
  I3: SUB R1,R9,R8

We avoided stalling by (1) adding a branch delay slot, and (2) adding a comparator to the ID stage. If we add more early stages, we must stall.

[Timing diagram: time t1 to t8, instructions I1 to I6. I1 proceeds through IF, ID, EX, MEM, WB. I2 has reached ID and I3 has reached IF when the EX stage computes if the branch is taken. If the branch is taken, these instructions MUST NOT complete.]


Solution: Branch prediction

[Diagram: the same pipeline with a Branch Predictor added next to the I-Cache in IF.]

Predictions made by the Branch Predictor:
  A control instruction?
  Taken or Not Taken?
  The PC a branch targets.

We update the PC based on the outputs of the branch predictor. If it is perfect, the pipe stays full.

Dynamic Predictors: a cache of branch history.

[Timing diagram: as on the previous slide. If we predicted incorrectly, these instructions MUST NOT complete.]


Branch predictors cache branch history (80-90% accurate)

Instruction: BNEZ R1 Loop
Address of BNEZ instruction: 0b0110...01001000 (fields labeled 28 bits and 2 bits)

Branch Target Buffer (BTB): 28-bit address tag (0b0110...0100) and target address (PC + 4 + Loop). On a hit, it supplies the "Taken" address.
Branch History Table (BHT): supplies "Taken" or "Not Taken".

Must check the prediction, and kill the instruction if it was wrong.
Update the BHT/BTB for next time, once the true behavior is known.


Simple (2-bit) Branch History Table Entry

Prediction for next …
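
The description of the 2-bit Branch History Table entry is cut off in this preview, so its exact state machine is not shown. A common textbook realization is a 2-bit saturating counter, and the sketch below assumes that scheme. It may differ from the slide's version, and the names `TwoBitEntry` and `count_mispredictions` and the loop-branch trace are hypothetical.

```python
class TwoBitEntry:
    """One BHT entry modeled as a 2-bit saturating counter (assumed scheme).

    States 0 and 1 predict Not Taken; states 2 and 3 predict Taken.
    """

    def __init__(self, state=0):
        if not 0 <= state <= 3:
            raise ValueError("state must be in 0..3")
        self.state = state

    def predict(self):
        """Return True if the entry predicts Taken."""
        return self.state >= 2

    def update(self, taken):
        """Move one step toward the observed outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, entry):
    """Predict each branch outcome, then update once the true behavior is known."""
    misses = 0
    for taken in outcomes:
        if entry.predict() != taken:
            misses += 1
        entry.update(taken)
    return misses


# Hypothetical trace: a loop branch taken 9 times, then not taken, run 3 times.
trace = ([True] * 9 + [False]) * 3
misses = count_mispredictions(trace, TwoBitEntry(state=3))
print(f"{misses} mispredictions in {len(trace)} branches")
```

With two bits of state, a single loop exit moves the counter only one step, so the entry still predicts Taken when the loop is entered again. A 1-bit entry would mispredict twice per loop visit instead of once.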

