CS 152 Spring 2011 Section 8
Christopher Celio
University of California, Berkeley
Monday, March 21, 2011

Agenda
- Grades
- Upcoming Quiz 3: what it covers
- OOO processors
- VLIW
- Branch prediction

Intel Core 2 Duo (Penryn) vs. NVidia GTX 280

Intel Core 2 Duo (Penryn)
- dual core, 2007
- 45 nm, 410 million transistors
- 2 GHz
- 3 or 6 MB of cache
- 10-35 W
- 107 mm² (each core is 22 mm²; L2 SRAM is 6 mm²/MB)

NVidia GTX 280
- 10 cores, 240 "stream processors", 2008
- 65 nm, 1.4 billion transistors
- 576 mm²
- 602 MHz core clock
- 236 W

http://3dimensionaljigsaw.wordpress.com/2008/06/18/physics-based-games-the-new-genre

Grades
- Department guidelines: average GPA 2.7-3.1
- Class average: 75
- Class standard deviation: 11.5
- Homework 15, Labs 35, Quizzes 50

Quiz 3
- Superscalar pipelines: in-order, out-of-order
- Out-of-order processors
  - What are the different stages? What is done in each stage (e.g., what resources are allocated in decode)?
  - Register renaming: explicit versus implicit register renaming designs; when to allocate registers; when to free registers
  - ROBs and instruction windows: data-in-ROB versus data-not-in-ROB versus split ROB/instruction window designs
  - Branches and exceptions: how are they handled?
  - Load/store queues: when can stores/loads be fired to memory?
- VLIW: software instruction reordering, loop unrolling, software pipelining; how code will get scheduled on different pipelines
- Branch prediction: BHTs, BTBs, 2-bit counters, local history, global history, tournament branch predictors. When can you make predictions? When do you learn the prediction was wrong?

Out-of-Order Processors
(lots of drawing on the board here)

Out-of-Order Control Complexity: MIPS R10000
[Figure: MIPS R10000 control logic (SGI/MIPS Technologies Inc., 1995)]
March 14, 2011

Out-of-Order Processors
Yeager, "The MIPS R10000 Superscalar Microprocessor," IEEE Micro, 1996.

OOO Styles

Data-in-ROB Design (HP PA8000, Intel Pentium Pro, Core 2 Duo, Nehalem)
- Register file holds only committed state.
- Reorder buffer entries t1 ... tn, each with fields Ins#, use, exec, op, p1, src1, p2, src2, pd, dest, data. Entries feed the Load Unit, the FUs, and the Store Unit; results return as <t, result>, and entries leave the ROB at Commit.
- On dispatch into the ROB, ready sources can be in the register file or in a ROB entry's dest field (copied into src1/src2 if ready before dispatch).
- On completion, write to the dest field and broadcast to the src fields.
- On issue, read from the ROB src fields.
March 9, 2011

Unified Physical Register File (MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy Bridge)
- Rename all architectural registers into a single physical register file during decode; no register values are read.
- Functional units read and write from a single unified register file holding committed and temporary registers in execute.
- Commit only updates the mapping of architectural register to physical register; no data movement.
[Figure: decode-stage register mapping, committed register mapping, unified physical register file (read operands at issue, write results at completion), functional units]

DEC Alpha 21264 (1996-1997)
- single core, 4-way out-of-order, highly speculative
- 7-stage pipeline
- up to 80 instructions in flight
- tournament branch predictor
- 15.2M transistors (6M for logic, the rest is caching/history tables)
- 350 nm, 600 MHz
- 64 KB I-cache / 64 KB D-cache on chip; 1 to 16 MB L2 off chip
- 314 mm² die (fairly large)

21264 Register Renaming
- Registers are renamed, then instructions are inserted into the issue queue.
- The map table is backed up on every in-flight instruction.
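The unified-physical-register-file scheme (explicit renaming, as in the R10K and 21264) can be sketched in a few lines of Python. This is a minimal illustration under assumptions made only for the example: a hypothetical machine with 4 architectural and 8 physical registers, instructions of the form dest <- src1 op src2, and no branch recovery (a real design keeps map-table backups for that, as the 21264 does per in-flight instruction). The names `Renamer` and `Renamed` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Renamed:
    """A renamed instruction (hypothetical format)."""
    psrc1: int      # physical register holding source 1
    psrc2: int      # physical register holding source 2
    pdest: int      # newly allocated physical destination
    old_pdest: int  # previous mapping of the same architectural dest


class Renamer:
    def __init__(self, num_arch: int, num_phys: int):
        # Architectural register r_i starts out mapped to physical p_i.
        self.map_table = list(range(num_arch))
        # All other physical registers start on the free list.
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, dest: int, src1: int, src2: int) -> Renamed:
        # Look up sources before remapping dest, so "r1 = r1 op r2"
        # reads the old r1.
        psrc1 = self.map_table[src1]
        psrc2 = self.map_table[src2]
        if not self.free_list:
            raise RuntimeError("free list empty: decode must stall")
        pdest = self.free_list.popleft()
        old_pdest = self.map_table[dest]
        self.map_table[dest] = pdest
        return Renamed(psrc1, psrc2, pdest, old_pdest)

    def commit(self, insn: Renamed) -> None:
        # Every reader of old_pdest is older than insn and, with in-order
        # commit, has already committed, so old_pdest can be recycled.
        self.free_list.append(insn.old_pdest)


r = Renamer(num_arch=4, num_phys=8)
i1 = r.rename(dest=1, src1=2, src2=3)  # r1 <- r2 op r3
i2 = r.rename(dest=2, src1=1, src2=1)  # r2 <- r1 op r1 (WAR on r2 vs. i1)
i3 = r.rename(dest=1, src1=3, src2=3)  # r1 <- r3 op r3 (WAW on r1 vs. i1)
r.commit(i1)
```

Here i2 writes r2 into a fresh register (p5), so i1's read of r2 (p2) is unaffected (no WAR hazard), and i1 and i3 both write r1 but into p4 and p6 (no WAW hazard). In this scheme a physical register is freed when the next writer of the same architectural register commits.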
21264 Register Renaming
- What hazards does renaming obviate? WAR and WAW.
- In what situations is renaming useful? Code with ILP and name dependencies (e.g., loops).
- If you had to choose between branch prediction and renaming, which would you pick? Branch prediction: there is not much ILP within a basic block, so renaming isn't too useful without branch prediction.

21264 Superscalar Execution
- The 21264 couldn't fit full bypassing into one clock cycle.
- Instead, it fully bypasses within each of two clusters; an inter-cluster bypass takes another cycle.

21264 Instruction Reordering
- As mentioned earlier, the 21264 uses explicit renaming, as opposed to a data-in-ROB design.
- What does the ROB hold?

Memory Ordering in the 21264
- To execute the critical instruction path quickly, we want to execute loads ASAP.
- Initially, loads speculatively bypass stores.
- On a misspeculation, set a wait bit for that load's PC so it will behave conservatively from then on.
- Clear wait bits periodically.

Speculation in the 21264
What does the 21264 speculate on?
- Next I-cache line/way
- Branches, indirect jumps
- Exceptions
- Load/store ordering
- Load hit/miss (shortens hit time by a cycle)
- Anything else?

Question: Stores
- When are stores sent to memory? At commit time.
- Why are stores saved in a store buffer before commit time? So they can be forwarded to dependent loads.

VLIW: Very Long Instruction Word
Instruction format: Int Op 1 | Int Op 2 | Mem Op 1 | Mem Op 2 | FP Op 1 | FP Op 2
- Two integer units, single-cycle latency
- Two load/store units, three-cycle latency
- Two floating-point units, four-cycle latency
- Multiple operations are packed into one instruction.
- Each operation slot is for a fixed function.
- Constant operation latencies are specified.
- The architecture requires a guarantee of:
  - Parallelism within an instruction (no cross-operation RAW check)
  - No data use before data is ready (no data interlocks)

Branch Predictors
- 2-bit predictor
- Branch history table (BHT): a table of 2-bit predictors; predicts taken/not taken
- Branch target buffer (BTB): predicts the target; typically a table of <PC, target> pairs

Branch Target Buffer (BTB)
I-Cache, 2^k-entry direct-mapped BTB, PC can also …
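The "Memory Ordering in the 21264" policy (speculate by default, set a per-PC wait bit on a misspeculation, clear wait bits periodically) can be sketched as below. This is a simplified model, not the 21264's hardware: the table is a set keyed by the full load PC rather than a fixed-size PC-indexed array, squash/replay is not modeled, "behave conservatively" is modeled as simply not bypassing older stores, and `WaitTable`, `clear_interval`, and the method names are hypothetical.

```python
class WaitTable:
    """Sketch of a 21264-style load wait table (hypothetical interface)."""

    def __init__(self, clear_interval: int):
        self.wait_bits = set()          # PCs of loads that must not bypass stores
        self.clear_interval = clear_interval

    def may_bypass_stores(self, load_pc: int) -> bool:
        # By default a load speculatively issues ahead of older stores
        # whose addresses are not yet known.
        return load_pc not in self.wait_bits

    def on_misspeculation(self, load_pc: int) -> None:
        # The load read memory before an older store to the same address.
        # The pipeline must squash and replay it (not modeled here); from
        # now on this load stops bypassing older stores.
        self.wait_bits.add(load_pc)

    def tick(self, cycle: int) -> None:
        # Periodically forget, so loads that no longer conflict can go
        # back to speculating.
        if cycle % self.clear_interval == 0:
            self.wait_bits.clear()


table = WaitTable(clear_interval=100)  # interval chosen arbitrarily
pc = 0x40
before = table.may_bypass_stores(pc)   # True: speculate
table.on_misspeculation(pc)
after = table.may_bypass_stores(pc)    # False: behave conservatively
table.tick(100)
cleared = table.may_bypass_stores(pc)  # True again after the periodic clear
```

The periodic clear is what keeps one unlucky conflict from permanently serializing a load that usually does not alias any store.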
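The answer to "Question: Stores" (stores reach memory at commit, but sit in a store buffer so dependent loads can get their data earlier) can be illustrated with a small Python model. Assumptions for this sketch only: word-sized accesses, exact address match, and every buffered store is older than the load searching it (a real load/store queue must also skip younger stores). `StoreBuffer` and its methods are hypothetical names.

```python
from collections import deque


class StoreBuffer:
    """Holds stores until commit and forwards their data to later loads."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.entries = deque()  # (addr, value) pairs, oldest first

    def store(self, addr: int, value: int) -> None:
        self.entries.append((addr, value))

    def load(self, addr: int) -> int:
        # Search youngest to oldest: the most recent matching store wins.
        for st_addr, st_value in reversed(self.entries):
            if st_addr == addr:
                return st_value          # store-to-load forwarding
        return self.memory.get(addr, 0)  # no match: read memory (0 if unset)

    def commit_oldest(self) -> None:
        # Only at commit does a store's value reach memory.
        addr, value = self.entries.popleft()
        self.memory[addr] = value


mem = {0x100: 7}
sb = StoreBuffer(mem)
sb.store(0x100, 42)
forwarded = sb.load(0x100)   # 42, although memory still holds 7
sb.store(0x100, 43)
youngest = sb.load(0x100)    # 43: the younger store wins
sb.commit_oldest()           # memory now holds 42
```

Keeping stores out of memory until commit means a squashed store (after a branch misprediction or exception) never has to be undone in the cache.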
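To see how code gets scheduled on the VLIW machine from the slide (2 slots per unit type; latencies of 1, 3, and 4 cycles), here is a greedy list scheduler in Python. It is a deliberately simplified sketch: ops are placed in program order at the earliest cycle where their inputs are ready and a slot of the right type is free; register WAR/WAW constraints, loop overhead (address increments, the branch), and software pipelining are not modeled. The op names and the `schedule`/`bundles` helpers are hypothetical.

```python
from collections import defaultdict

# Machine from the slide: 2 slots per unit type, fixed exposed latencies.
LATENCY = {"int": 1, "mem": 3, "fp": 4}
SLOTS = {"int": 2, "mem": 2, "fp": 2}


def schedule(ops):
    """Greedy list scheduling in program order.
    ops: list of (name, unit, deps), where deps name earlier ops whose
    results this op reads. Returns {name: issue cycle}."""
    issue, unit_of = {}, {}
    used = defaultdict(int)  # (cycle, unit) -> slots already filled
    for name, unit, deps in ops:
        # No interlocks: the compiler must not use a value before it is ready.
        cycle = max((issue[d] + LATENCY[unit_of[d]] for d in deps), default=0)
        # Slide later until a slot of the right type is free.
        while used[(cycle, unit)] >= SLOTS[unit]:
            cycle += 1
        used[(cycle, unit)] += 1
        issue[name], unit_of[name] = cycle, unit
    return issue


def bundles(issue):
    """Group ops into VLIW instructions; an empty bundle is all NOPs."""
    rows = [[] for _ in range(max(issue.values()) + 1)]
    for name, cycle in issue.items():
        rows[cycle].append(name)
    return rows


# One loop iteration: load x[i], FP add, store y[i] (loop overhead omitted).
rolled = [("ld", "mem", []), ("fadd", "fp", ["ld"]), ("sd", "mem", ["fadd"])]
# The same body unrolled twice, assuming independent registers per copy.
unrolled = [
    ("ld1", "mem", []), ("ld2", "mem", []),
    ("fadd1", "fp", ["ld1"]), ("fadd2", "fp", ["ld2"]),
    ("sd1", "mem", ["fadd1"]), ("sd2", "mem", ["fadd2"]),
]
for cycle, bundle in enumerate(bundles(schedule(unrolled))):
    print(cycle, bundle or "nop")
```

Both versions need 8 bundles (load at cycle 0, add at 3, store at 7), but the unrolled one finishes two elements in that time; most of the remaining bundles are NOPs, which is the gap software pipelining tries to fill.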
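A BHT of 2-bit saturating counters, as listed on the "Branch Predictors" slide, can be sketched in Python. Assumptions for illustration: an untagged table indexed by low PC bits (4-byte instructions), counters starting at 1 (weakly not taken), and `predict` (at fetch) and `update` (when the branch resolves) called back to back. `TwoBitBHT` is a hypothetical name.

```python
class TwoBitBHT:
    """Branch history table of 2-bit saturating counters (hypothetical
    interface). Counters 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, index_bits: int = 10, init: int = 1):
        # Assumption: untagged table indexed by low PC bits, every counter
        # starting weakly not taken.
        self.mask = (1 << index_bits) - 1
        self.counters = [init] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # drop the byte offset of 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Saturating increment on taken, decrement on not taken.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = TwoBitBHT()
loop_branch = 0x400
outcomes = ([True] * 9 + [False]) * 2  # a 10-iteration loop, run twice
mispredicts = 0
for taken in outcomes:
    if bht.predict(loop_branch) != taken:
        mispredicts += 1
    bht.update(loop_branch, taken)
print(mispredicts)  # prints 3
```

The first run costs two misses (warm-up and the loop exit); the second run misses only at the exit, because a single not-taken outcome moves the counter from 3 to 2, which still predicts taken.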
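The "Branch Target Buffer (BTB)" slide describes a 2^k-entry direct-mapped table of <PC, target> pairs looked up alongside the I-cache. A minimal Python sketch follows, under assumptions chosen for the example: 4-byte instructions, the full PC stored as the tag, and a policy of keeping only taken branches in the table. `BTB`, `next_pc`, and `update` are hypothetical names.

```python
class BTB:
    """Direct-mapped BTB with 2**k entries of (branch PC, target).
    Assumptions: 4-byte instructions, full PC kept as the tag, and only
    taken branches are kept in the table."""

    def __init__(self, k: int):
        self.k = k
        self.entries = [None] * (1 << k)

    def _index(self, pc: int) -> int:
        # Low-order PC bits above the 2-bit byte offset select the entry.
        return (pc >> 2) & ((1 << self.k) - 1)

    def next_pc(self, pc: int) -> int:
        # Looked up with the fetch PC, in parallel with the I-cache access.
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]   # hit: redirect fetch to the predicted target
        return pc + 4         # miss or tag mismatch: fall through

    def update(self, pc: int, taken: bool, target: int) -> None:
        # Called once the branch resolves.
        i = self._index(pc)
        if taken:
            self.entries[i] = (pc, target)
        elif self.entries[i] is not None and self.entries[i][0] == pc:
            self.entries[i] = None  # drop a branch that was not taken


btb = BTB(k=6)                    # 64 entries
cold = btb.next_pc(0x1000)        # 0x1004: nothing cached yet
btb.update(0x1000, True, 0x2000)
hit = btb.next_pc(0x1000)         # 0x2000
alias = btb.next_pc(0x1100)       # same index, different tag: 0x1104
```

Storing the tag is what lets the BTB tell 0x1000 and 0x1100 apart even though they share an index; an untagged BTB would redirect 0x1100 to 0x2000.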
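The tournament predictor from the quiz topics (and the 21264 summary) combines a local-history predictor and a global-history predictor with a chooser. The Python sketch below only loosely follows that idea: table sizes, indexing, and initial values are illustrative assumptions, not the 21264's actual design; histories are updated at resolution rather than speculatively at prediction time, as real predictors typically do; and `Tournament` and `bump` are hypothetical names.

```python
def bump(counter: int, taken: bool) -> int:
    """2-bit saturating counter update."""
    return min(3, counter + 1) if taken else max(0, counter - 1)


class Tournament:
    """Local-history predictor, global-history predictor, and a chooser.
    Assumptions: h-bit histories, 2**p local-history entries indexed by PC,
    all counters starting at 1 (weakly not taken / weakly prefer local)."""

    def __init__(self, h: int = 4, p: int = 6):
        self.hmask, self.pmask = (1 << h) - 1, (1 << p) - 1
        self.local_hist = [0] * (1 << p)   # per-branch outcome history
        self.local_ctr = [1] * (1 << h)    # indexed by local history
        self.global_ctr = [1] * (1 << h)   # indexed by global history
        self.chooser = [1] * (1 << h)      # indexed by global history; >= 2 picks global
        self.ghr = 0                       # global history register

    def _components(self, pc: int):
        lh = self.local_hist[(pc >> 2) & self.pmask]
        return lh, self.local_ctr[lh] >= 2, self.global_ctr[self.ghr] >= 2

    def predict(self, pc: int) -> bool:
        _, local_pred, global_pred = self._components(pc)
        return global_pred if self.chooser[self.ghr] >= 2 else local_pred

    def update(self, pc: int, taken: bool) -> None:
        lh, local_pred, global_pred = self._components(pc)
        # Train the chooser only when the components disagree, toward
        # whichever one was right.
        if local_pred != global_pred:
            self.chooser[self.ghr] = bump(self.chooser[self.ghr], global_pred == taken)
        self.local_ctr[lh] = bump(self.local_ctr[lh], taken)
        self.global_ctr[self.ghr] = bump(self.global_ctr[self.ghr], taken)
        # Shift the outcome into both histories.
        self.local_hist[(pc >> 2) & self.pmask] = ((lh << 1) | int(taken)) & self.hmask
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hmask


pred = Tournament()
outcomes = [i % 2 == 0 for i in range(40)]  # one branch: T, N, T, N, ...
results = []
for taken in outcomes:
    results.append(pred.predict(0x400) == taken)
    pred.update(0x400, taken)
```

After a few warm-up misses, the history-indexed counters learn the alternating pattern and every later prediction is correct; a single 2-bit counter starting at 1 would mispredict every one of these branches, since it just oscillates between 1 and 2. With only one branch the local and global histories are identical, so the chooser never trains here; it matters when several branches interleave.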