CS 152 Computer Architecture and Engineering
Lecture 19: Advanced Processors III
2005-11-3
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: David Marquardt and Udam Saini
www-inst.eecs.berkeley.edu/~cs152/

CS 152 L19: Advanced Processors III. UC Regents Fall 2005, UCB.


Last Time: Dynamic Scheduling

Reorder buffer: each line holds physical <src1, src2, dest> registers for an instruction, and controls when it executes.

[Figure: reorder buffer lines with src1, src2, and dest register numbers and values; Load Unit (from memory), ALU #1, ALU #2, and Store Unit (to memory) connected by a common data bus.]

Execution engine works on the physical ...


Today: Throughput and multiple threads

Goal: Use multiple instruction streams to improve
(1) throughput of machines that run many programs
(2) execution time of multithreaded programs

Example: Sun Niagara (32 instruction streams on a chip)

Difficulties: Gaining full advantage requires rewriting applications, OS, libraries.

Ultimate limiter: Amdahl's law (application dependent), memory system performance.


Throughput Computing

Multithreading: Interleave instructions from separate threads on the same hardware. Seen by OS as several CPUs.

Multi-core: Integrating several processors that (partially) share a memory system on the same chip.


Multi-Threading (static pipelines)

Recall: Bypass network prevents stalls

Instead of bypass: interleave threads on the pipeline to prevent stalls.

[Figure: pipeline datapath with ID (Decode), EX, MEM, and WB stages.]

Introduced in 1964 by Seymour Cray

4 CPUs, each run at 1/4 clock.

Many variants ...


Multi-Threading (dynamic scheduling)

Power 4 (predates Power 5 shown Tuesday)

Single-threaded predecessor to Power 5. 8 execution units in out-of-order engine; each may issue an instruction each cycle.

For most apps, most execution units lie idle

For an 8-way superscalar. From: Tullsen, Eggers, and Levy, "Simultaneous Multithreading: Maximizing On-chip Parallelism", ISCA 1995.

Observation: Most hardware in an out-of-order CPU concerns physical registers. Could several instruction threads share this hardware?

Simultaneous Multi-threading

[Figure: issue slots per cycle (cycles 1 to 9) across 8 units (M, M, FX, FX, FP, FP, BR, CC), shown for one thread on 8 units and for two threads on 8 units.]

M = Load/Store, FX = Fixed Point, FP = Floating Point, BR = Branch, CC = Condition Codes

Power 4 / Power 5

[Figure labels: 2 fetch (PC), 2 initial decodes; 2 commits (architected register sets).]

Power 5 data flow ...

Why only 2 threads? With 4, one of the shared resources (physical registers, cache, memory bandwidth) would be prone to ...

Power 5 thread performance ...

Relative priority of each thread controllable in hardware.

For balanced operation, both threads run slower than if they "owned" the machine.


This Friday: Memory System Checkoff

Run your test vector suite on the Calinx board, display results on LEDs.

[Figure: test vectors drive the IM Bus and DM Bus; Instruction Cache (IC Bus) and Data Cache (DC Bus) connect to the DRAM Controller and DRAM.]


Multi-Core

Recall: Superscalar utilization by a thread

For an 8-way superscalar.

Observation: In many cases, the on-chip cache and DRAM I/O bandwidth is also underutilized by one CPU. So, let 2 cores share them.

Most of Power 5 die is shared hardware

[Figure labels: Core #1, Core #2, Shared Components, L2 Cache, L3 Cache Control, DRAM Controller.]

Core-to-core interactions stay on chip

(1) Threads on two cores that use shared libraries conserve L2 memory.
(2) Threads on two cores share memory via L2 cache operations. Much faster than ...


Sun Niagara

The case for Sun's Niagara

For an 8-way superscalar.

Observation: Some apps struggle to reach a CPI of 1. For throughput on these apps, a large number of single-issue cores is better than a few superscalars.

Niagara: 32 threads on one chip

8 cores: single-issue, 6-stage pipeline, 4-way multithreaded, fast crypto support.
Shared resources: 3MB on-chip cache, 4 DDR2 interfaces (32G DRAM, 20 Gb/s), 1 shared FP unit, GB Ethernet ports.
Die size: 340 mm² in 90 nm.
Power: 50-60 W.

Sources: Hot Chips, via EE Times, Infoworld; J. Schwartz weblog (Sun COO).

Niagara status: Release coming soon

Source: J. Schwartz weblog (Sun COO).


Cell: The PS3 chip

L2 Cache: 512 KB.
PowerPC manages the 8 SPUs, also runs ...
Synergistic Processing Units (SPUs).
2X area of Pentium 4. 4GHz cycle time.

Synergistic Processing Units (SPUs)

8 cores using
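
The slides name Amdahl's law as the ultimate, application-dependent limiter on what multiple instruction streams can buy. The sketch below is an illustration, not part of the lecture: the function name amdahl_speedup and the parallel fractions it is called with are assumptions made for this example, and only the stream count of 32 comes from the Niagara figure quoted above.

```python
def amdahl_speedup(parallel_fraction: float, n_streams: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work spreads over n_streams.

    Hypothetical helper for illustration; it ignores memory-system effects,
    which the slides list as a separate limiter.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; the parallel part is divided across streams.
    return 1.0 / (serial_fraction + parallel_fraction / n_streams)


if __name__ == "__main__":
    # Assumed example fractions; 32 matches the Niagara stream count in the slides.
    for p in (0.50, 0.90, 0.99):
        print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 32):.2f}x on 32 streams")
```

Under these assumptions, an application that is 90% parallel gains less than 8x from 32 streams, which is why the benefit is described as application dependent.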


Berkeley COMPSCI 152 - Lecture 19 – Advanced Processors III
