Berkeley COMPSCI 152 - IBM POWER5 CHIP: A DUAL-CORE MULTITHREADED PROCESSOR

Ron Kalla, Balaram Sinharoy, Joel M. Tendler
IBM

Featuring single- and multithreaded execution, the Power5 provides higher performance in the single-threaded mode than its Power4 predecessor at equivalent frequencies. Enhancements include dynamic resource balancing to efficiently allocate system resources to each thread, software-controlled thread prioritization, and dynamic power management to reduce power consumption without affecting performance.

Published by the IEEE Computer Society, March–April 2004. 0272-1732/04/$20.00 © 2004 IEEE

IBM introduced Power4-based systems in 2001.¹ The Power4 design integrates two processor cores on a single chip, a shared second-level cache, a directory for an off-chip third-level cache, and the necessary circuitry to connect it to other Power4 chips to form a system. The dual-processor chip provides natural thread-level parallelism at the chip level. Additionally, the Power4’s out-of-order execution design lets the hardware bypass instructions whose operands are not yet available (perhaps because of an earlier cache miss during register loading) and execute other instructions whose operands are ready. Later, when the operands become available, the hardware can execute the skipped instruction. Coupled with a superscalar design, out-of-order execution results in higher instruction execution parallelism than otherwise possible.

The Power5 is the next-generation chip in this line. One of our key goals in designing the Power5 was to maintain both binary and structural compatibility with existing Power4 systems to ensure that binaries continue executing properly and all application optimizations carry forward to newer systems. With that base requirement, we specified increased performance and other functional enhancements of server virtualization, reliability, availability, and serviceability at both chip and system levels. In this article, we describe the approach we used to improve chip-level performance.

Multithreading

Conventional processors execute instructions from a single instruction stream. Despite microarchitectural advances, execution unit utilization remains low in today’s microprocessors. It is not unusual to see average execution unit utilization rates of approximately 25 percent across a broad spectrum of environments. To increase execution unit utilization, designers use thread-level parallelism, in which the physical processor core executes instructions from more than one instruction stream. To the operating system, the physical processor core appears as if it is a symmetric multiprocessor containing two logical processors. There are at least three different methods for handling multiple threads.

In coarse-grained multithreading, only one thread executes at any instance. When a thread encounters a long-latency event, such as a cache miss, the hardware swaps in a second thread to use the machine’s resources, rather than letting the machine remain idle. By allowing other work to use what otherwise would be idle cycles, this scheme increases overall system throughput. To conserve resources, both threads share many system resources, such as architectural registers. Hence, swapping program control from one thread to another requires several cycles. IBM implemented coarse-grained multithreading in the IBM eServer pSeries Model 680.²

A variant of coarse-grained multithreading is fine-grained multithreading. Machines of this class execute threads in successive cycles, in round-robin fashion.³ Accommodating this design requires duplicate hardware facilities. When a thread encounters a long-latency event, its cycles remain unused.

Finally, in simultaneous multithreading (SMT), as in other multithreaded implementations, the processor fetches instructions from more than one thread.⁴ What differentiates this implementation is its ability to schedule instructions for execution from all threads concurrently.
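The three approaches above can be compared with a toy cycle-counting model. This is only a sketch under invented assumptions, not a model of the Power5 or of any real machine: the `Thread` class, the `simulate` function, the program encoding ("o" for an ordinary instruction, "m" for one that triggers a long-latency miss), and the `width`, `miss_latency`, and `switch_cost` parameters are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    program: str        # "o" = ordinary instruction, "m" = instruction that misses
    pc: int = 0         # index of the next instruction to issue
    ready_at: int = 0   # first cycle in which the thread may issue again

    def done(self) -> bool:
        return self.pc >= len(self.program)

    def ready(self, cycle: int) -> bool:
        return not self.done() and cycle >= self.ready_at

    def issue(self, cycle: int, slots: int, miss_latency: int) -> int:
        """Issue up to `slots` instructions this cycle; a miss stalls the thread."""
        issued = 0
        while issued < slots and self.ready(cycle):
            if self.program[self.pc] == "m":
                self.ready_at = cycle + 1 + miss_latency
            self.pc += 1
            issued += 1
        return issued


def simulate(policy, programs, width=2, miss_latency=4, switch_cost=2):
    """Return the number of cycles needed to drain all programs (toy model)."""
    threads = [Thread(p) for p in programs]
    cycle = 0
    active = 0       # coarse-grained only: index of the resident thread
    busy_until = 0   # coarse-grained only: cycle at which a swap completes
    while not all(t.done() for t in threads):
        if policy == "smt":
            # Fill the issue slots from every thread that is ready this cycle.
            slots = width
            for t in threads:
                slots -= t.issue(cycle, slots, miss_latency)
        elif policy == "fine":
            # Strict round-robin: a stalled thread's cycle goes unused.
            threads[cycle % len(threads)].issue(cycle, width, miss_latency)
        elif policy == "coarse":
            # One resident thread; swap (paying switch_cost) when it stalls.
            if cycle >= busy_until:
                t = threads[active]
                issued = t.issue(cycle, width, miss_latency)
                stalled = issued > 0 and not t.ready(cycle + 1)
                others = [i for i, o in enumerate(threads)
                          if i != active and not o.done()]
                if (t.done() or stalled) and others:
                    active = others[0]
                    busy_until = cycle + 1 + switch_cost
        else:
            raise ValueError(f"unknown policy: {policy}")
        cycle += 1
    return cycle


if __name__ == "__main__":
    for policy in ("coarse", "fine", "smt"):
        print(policy, simulate(policy, ["oomoo", "oomoo"]))
```

With the two identical five-instruction programs shown and the default parameters, the model takes 12 cycles for coarse-grained, 10 for fine-grained, and 8 for SMT. The ordering between the first two depends entirely on the made-up switch cost and miss latency. The point is only that SMT lets one thread use slots the other cannot fill.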
With SMT, the system dynamically adjusts to the environment, allowing instructions to execute from each thread if possible, and allowing instructions from one thread to utilize all the execution units if the other thread encounters a long-latency event.

The Power5 design implements two-way SMT on each of the chip’s two processor cores. Although a higher level of multithreading is possible, our simulations showed that the added complexity was unjustified. As designers add simultaneous threads to a single physical processor, the marginal performance benefit decreases. In fact, additional multithreading might decrease performance because of cache thrashing, as data from one thread displaces data needed by another thread.

Power5 system structure

Figure 1 shows the high-level structures of Power4- and Power5-based systems. The Power4 handles up to a 32-way symmetric multiprocessor. Going beyond 32 processors increases interprocessor communication, resulting in high traffic on the interconnection fabric. This can cause greater contention and negatively affect system scalability. Moving the level-three (L3) cache from the memory side to the processor side of the fabric lets the Power5 more frequently satisfy level-two (L2) cache misses with hits in the 36-Mbyte off-chip L3 cache, avoiding traffic on the interchip fabric. References to data not resident in the on-chip L2 cache cause the system to check the L3 cache before sending requests onto the interconnection fabric. Moving the L3 cache provides significantly more cache on the processor side than previously available, thus reducing traffic on the fabric and allowing Power5-based systems to scale to higher levels of symmetric multiprocessing. Initial Power5 systems support 64 physical processors.

The Power4 includes a 1.41-Mbyte on-chip L2 cache. Power4+ chips are similar in design to the Power4 but are fabricated in 130-nm technology rather than the Power4’s 180-nm technology. The Power4+ includes a 1.5-Mbyte on-chip L2 cache, whereas the Power5 supports a 1.875-Mbyte on-chip L2 cache. Power4 and Power4+ systems both have 32-Mbyte L3 caches, whereas Power5 systems have a 36-Mbyte L3 cache.

Figure 1. Power4 (a) and Power5 (b) system structures. (Blocks shown: processors, L2 cache, L3 cache, fabric controller, memory controller, memory.)

The L3 cache operates as a backdoor with separate buses for reads and writes that operate at half
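The effect of moving the L3 cache to the processor side of the fabric, as described in the system-structure section, can be illustrated with a small counting sketch. It is a deliberate simplification under stated assumptions: caches are modelled as fixed sets of resident line addresses with no replacement, and the function name `fabric_requests` and the flag `l3_on_processor_side` are hypothetical, not terms from the article.

```python
def fabric_requests(trace, l2_lines, l3_lines, l3_on_processor_side):
    """Count references that must be sent onto the interconnection fabric.

    trace                 -- iterable of line addresses referenced by the core
    l2_lines, l3_lines    -- sets of addresses assumed resident in L2 / L3
    l3_on_processor_side  -- True: L3 is checked before the fabric
                             (the Power5-style arrangement described above);
                             False: every L2 miss goes onto the fabric
                             (L3 sits on the memory side)
    """
    requests = 0
    for addr in trace:
        if addr in l2_lines:
            continue  # L2 hit: the reference never leaves the chip
        if l3_on_processor_side and addr in l3_lines:
            continue  # L2 miss satisfied by the L3 without using the fabric
        requests += 1  # request is sent onto the interconnection fabric
    return requests


if __name__ == "__main__":
    trace = [1, 2, 3, 4, 2, 5]       # made-up reference stream
    l2, l3 = {1, 2}, {3, 4}          # made-up cache contents
    print(fabric_requests(trace, l2, l3, l3_on_processor_side=False))
    print(fabric_requests(trace, l2, l3, l3_on_processor_side=True))
```

For this invented trace, three references reach the fabric when the L3 is on the memory side and one when it is on the processor side. The sketch shows only the direction of the effect, not its size on real workloads.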

