Berkeley COMPSCI 152 - IBM POWER5 CHIP: A DUAL-CORE MULTITHREADED PROCESSOR - D2883382


School name University of California, Berkeley

Course Compsci 152- Computer Architecture and Engineering

Pages 8


Ron Kalla, Balaram Sinharoy, Joel M. Tendler
IBM

Featuring single- and multithreaded execution, the Power5 provides higher performance in the single-threaded mode than its Power4 predecessor at equivalent frequencies. Enhancements include dynamic resource balancing to efficiently allocate system resources to each thread, software-controlled thread prioritization, and dynamic power management to reduce power consumption without affecting performance.

Published by the IEEE Computer Society. 0272-1732/04/$20.00 © 2004 IEEE

IBM introduced Power4-based systems in 2001.[1] The Power4 design integrates two processor cores on a single chip, a shared second-level cache, a directory for an off-chip third-level cache, and the necessary circuitry to connect it to other Power4 chips to form a system. The dual-processor chip provides natural thread-level parallelism at the chip level. Additionally, the Power4’s out-of-order execution design lets the hardware bypass instructions whose operands are not yet available (perhaps because of an earlier cache miss during register loading) and execute other instructions whose operands are ready. Later, when the operands become available, the hardware can execute the skipped instruction. Coupled with a superscalar design, out-of-order execution results in higher instruction execution parallelism than otherwise possible.

The Power5 is the next-generation chip in this line. One of our key goals in designing the Power5 was to maintain both binary and structural compatibility with existing Power4 systems to ensure that binaries continue executing properly and all application optimizations carry forward to newer systems. With that base requirement, we specified increased performance and other functional enhancements of server virtualization, reliability, availability, and serviceability at both chip and system levels. In this article, we describe the approach we used to improve chip-level performance.

Multithreading

Conventional processors execute instructions from a single instruction stream. Despite microarchitectural advances, execution unit utilization remains low in today’s microprocessors. It is not unusual to see average execution unit utilization rates of approximately 25 percent across a broad spectrum of environments. To increase execution unit utilization, designers use thread-level parallelism, in which the physical processor core executes instructions from more than one instruction stream. To the operating system, the physical processor core appears as if it is a symmetric multiprocessor containing two logical processors. There are at least three different methods for handling multiple threads.

In coarse-grained multithreading, only one thread executes at any instant. When a thread encounters a long-latency event, such as a cache miss, the hardware swaps in a second thread to use the machine’s resources, rather than letting the machine remain idle. By allowing other work to use what otherwise would be idle cycles, this scheme increases overall system throughput. To conserve resources, both threads share many system resources, such as architectural registers. Hence, swapping program control from one thread to another requires several cycles. IBM implemented coarse-grained multithreading in the IBM eServer pSeries Model 680.[2]

A variant of coarse-grained multithreading is fine-grained multithreading. Machines of this class execute threads in successive cycles, in round-robin fashion.[3] Accommodating this design requires duplicate hardware facilities. When a thread encounters a long-latency event, its cycles remain unused.

Finally, in simultaneous multithreading (SMT), as in other multithreaded implementations, the processor fetches instructions from more than one thread.[4] What differentiates this implementation is its ability to schedule instructions for execution from all threads concurrently.
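
The three schemes described above can be compared with a toy cycle-level model. The Python sketch below is an illustration only and does not represent the Power5's actual issue logic. The `Thread` class, the `simulate` function, the issue width of 2, the 3-cycle swap penalty, and the 8-cycle miss are all assumptions chosen for the example. Each program is a list of per-instruction stall counts, where a nonzero entry stands in for a long-latency event such as a cache miss.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    stalls: list       # stalls[i]: cycles the thread waits after issuing instruction i
    pc: int = 0        # index of the next instruction to issue
    ready_at: int = 0  # first cycle in which the thread may issue again

    def done(self):
        return self.pc >= len(self.stalls)

    def ready(self, cycle):
        return not self.done() and self.ready_at <= cycle

    def issue(self, cycle, slots):
        """Issue up to `slots` instructions; stop after a long-latency event."""
        issued = 0
        while issued < slots and self.ready(cycle):
            stall = self.stalls[self.pc]
            self.pc += 1
            issued += 1
            if stall:
                self.ready_at = cycle + 1 + stall
        return issued


def simulate(policy, programs, width=2, switch_penalty=3):
    """Return the number of cycles needed to finish all programs (toy model)."""
    threads = [Thread(list(p)) for p in programs]
    n = len(threads)
    cycle = 0
    active = 0      # coarse-grained only: the thread that currently owns the core
    busy_until = 0  # coarse-grained only: cycle at which the current swap completes
    while not all(t.done() for t in threads):
        if policy == "smt":
            # Hand out issue slots one at a time across all ready threads.
            order = threads[cycle % n:] + threads[:cycle % n]
            slots, progress = width, True
            while slots and progress:
                progress = False
                for t in order:
                    if slots and t.issue(cycle, 1):
                        slots -= 1
                        progress = True
        elif policy == "fine":
            # Each cycle belongs to one thread; it is wasted if that thread cannot issue.
            threads[cycle % n].issue(cycle, width)
        elif policy == "coarse":
            if cycle >= busy_until:
                if threads[active].ready(cycle):
                    threads[active].issue(cycle, width)
                else:
                    # Active thread is stalled or finished: swap in a ready thread, if any.
                    for k in range(1, n):
                        cand = (active + k) % n
                        if threads[cand].ready(cycle):
                            active, busy_until = cand, cycle + switch_penalty
                            break
        else:
            raise ValueError(f"unknown policy: {policy}")
        cycle += 1
    return cycle


if __name__ == "__main__":
    a = [0, 0, 8, 0, 0]  # third instruction models a cache miss (8 stall cycles)
    b = [0] * 10         # no long-latency events
    for policy in ("coarse", "fine", "smt"):
        print(policy, simulate(policy, [a, b]))
```

With these particular inputs the model reports 14, 13, and 12 cycles for the coarse-grained, fine-grained, and simultaneous policies respectively. The ordering depends heavily on the assumed swap penalty, issue width, and miss latency, so it should not be read as a measurement of any real machine.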
With SMT, the system dynamically adjusts to the environment, allowing instructions to execute from each thread if possible, and allowing instructions from one thread to utilize all the execution units if the other thread encounters a long-latency event.

The Power5 design implements two-way SMT on each of the chip’s two processor cores. Although a higher level of multithreading is possible, our simulations showed that the added complexity was unjustified. As designers add simultaneous threads to a single physical processor, the marginal performance benefit decreases. In fact, additional multithreading might decrease performance because of cache thrashing, as data from one thread displaces data needed by another thread.

Power5 system structure

Figure 1 shows the high-level structures of Power4- and Power5-based systems. The Power4 handles up to a 32-way symmetric multiprocessor. Going beyond 32 processors increases interprocessor communication, resulting in high traffic on the interconnection fabric. This can cause greater contention and negatively affect system scalability. Moving the level-three (L3) cache from the memory side to the processor side of the fabric lets the Power5 more frequently satisfy level-two (L2) cache misses with hits in the 36-Mbyte off-chip L3 cache, avoiding traffic on the interchip fabric. References to data not resident in the on-chip L2 cache cause the system to check the L3 cache before sending requests onto the interconnection fabric. Moving the L3 cache provides significantly more cache on the processor side than previously available, thus reducing traffic on the fabric and allowing Power5-based systems to scale to higher levels of symmetric multiprocessing. Initial Power5 systems support 64 physical processors.

The Power4 includes a 1.41-Mbyte on-chip L2 cache. Power4+ chips are similar in design to the Power4 but are fabricated in 130-nm technology rather than the Power4’s 180-nm technology. The Power4+ includes a 1.5-Mbyte on-chip L2 cache, whereas the Power5 supports a 1.875-Mbyte on-chip L2 cache. Power4 and Power4+ systems both have 32-Mbyte L3 caches, whereas Power5 systems have a 36-Mbyte L3 cache.

Figure 1. Power4 (a) and Power5 (b) system structures. (Diagram omitted; its blocks are processors, L2 cache, fabric controller, L3 cache, memory controller, and memory. Source page dated March–April 2004.)

The L3 cache operates as a backdoor with separate buses for reads and writes that operate at half
