
MICROPROCESSOR REPORT
THE INSIDERS' GUIDE TO MICROPROCESSOR HARDWARE
VOLUME 8, NUMBER 12, SEPTEMBER 12, 1994
© 1994 MicroDesign Resources

Digital Leads the Pack with 21164
First of Next-Generation RISCs Extends Alpha's Performance Lead
by Linley Gwennap

Three years ago, unbeknownst to its participants, a race began among five major CPU vendors to bring the next generation of RISC technology to market. Each realized that pushing the performance envelope beyond 200 SPECint92 would require aggressive superscalar dispatch and high clock rates. While Hewlett-Packard, IBM, MIPS Technologies, and Sun struggled with more complicated designs, Digital has emerged from the pack to take the checkered flag with its new 21164 design, known internally as EV-5.

The company didn't skimp on performance to hasten the chip's debut: not only is the 21164 the first microprocessor to exceed 200 SPECint92, but it should reach a rating of 330 when running at 300 MHz, according to Digital's estimates, with an astounding SPECfp92 score of 500. These scores more than double those of any non-Alpha microprocessor shipping today, and they should keep Alpha in the performance lead even when other vendors deploy their own next-generation chips.

The design can issue four instructions per cycle to two integer units and two floating-point units, for a peak execution rate of 1.2 BIPS (billion instructions per second). It is the first microprocessor to include a secondary cache as well as primary caches on chip; the unified level-two (L2) cache is 96K in size, pushing the transistor count to 9.3 million, another first. Previously, the most transistors on a general-purpose microprocessor were the 3.6 million on the PowerPC 604. Digital already has working samples, which achieve the performance noted above, and plans to ship the 21164 in 1Q95.

Although the Alpha chip can really fly, it has a few drawbacks. The processor's vast die size (298 mm²), transistor count, and advanced 0.5-micron IC process push its estimated manufacturing cost to a towering $430, and Digital is quoting a shocking initial price of $2,937 for the 300-MHz part. The 21164 breaks another, less desirable record by dissipating nearly 50 W at its peak operating frequency.

Short Cycle Time Requires Two-Level Cache

With its 1,200-MIPS peak throughput, the 21164 requires tremendous instruction and data bandwidth to feed its ravenous engine, far more than could be supplied from external cache RAMs. The design requires a large on-chip cache to buffer the high-bandwidth CPU from the lower-bandwidth external world. With the 0.5-micron process, Digital knew it could push the on-chip cache well beyond the 32K used by most current RISC processors, to 64K or even 128K.

But even with Digital's CMOS-5 process (see 080504.PDF), the design team could not create a large cache array that could return data in a single 3.3-ns clock cycle. In a large array, it simply takes too long for the address to propagate through the array and for the data to propagate back. The best the designers could do was a 16K cache, similar to the ones used in the 275-MHz 21064A, which is also built in CMOS-5. This size didn't work for the 21164, because the data cache had to be dual-ported, doubling the die area of the array. Thus, the new processor includes two primary caches, one for instructions and one for data, of 8K each; both can be accessed in a single 3.3-ns cycle.

But the design needed more fast memory on chip than just 16K, leading to the two-level cache scheme. The second-level cache array is 96K in size and requires two cycles (6.7 ns) to access due to its larger physical size. Including cycles for tag access and level-one refill, the total cache-miss penalty for the primary caches is six cycles (20 ns) on an L2 hit. An external L2 cache, in contrast, requires at least 25 ns to service an L1 miss in the 21064. Thus, moving the L2 cache on chip reduces the cache-miss penalty, improving performance.

Putting the second-level cache on the chip has additional benefits. The 21164 uses a three-way set-associative L2 cache, which increases the hit rate compared with the direct-mapped L2 caches used by most processors. It is difficult to implement set-associative caches externally due to the high pin count required, but this difficulty is not an issue for on-chip caches.

The two-level organization also allows a more efficient use of resources. The large unified cache offers a higher hit rate than split caches of the same total size. Because the L1 data cache must be dual-ported to service two accesses per cycle, the two-level design also avoids the need for a large dual-ported memory, which would have required much more die area; instead, only the small primary data cache must be dual-ported.

Finally, incorporating a large cache on chip reduces the need for external cache. Once the price of the 21164 comes down, it will be feasible to include it in a midrange system with no external cache, reducing system cost. Digital believes that the performance reduction in this configuration should be less than 10% for many applications.

Doubling Instruction Bandwidth

Figure 1 shows a block diagram of the 21164. Instructions are read in groups of four from the instruction cache and are placed into one of two four-word buffers. The dispatcher then issues as many instructions as possible from the current buffer; it must, however, completely empty one buffer before moving on to the next. For example, if three instructions are issued on one cycle, the fourth must be issued by itself on the next cycle.

To avoid this situation, the architects defined a universal NOP instruction that the dispatcher recognizes and discards. The compiler uses this NOP to pad odd groups of instructions, avoiding single-issued instructions. Single issue will be more frequent, however, on code that has not been optimized for the 21164.

The new chip doubles the issue rate of the 21064, which could issue two instructions per cycle among three …
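The buffer-draining rule and the NOP padding described above can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not Digital's dispatch logic: each instruction is tagged only as `int`, `fp`, or `nop`; at most two of each class issue per cycle (mirroring the two integer and two floating-point units); issue is in order; and the names `drain_buffer` and `total_cycles` are hypothetical. The 21164's actual slotting rules for loads, stores, and branches are more detailed and are not modeled here.

```python
from collections import deque
from typing import Dict, List

# Hypothetical, simplified issue limits: two integer and two FP pipelines.
# Only the "empty one buffer before the next" behavior is modeled.
ISSUE_LIMITS: Dict[str, int] = {"int": 2, "fp": 2}


def drain_buffer(group: List[str]) -> List[List[str]]:
    """Issue one four-instruction buffer in order; return what issues each cycle."""
    pending = deque(group)
    schedule: List[List[str]] = []
    while pending:
        used = {cls: 0 for cls in ISSUE_LIMITS}
        issued: List[str] = []
        while pending:
            op = pending[0]
            if op == "nop":              # universal NOP: recognized and discarded
                pending.popleft()
                continue
            if used[op] == ISSUE_LIMITS[op]:
                break                    # in-order issue: later instructions wait too
            used[op] += 1
            issued.append(pending.popleft())
        if issued:
            schedule.append(issued)
    return schedule


def total_cycles(stream: List[str], width: int = 4) -> int:
    """Fetch in aligned groups of `width`; each buffer must empty before the next."""
    groups = [stream[i:i + width] for i in range(0, len(stream), width)]
    return sum(len(drain_buffer(g)) for g in groups)


# Hypothetical instruction streams, assuming all instructions are independent
# so the compiler is free to move them between groups.
unpadded = ["int", "fp", "int", "int", "fp", "int", "fp"]
padded = ["int", "fp", "int", "nop", "int", "fp", "int", "fp"]

print(drain_buffer(unpadded[:4]))                    # [['int', 'fp', 'int'], ['int']]
print(total_cycles(unpadded), total_cycles(padded))  # 3 2
```

In the unpadded stream, the first buffer issues three instructions and then a lone fourth one, matching the example in the text. In the padded stream, the compiler is assumed to have moved the fourth integer instruction into the next group and filled its slot with a NOP, so each buffer drains in a single cycle.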
[Figure 1: block diagram of the 21164, with blocks labeled branch history, instruction TLB, 8K instruction cache, instruction buffers, dispatch logic, dual integer units, FP add/divide, FP multiply, data TLB, 8K dual-ported data cache, merge logic, 96K level-two cache, L2 cache control, and external cache control on the system bus.]
Figure 1. The 21164 can issue four instructions per cycle to two integer units and two floating-point units. The new processor is unusual in that it has a large secondary cache on chip.
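The cycle-time and throughput figures quoted in the article all follow from the 300-MHz clock. The short Python calculation below reproduces them; the constant names are illustrative, and the only inputs are the clock rate, the four-instruction issue width, the cycle counts given in the text, and the 25-ns external-cache figure cited for the 21064.

```python
# Inputs taken from the article: 300-MHz clock, four-way issue,
# two-cycle L2 access, six-cycle L1 miss penalty on an L2 hit,
# and at least 25 ns for the 21064's external L2 to service an L1 miss.
CLOCK_HZ = 300e6
ISSUE_WIDTH = 4
L2_ACCESS_CYCLES = 2
L1_MISS_CYCLES = 6
EXTERNAL_L2_NS_21064 = 25.0

cycle_ns = 1e9 / CLOCK_HZ                     # ~3.3 ns per cycle
peak_mips = ISSUE_WIDTH * CLOCK_HZ / 1e6      # 1,200 MIPS = 1.2 BIPS
l2_access_ns = L2_ACCESS_CYCLES * cycle_ns    # ~6.7 ns
l1_miss_ns = L1_MISS_CYCLES * cycle_ns        # 20 ns on an L2 hit

print(f"cycle time       : {cycle_ns:.1f} ns")
print(f"peak throughput  : {peak_mips:,.0f} MIPS")
print(f"L2 access        : {l2_access_ns:.1f} ns")
print(f"L1 miss (L2 hit) : {l1_miss_ns:.0f} ns vs >= {EXTERNAL_L2_NS_21064:.0f} ns off chip")
```

The comparison is in nanoseconds, as in the text: the on-chip L2 services a primary-cache miss at least 5 ns sooner than the 21064's external cache could.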
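The hit-rate advantage of a three-way set-associative L2 over a direct-mapped cache comes from avoiding conflict misses. The Python model below is a generic sketch, not the 21164's cache controller: the 64-byte line size and LRU replacement are assumptions chosen for illustration, and `SetAssociativeCache` is a hypothetical name. It replays three addresses 96K apart, which land in the same set of a 96K three-way cache and on the same line of a 96K direct-mapped cache.

```python
from collections import OrderedDict

LINE_BYTES = 64  # hypothetical line size, for illustration only


class SetAssociativeCache:
    """N-way set-associative cache model with LRU replacement (assumed policy)."""

    def __init__(self, size_bytes: int, ways: int, line_bytes: int = LINE_BYTES):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) == self.ways:
            entries.popitem(last=False)   # evict the least recently used line
        entries[tag] = None
        return False


KB = 1024
pattern = [0, 96 * KB, 192 * KB] * 10     # three lines that collide on the index

three_way = SetAssociativeCache(96 * KB, ways=3)
direct = SetAssociativeCache(96 * KB, ways=1)   # direct-mapped, same capacity
for addr in pattern:
    three_way.access(addr)
    direct.access(addr)

print(three_way.hits, three_way.misses)   # 27 3
print(direct.hits, direct.misses)         # 0 30
```

The three-way cache takes three cold misses and then hits on every access, while the direct-mapped cache of equal capacity evicts a line on every access. Real workloads are far less pathological, but the same effect underlies the higher hit rate the article attributes to the set-associative L2.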

