CMU CS 15740 - mpr_ia64_demyst_jan98


©MICRODESIGN RESOURCES JANUARY 26, 1998 MICROPROCESSOR REPORT

Demystifying EPIC and IA-64
EPIC Is a Natural Evolution of RISC, Making It Easy to Retrofit Onto RISC

by Peter Song

Using a next-generation architecture technology that Intel and Hewlett-Packard call EPIC (explicitly parallel instruction computing), Merced and future EPIC processors threaten the performance lead held today by RISC processors. EPIC is not entirely new, borrowing many of its ideas from previous RISC and VLIW designs as well as from recent academic research. EPIC has an inherent performance advantage over existing architectures, however, because it is a synergistic assembly of the latest innovations into one architecture. To compete with EPIC processors from Intel, existing RISC architectures are likely to adopt a similar combination of EPIC features in their future versions.

During last year’s Microprocessor Forum, Intel and HP gave a high-level, incomplete description of IA-64, for which the companies coined the generic name EPIC (see MPR 10/27/97, p. 1). Nevertheless, we know that EPIC provides a large number of addressable registers, eliminating the need for register renaming and reducing cache accesses. It also provides instruction dependency hints, simplifying instruction issue logic. EPIC uses predicated execution to eliminate some branches, thereby increasing scheduling freedom for the compiler, allowing parallel execution of both paths of branches, and reducing opportunities for misprediction. EPIC uses speculative loads to enable well-behaved accesses to memory as soon as the address can be computed, hiding memory latency.

Intel and HP have revealed only a few details of EPIC and IA-64, but we can project more details than publicly disclosed by considering how these EPIC features can be applied to solve today’s performance bottlenecks. IA-64 may impose programming restrictions to accommodate clustering of execution units and registers, greatly simplifying hardware without unduly degrading the processor’s throughput. It may also use delayed branches to specify branch target addresses as early as possible, reducing reliance on accurate branch prediction. IA-64 may use load/store instructions that also return the effective address as a result, reducing the overhead of hoisting speculative loads above earlier stores.

At first glance, retrofitting these EPIC features onto an existing instruction set seems to require adding more bits, breaking binary compatibility with existing software. While a few new instructions can be added easily to an instruction set using unused opcodes, adding general-purpose registers and predicated execution seems more difficult, or even impossible, without breaking binary compatibility. For many RISC architectures, however, most (if not all) of the known EPIC features can be added without breaking compatibility. EPIC is a natural evolution of RISC: its fixed-length instruction formats and load/store instructions enable the EPIC features to be added easily.

IA-64 Likely to Embrace Clustered Designs

IA-64 has 128 integer and 128 floating-point registers, four times as many registers as a typical RISC architecture, allowing the compiler to expose and express an increased amount of ILP (instruction-level parallelism). Merced and future IA-64 processors are expected to have more execution units than today’s high-performance processors, taking advantage of the heightened ILP to deliver better performance.

While additional registers and execution units can improve a processor’s throughput, they generally degrade the processor’s cycle time, since a crossbar is needed between the registers and the execution units in most general-purpose processors. The crossbar enables the execution units to access any register without interfering with each other and is built into the register file. High-performance designs generally use another crossbar for forwarding results from one execution unit to all units that may need the results, saving one or more cycles required for writing the results to the register file and then reading them.

Adding registers or execution units increases the number of switches and wires in the crossbars, as well as the wire lengths and the capacitive loading, resulting in longer delays through the crossbars. Extra metal layers do not reduce a crossbar’s size or its propagation delays, since the switches are built using transistors.

Because wire delays take an increasingly large fraction of cycle times as process geometries shrink (a trend that is unlikely to reverse in the foreseeable future), we expect new architectures, including IA-64, to adopt features that require smaller crossbars and fewer global wires.

Figure 1. IA-64 processors may group registers and function units into execution clusters, allowing implementations to use smaller crossbars and fewer global wires.
[Diagram labels: Inst Fetch Unit; Data Access Unit; cluster 0: GRs (0-31), IUs (address); cluster 1: GRs (32-63), IUs (+ // &); cluster 2: GRs (64-95), IUs (x ÷ MMX); cluster 3: GRs (96-127), IUs (MMX); cluster 4: FUs (MAD), FRs (0-31); cluster 5: FUs (MAD), FRs (32-63); cluster 6: FUs (MAD), FRs (64-95); cluster 7: FUs (÷ sqrt), FRs (96-127).]

IA-64 is likely to embrace partitioning the processor core (registers and execution units) into clusters at the architectural level, reducing the burden of connecting the plethora of registers and execution units. For example, it could partition the 128 registers into four 32-register banks and restrict most instructions to accessing registers from only one bank.

Such a restriction would allow the processor core to be built in clusters, as Figure 1 shows, each consisting of a bank of registers and a set of function units.
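
The projected single-bank restriction can be made concrete with a small sketch. The Python below is only an illustration of the article's example (128 registers split into four 32-register banks); it is not IA-64's actual rule, which had not been disclosed at the time of writing. The function names `bank_of`, `uses_single_bank`, and `operands_outside_bank` are hypothetical.

```python
# Hypothetical model of the register-bank restriction projected in the text.
# Assumption: registers are numbered 0-127 and banks are contiguous ranges.
NUM_REGISTERS = 128   # integer registers, per the article
BANK_SIZE = 32        # four 32-register banks (the article's example)


def bank_of(reg: int) -> int:
    """Return the bank (0-3) that holds register number `reg`."""
    if not 0 <= reg < NUM_REGISTERS:
        raise ValueError(f"register {reg} is out of range")
    return reg // BANK_SIZE


def uses_single_bank(regs: list[int]) -> bool:
    """True if every register operand of an instruction lies in one bank."""
    return len({bank_of(r) for r in regs}) <= 1


def operands_outside_bank(dest: int, sources: list[int]) -> list[int]:
    """List the source registers that sit outside the destination's bank."""
    return [s for s in sources if bank_of(s) != bank_of(dest)]
```

Under these assumptions, `uses_single_bank([33, 40, 63])` is true because all three registers fall in bank 1, while `uses_single_bank([31, 32])` is false because the operands straddle banks 0 and 1.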
Since the crossbars in each cluster connect fewer registers and function units, resulting in fewer register-file ports and result-forwarding paths, they are smaller and have shorter propagation delays than a crossbar connecting all registers and all function units.

Using smaller crossbars, the processor core can operate at a higher clock speed without taking an extra cycle for the function units to forward their results to each other. There may be paths between the clusters for copying registers from one bank to another, using explicit move instructions. Each path would add a write port to each register and possibly a result-forwarding path to each execution unit.

Digital’s 21264 (see MPR 10/28/96, p. 11) uses a clustered design in which each pair of integer and address-generation units has its own copy

