Introducing the IA-64 Architecture

Jerry Huck, Dale Morris, Jonathan Ross, Hewlett-Packard
Allan Knies, Hans Mulder, Rumi Zahir, Intel

Advances in microprocessor design, integrated circuits, and compiler technology have increased the interest in parallel instruction execution. A joint HP-Intel team designed the IA-64 processor instruction set architecture with parallelism in mind.

Microprocessors continue on the relentless path to provide more performance. Every new innovation in computing (distributed computing on the Internet, data mining, Java programming, and multimedia data streams) requires more cycles and computing power. Even traditional applications such as databases and numerically intensive codes present increasing problem sizes that drive demand for higher performance.

Design innovations, compiler technology, manufacturing process improvements, and integrated circuit advances have been driving exponential performance increases in microprocessors. To continue this growth in the future, Hewlett-Packard and Intel architects examined barriers in contemporary designs and found that instruction-level parallelism (ILP) can be exploited for further performance increases.

This article examines the motivation, operation, and benefits of the major features of IA-64. Intel's IA-64 manual provides a complete specification of the IA-64 architecture [1].

Background and objectives

IA-64 is the first architecture to bring ILP features to general-purpose microprocessors. Parallel semantics, predication, data speculation, large register files, register rotation, control speculation, hardware exception deferral, register stack engine, wide floating-point exponents, and other features contribute to IA-64's primary objective. That goal is to expose, enhance, and exploit ILP in today's applications to increase processor performance.

ILP pioneers [2,3] developed many of these concepts to find parallelism beyond traditional architectures. Subsequent industry and academic research [4,5] significantly extended earlier concepts. This led to published works that quantified the benefits of these ILP-enhancing features and substantially improved performance.

Starting in 1994, the joint HP-Intel IA-64 architecture team leveraged this prior work and incorporated feedback from compiler and processor design teams to engineer a powerful initial set of features. They also carefully designed the instruction set to be expandable to address new technologies and future workloads.

Architectural basics

A historical problem facing the designers of computer architectures is the difficulty of building in sufficient flexibility to adapt to changing implementation strategies. For example, the number of available instruction bits, the register file size, the number of address space bits, or even how much parallelism a future implementation might employ have limited how well architectures can evolve over time.

The Intel-HP architecture team designed IA-64 to permit future expansion by providing sufficient architectural capacity:

• a full 64-bit address space,
• large directly accessible register files,
• enough instruction bits to communicate information from the compiler to the hardware, and
• the ability to express arbitrarily large amounts of ILP.

Figure 1 summarizes the register state; Figure 2 shows the bundle and instruction formats.

Register resources

IA-64 provides 128 65-bit general registers; 64 of these bits specify data or memory addresses, and 1 bit holds a deferred exception token, or not-a-thing (NaT), bit. The "Control speculation" section provides more details on the NaT bit.

In addition to the general registers, IA-64 contains

• 128 82-bit floating-point registers,
• space for up to 128 64-bit special-purpose application registers (used to support features such as the register stack and software pipelining),
• eight 64-bit branch registers for function call linkage and return, and
• 64 one-bit predicate registers that hold the result of conditional expression evaluation.

[Figure 1. IA-64 application state: 128 general registers (64 bits plus a NaT bit; static and stacked/rotating), 128 82-bit floating-point registers (partially rotating), 128 64-bit application registers, 64 one-bit predicate registers, and eight 64-bit branch registers.]
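To make these register resources concrete, here is a minimal sketch of how a simulator might model the application state listed above. The type and field names are illustrative choices, not identifiers from the IA-64 manual.

```c
/* Minimal sketch of the IA-64 application state described in this section.
 * All names are illustrative; they are not taken from the IA-64 manual. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t bits[2];          /* one 82-bit floating-point value, stored in 128 bits */
} fp_reg_t;

typedef struct {
    uint64_t gr[128];          /* general registers: 64 data/address bits each        */
    bool     gr_nat[128];      /* the 65th bit of each GR: the not-a-thing (NaT) bit  */
    fp_reg_t fr[128];          /* 128 82-bit floating-point registers                 */
    uint64_t ar[128];          /* space for up to 128 64-bit application registers    */
    uint64_t br[8];            /* eight branch registers for call linkage and return  */
    uint64_t pr;               /* 64 one-bit predicate registers, packed into a word  */
} ia64_app_state_t;

/* Read predicate register p (0..63) from the packed predicate word. */
static bool read_pred(const ia64_app_state_t *s, unsigned p)
{
    return (s->pr >> p) & 1u;
}

int main(void)
{
    static ia64_app_state_t s;     /* zero-initialized application state          */
    s.gr[32] = 0x1000;             /* write a stacked general register...         */
    s.gr_nat[32] = true;           /* ...and mark it as holding a deferred token  */
    s.pr |= 1ULL << 16;            /* set predicate register p16                  */
    printf("p16=%d, gr32 NaT=%d\n", read_pred(&s, 16), (int)s.gr_nat[32]);
    return 0;
}
```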
Instruction encoding

Since IA-64 has 128 general and 128 floating-point registers, instruction encodings use 7 bits to specify each of three register operands. Most instructions also have a predicate register argument that requires another 6 bits. In a normal 32-bit instruction encoding, this would leave only 5 bits to specify the opcode. To provide for sufficient opcode space and to enable flexibility in the encodings, IA-64 uses a 128-bit encoding (called a bundle) that has room for three instructions.

Each of the three instructions has 41 bits, with the remaining 5 bits used for the template. The template bits help decode and route instructions and indicate the location of stops that mark the end of groups of instructions that can execute in parallel.

[Figure 2. IA-64 bundle (a) and instruction (b) formats: a bundle holds instruction 2, instruction 1, instruction 0, and a template (41 + 41 + 41 + 5 bits); an instruction holds an opcode, three register fields, and a predicate field (14 + 7 + 7 + 7 + 6 bits).]

Distributing responsibility

To achieve high performance, most modern microprocessors must determine instruction dependencies, analyze and extract available parallelism, choose where and when to execute instructions, manage all cache and prediction resources, and generally direct all other ongoing activities at runtime. Although intended to reduce the burden on compilers, out-of-order processors still require substantial amounts of microarchitecture-specific compiler support to achieve their fastest speeds.

IA-64 strives to make the best trade-offs in dividing responsibility between what the processor must do at runtime and what the compiler can do at compilation time.

ILP

Compilers for all current mainstream microprocessors produce code with the understanding that, regardless of how the processor actually executes those instructions, the results will appear to be executed one at a time and in the exact order they were written. We refer to such architectures as having sequential in-order execution semantics, or simply sequential semantics.

Conforming to sequential semantics was easy to achieve when microprocessors executed instructions one at a time and in their program-specified order.
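As a rough illustration of the "Instruction encoding" description above, the following sketch pulls the 5-bit template and the three 41-bit instructions out of a 128-bit bundle and then splits each instruction into the simplified fields of Figure 2(b). The exact bit positions (template in the five least-significant bits, predicate field in the low bits of each instruction) are assumptions made for illustration; the IA-64 manual defines the real per-format encodings.

```c
/* Sketch of decoding a 128-bit IA-64 bundle into its template and three
 * 41-bit instruction slots, following the simplified formats of Figure 2.
 * Bit positions are illustrative assumptions, not the manual's encodings. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t lo;   /* bundle bits 0..63   */
    uint64_t hi;   /* bundle bits 64..127 */
} bundle_t;

/* Extract `len` bits (len <= 41) starting at bit `pos` of the 128-bit bundle. */
static uint64_t bits(bundle_t b, unsigned pos, unsigned len)
{
    uint64_t v;
    if (pos >= 64)
        v = b.hi >> (pos - 64);
    else if (pos == 0)
        v = b.lo;
    else
        v = (b.lo >> pos) | (b.hi << (64 - pos));
    return v & ((1ULL << len) - 1);
}

int main(void)
{
    bundle_t b = { 0x0123456789abcdefULL, 0xfedcba9876543210ULL }; /* arbitrary bits */

    uint64_t template_bits = bits(b, 0, 5);            /* 5-bit template            */
    printf("template=%llu\n", (unsigned long long)template_bits);

    for (unsigned slot = 0; slot < 3; slot++) {
        uint64_t insn = bits(b, 5 + 41 * slot, 41);    /* three 41-bit instructions */

        /* Simplified fields per Figure 2(b), low bits to high bits:
         * predicate (6) | register 3 (7) | register 2 (7) | register 1 (7) | op (14) */
        uint64_t pred = insn         & 0x3f;
        uint64_t r3   = (insn >> 6)  & 0x7f;
        uint64_t r2   = (insn >> 13) & 0x7f;
        uint64_t r1   = (insn >> 20) & 0x7f;
        uint64_t op   = (insn >> 27) & 0x3fff;

        printf("slot %u: op=0x%llx r1=%llu r2=%llu r3=%llu pred=%llu\n", slot,
               (unsigned long long)op, (unsigned long long)r1,
               (unsigned long long)r2, (unsigned long long)r3,
               (unsigned long long)pred);
    }
    return 0;
}
```

In a real decoder the template value would also be used to route each slot to an appropriate execution unit and to locate the stops that separate groups of instructions that can execute in parallel.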

