CMU CS 15740 - Intel Discloses New IA-64 Features - D2010186



School name Carnegie Mellon University

Course CS 15740 - Computer Architecture

Pages 4

©MICRODESIGN RESOURCES MARCH 8, 1999 MICROPROCESSOR REPORT

by Linley Gwennap

In a series of talks at the recent Intel Developers Forum, the company tantalized industry watchers by dribbling out a few more details about its IA-64 instruction set and its first implementation, Merced. In a joint presentation by Intel’s John Crawford and Hewlett-Packard’s Jerry Huck, the two architects shed additional light on the IA-64 design. They provided further details on the architecture’s support for predication and speculation and also described IA-64’s branch architecture. A newly disclosed feature, rotating registers, provides an efficient way to unroll loops while minimizing code expansion.

In other talks, Intel disclosed that Merced and its first chip set, the 460GX, will support high-availability features required in large servers. The company asserts that four-processor Merced servers will deliver more performance on the TPC-C benchmark than four-way servers using 1-GHz Alpha 21264 processors or 750-MHz UltraSparc-3 processors, two key Merced rivals that are expected to ship next year. But it has yet to disclose any details about clock speed, bus bandwidth, or other metrics to support this position.

Register Renaming Implemented in Software

One of the key philosophies of IA-64 is the idea of moving complexity from the hardware to the software. Register renaming is one example. Most high-end processors map a small number (8–32) of logical registers onto a larger set of physical registers (up to 80 in the case of the 21264). Because software can access only the logical registers, the hardware must assign mappings and translate accesses using an associative lookup table. This complexity increases die size and often the pipeline depth as well.

IA-64 eliminates this hardware complexity with its large register file (128 integer, 128 floating-point) that is directly accessible by software. Specifying the physical register names in software works well except in the case of tight loops, a common occurrence.
In these short code sequences, there may not be enough instructions in the loop to cover the latency of load instructions, resulting in unwanted stalls.

An out-of-order processor reorders instructions to cover the latency of the loads. The reordering naturally overlaps instructions from two or more iterations of the loop until enough instructions are found to overcome the latency (or the hardware runs out of resources). This overlap will cause register conflicts, since each loop iteration references the same registers, but these conflicts are resolved by hardware register renaming.

An IA-64 processor can address the latency problem by unrolling the loop in software. This common compiler technique duplicates the loop instructions, often several times, to generate enough instructions to cover the load latencies. Each duplicate set of instructions, however, must use a different set of registers to avoid collisions. IA-64 has plenty of registers available, but all of these duplicate instructions can create massive code expansion.

Rotating Registers Compact Code

To reduce code expansion, IA-64 uses its rotating registers. With this technique, the upper three-quarters of each register file (integer, FP, and predicates) rotates, leaving the lower registers for global variables. Accesses to these upper registers are offset by the value in the corresponding RRB (rotating register base) register. A special instruction, BR.CTOP, decrements each of the RRBs by one at the end of each loop iteration, allowing the next iteration to use a new set of physical registers. (With proper spacing, several variables can be rotated through the register file at once.)

The rotating predicate registers provide a simple way to handle loop setup (prologue) and termination (epilogue).
If the prologue and epilogue instructions are appropriately predicated, and the predicate registers rotated, the prologue instructions are executed only during the initial iteration(s) of the loop, and the epilogue instructions are executed only during the final iteration(s) of the loop. Some setup is still required to properly initialize the predicates, but this can be done well in advance of beginning the loop, removing this setup from the critical path.

Intel Discloses New IA-64 Features
Rotating Registers Reduce Code Expansion; Merced Touted for Big Servers

MEMCPY LOOP: for (i=0; i<n; i++) {*b++ = *a++}

(a) PA-RISC with hardware reordering
    ; Set up r2=loop count, r10=source addr, r11=destination addr
    loop: LDWM r1, (r10)           ; Load into r1, inc addr
          STWM (r11), r1           ; Store from r1, inc addr
          ADDIB,> r2, -1, loop     ; Decr loop count and branch

(b) IA-64 with rotating registers
    ; Set up LC=loop count–1, r10=source addr, r11=destination addr
    ; Clear predicate registers, set p16, set EC=epilogue count
    loop: (p16) LD8 r34 = [r10], 8 ; Load into "r34," inc addr
          (p17) ST8 [r11] = r35, 8 ; Store from previous "r34," inc addr
          BR.CTOP loop             ; Decr loop count and branch

Figure 1. In a simple memory-copy loop, a PA-RISC processor with hardware reordering will cover the latency of the first load by launching subsequent loads, creating multiple versions of “r1” using hardware renaming. Without adding instructions to the loop, an IA-64 processor will accomplish the same effect by rotating its registers; in this case, “r35” refers to the previous iteration of “r34.”

Eschewing an orthogonal register set, HP and Intel added several special registers to implement this process. The 64-bit LC (loop count) register performs its eponymous function. The 6-bit EC (epilogue count) register controls the execution of epilogue instructions.
Three RRBs (each 6 or 7 bits) rotate the integer, FP, and predicate registers, as described above. The use of special registers allows the BR.CTOP instruction to specify several operations at once, but in the common case of nested loops, register rotation can be used in only one of the loops.

This method of register renaming allows a single copy of the loop code to be unrolled in hardware rather than software, eliminating most of the code expansion, as Figure 1 shows. Rotating the registers adds some complexity (a few 7-bit registers and adders) to the hardware, but it adds far less than the fully generic renaming hardware in a reordering CPU. The rotating register concept dates back to Cydrome’s Cydra-5, one of the original VLIW processors; not coincidentally, its architect, Bob Rau, is now on staff
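
To make the rotation mechanism concrete, the following Python sketch models the loop of Figure 1(b). It is a simplified illustration, not a description of Merced or of the full IA-64 definition. The names `RotatingFile` and `memcpy_rotating` are hypothetical. The article states only that BR.CTOP decrements the RRBs and the loop count; the rest of the branch behavior modeled here (writing p63 so that it rotates into p16, counting EC down during the epilogue, and starting EC at 2 for this two-stage loop) is an assumption made for the sketch. Memory is word-indexed, so an increment of 1 stands in for the 8-byte post-increment, and the FP register file is omitted.

```python
# Simplified model of the Figure 1(b) loop; not a full IA-64 emulator.
ROT_GR_BASE, ROT_GR_SIZE = 32, 96   # r32-r127: upper three-quarters of 128
ROT_PR_BASE, ROT_PR_SIZE = 16, 48   # p16-p63: upper three-quarters of 64


class RotatingFile:
    """Register file whose upper region is addressed relative to an RRB."""

    def __init__(self, total, base, size, init):
        self.phys = [init] * total
        self.base, self.size, self.rrb = base, size, 0

    def _index(self, n):
        if n < self.base:                       # lower (static) registers
            return n
        return self.base + (n - self.base + self.rrb) % self.size

    def __getitem__(self, n):
        return self.phys[self._index(n)]

    def __setitem__(self, n, value):
        self.phys[self._index(n)] = value

    def rotate(self):
        # Decrementing the RRB makes last iteration's r34 visible as r35.
        self.rrb = (self.rrb - 1) % self.size


def memcpy_rotating(mem, src, dst, n):
    """Copy n words inside mem; return the number of trips through the loop."""
    if n <= 0:
        return 0
    gr = RotatingFile(128, ROT_GR_BASE, ROT_GR_SIZE, 0)
    pr = RotatingFile(64, ROT_PR_BASE, ROT_PR_SIZE, False)
    gr[10], gr[11] = src, dst
    pr[16] = True             # predicates cleared, p16 set
    lc, ec = n - 1, 2         # EC = 2 is an assumption (one count per stage)
    trips = 0
    while True:
        trips += 1
        if pr[16]:            # (p16) LD8 r34 = [r10], 8
            gr[34] = mem[gr[10]]
            gr[10] += 1
        if pr[17]:            # (p17) ST8 [r11] = r35, 8
            mem[gr[11]] = gr[35]
            gr[11] += 1
        # BR.CTOP, under the assumed semantics described in the lead-in.
        if lc > 0:
            lc -= 1
            pr[63] = True     # becomes p16 after rotation: keep loading
            taken = True
        else:
            pr[63] = False    # becomes p16 after rotation: stop loading
            taken = ec > 1
            ec -= 1
        gr.rotate()
        pr.rotate()
        if not taken:
            return trips
```

In this model, copying five words takes six trips through the single copy of the loop body: the first trip only loads, the last trip only stores, and every trip in between does both. That is the prologue and epilogue behavior the rotating predicates provide without any duplicated instructions.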
