x86-64 Machine-Level Programming∗

Randal E. Bryant
David R. O’Hallaron

September 8, 2008

Intel’s IA32 instruction set architecture (ISA), colloquially known as “x86”, is the dominant instruction format for the world’s computers. IA32 is the platform of choice for most Windows, Linux, and, since 2006, even Macintosh computers. The ISA we use today was defined in 1985 with the introduction of the i386 microprocessor, extending the 16-bit instruction set defined by the original 8086 to 32 bits. Even though subsequent processor generations have introduced new instruction types and formats, many compilers, including GCC, have avoided using these features in the interest of maintaining backward compatibility.

A shift is underway to a 64-bit version of the Intel instruction set. Originally developed by Advanced Micro Devices (AMD) and named x86-64, it is now supported by high-end processors from AMD (who now call it AMD64) and by Intel, who refer to it as Intel64. Most people still refer to it as “x86-64,” and we follow this convention. (Some vendors have shortened this to simply “x64”.) Newer versions of Linux and GCC support this extension. In making this switch, the developers of GCC saw an opportunity to also make use of some of the instruction-set features that had been added in more recent generations of IA32 processors. This combination of new hardware and revised compiler makes x86-64 code substantially different in form and in performance from IA32 code. In creating the 64-bit extension, the AMD engineers also adopted some of the features found in reduced instruction set computers (RISC) [7] that made them the favored targets for optimizing compilers. For example, there are now 16 general-purpose registers, rather than the performance-limiting eight of the original 8086.
The developers of GCC were able to exploit these features, as well as those of more recent generations of the IA32 architecture, to obtain substantial performance improvements. For example, procedure parameters are now passed via registers rather than on the stack, greatly reducing the number of memory read and write operations.

This document serves as a supplement to Chapter 3 of Computer Systems: A Programmer’s Perspective (CS:APP), describing some of the differences. We start with a brief history of how AMD and Intel arrived at x86-64, followed by a summary of the main features that distinguish x86-64 code from IA32 code, and then work our way through the individual features.

∗ Copyright © 2005, 2008, R. E. Bryant, D. R. O’Hallaron. All rights reserved.

1 History and Motivation for x86-64

Over the more than twenty years since the introduction of the i386, the capabilities of microprocessors have changed dramatically. In 1985, a fully configured, high-end desktop computer had around 1 megabyte of random-access memory (RAM) and 50 megabytes of disk storage. Microprocessor-based “workstation” systems were just becoming the machines of choice for computing and engineering professionals. A typical microprocessor had a 5-megahertz clock and ran around one million instructions per second. Nowadays, a typical high-end system has 2 gigabytes of RAM (2000X increase), 1 terabyte of disk storage (20,000X increase), and a 4-gigahertz clock, running around 5 billion instructions per second (5000X increase). Microprocessor-based systems have become pervasive. Even today’s supercomputers are based on harnessing the power of many microprocessors computing in parallel.
Given these large quantitative improvements, it is remarkable that the world’s computing base mostly runs code that is binary compatible with machines that existed over 20 years ago.

The 32-bit word size of the IA32 has become a major limitation in growing the capacity of microprocessors. Most significantly, the word size of a machine defines the range of virtual addresses that programs can use, giving a 4-gigabyte virtual address space in the case of 32 bits. It is now feasible to buy more than this amount of RAM for a machine, but the system cannot make effective use of it. For applications that involve manipulating large data sets, such as scientific computing, databases, and data mining, the 32-bit word size makes life difficult for programmers. They must write code using out-of-core algorithms¹, where the data reside on disk and are explicitly read into memory for processing.

Further progress in computing technology requires a shift to a larger word size. Following the tradition of growing word sizes by doubling, the next logical step is 64 bits. In fact, 64-bit machines have been available for some time. Digital Equipment Corporation introduced its Alpha processor in 1992, and it became a popular choice for high-end computing. Sun Microsystems introduced a 64-bit version of its SPARC architecture in 1995. At the time, however, Intel was not a serious contender for high-end computers, and so the company was under less pressure to switch to 64 bits.

Intel’s first foray into 64-bit computers was the Itanium family of processors, based on the IA64 instruction set. Unlike Intel’s historic strategy of maintaining backward compatibility as it introduced each new generation of microprocessor, IA64 is based on a radically new approach jointly developed with Hewlett-Packard. Its Very Long Instruction Word (VLIW) format packs multiple instructions into bundles, allowing higher degrees of parallel execution.
Implementing IA64 proved to be very difficult, and so the first Itanium chips did not appear until 2001, and these did not achieve the expected level of performance on real applications. Although the performance of Itanium-based systems has improved, they have not captured a significant share of the computer market. Itanium machines can execute IA32 code in a compatibility mode but not with very good performance. Most users have preferred to make do with less expensive, and often faster, IA32-based systems.

¹ The physical memory of a machine is often referred to as core memory, dating to an era when each bit of a random-access memory was implemented with a magnetized ferrite core.

Meanwhile, Intel’s archrival, Advanced Micro Devices (AMD), saw an opportunity to exploit Intel’s misstep with IA64. For years AMD had lagged just behind Intel in technology, and so they were relegated to competing with Intel on the basis of price. Typically, Intel would introduce a new microprocessor at a price premium. AMD would come along 6 to 12 months later and have to undercut Intel significantly to get any