UVA CS 4610 - Target Machine Architecture

Processor implementations change over time, as people invent better ways of doing things, and as technological advances (e.g., increases in the number of transistors that will fit on one chip) make things feasible that were not feasible before. Processor architectures also change, for at least two reasons. Some technological advances can be exploited only by changing the hardware/software interface, for example by increasing the number of bits that can be added or multiplied in a single instruction. In addition, experience with compilers and applications often suggests that certain new instructions would make programs simpler or faster. Occasionally, technological and intellectual trends converge to produce a revolutionary change in both architecture and implementation. We will discuss four such changes in Section 5.4: the development of microprogramming in the early 1960s, the development of the microprocessor in the early to mid-1970s, the development of RISC machines in the early 1980s, and the move to multithreaded and multicore processors in the first decade of the 21st century.

Most of the discussion in this chapter, and indeed in the rest of the book, will assume that we are compiling for a single-threaded RISC (reduced instruction set computer) architecture. Roughly speaking, a RISC machine is one that sacrifices richness in the instruction set in order to increase the number of instructions that can be executed per second. Where appropriate, we will devote a limited amount of attention to earlier, CISC (complex instruction set computer) architectures. The most popular desktop processor in the world, the x86, is a legacy CISC design, but RISC dominates among newer designs, and modern implementations of the x86 generally run fastest if compilers restrict themselves to a relatively simple subset of the instruction set. Within a modern x86 processor, a hardware "front end" translates these instructions, on the fly, into a RISC-like internal format.

In the first three sections below we consider the hierarchical organization of memory, the types (formats) of data found in memory, and the instructions used to manipulate those data. The coverage is necessarily somewhat cursory and high-level; much more detail can be found in books on computer architecture or organization (e.g., Chapters 2 to 5 of Patterson and Hennessy's outstanding text [PH08]).

                                   Typical access time    Typical capacity
    Registers                      0.2-0.5 ns             256-1024 bytes
    Primary (L1) cache             0.4-1 ns               32 KB-256 KB
    Secondary (L2) cache           4-10 ns                1-8 MB
    Tertiary (off-chip, L3) cache  10-50 ns               4-64 MB
    Main memory                    50-500 ns              256 MB-16 GB
    Disk                           5-15 ms                80 GB and up
    Tape                           1-50 s                 effectively unlimited

Figure 5.1  The memory hierarchy of a workstation-class computer. Access times and capacities are approximate, based on 2008 technology. Registers must be accessed within a single clock cycle. Primary cache typically responds in 1 to 2 cycles; off-chip cache in more like 20 cycles. Main memory on a supercomputer can be as fast as off-chip cache; on a workstation it is typically much slower. Disk and tape times are constrained by the movement of physical parts.
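The access-time gaps in Figure 5.1 are large enough to observe directly from user code. The following C sketch is not from the text; the working-set sizes, iteration counts, and the POSIX clock_gettime timer are illustrative assumptions. It chases a randomized chain of pointers through arrays of increasing size, so that each load depends on the previous one; as the working set outgrows each cache level, the measured time per access should jump roughly as the figure suggests.

    /* Hypothetical memory-hierarchy probe: average latency of dependent
     * loads for working sets of various sizes.  Illustrative only. */
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double ns_per_access(size_t n_elems, long iters) {
        size_t *next = malloc(n_elems * sizeof *next);
        if (!next) { perror("malloc"); exit(1); }

        /* Build a single random cycle (Sattolo's algorithm) so the chain
         * visits every element and the prefetcher cannot guess ahead. */
        for (size_t i = 0; i < n_elems; i++) next[i] = i;
        for (size_t i = n_elems - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;            /* j in [0, i) */
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;
        size_t p = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long k = 0; k < iters; k++) p = next[p]; /* dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        volatile size_t sink = p;   /* keep the loop from being optimized away */
        (void)sink;
        free(next);
        return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
    }

    int main(void) {
        /* Working sets from ~32 KB (fits in L1) up to ~32 MB (main memory). */
        for (size_t kb = 32; kb <= 64 * 1024; kb *= 4)
            printf("%8zu KB: %6.1f ns per access\n",
                   kb, ns_per_access(kb * 1024 / sizeof(size_t), 20 * 1000 * 1000L));
        return 0;
    }

Compiled with optimization (e.g., cc -O2), the smallest working sets should report times near the L1 figures in the table, and the largest ones something much closer to main-memory latency.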
We consider the interplay between architecture and implementation in Section 5.4. As illustrative examples, we consider the widely used x86 and MIPS instruction sets. Finally, in Section 5.5, we consider some of the issues that make compiling for modern processors a challenging task.

5.1 The Memory Hierarchy

Memory on most machines consists of a numbered sequence of 8-bit bytes. It is not uncommon for modern workstations to contain several gigabytes of memory, much too much to fit on the same chip as the processor. Because memory is off-chip (typically on the other side of a bus), getting at it is much slower than getting at things on-chip. Most computers therefore employ a memory hierarchy, in which things that are used more often are kept close at hand. A typical memory hierarchy, with access times and capacities, is shown in Figure 5.1.

Only three of the levels of the memory hierarchy (registers, memory, and devices) are a visible part of the hardware/software interface. Compilers manage registers explicitly, loading them from memory when needed and storing them back to memory when done, or when the registers are needed for something else. Caches are managed by the hardware. Devices are generally accessed only by the operating system.

Registers hold small amounts of data that can be accessed very quickly. A typical RISC machine has two sets of registers, to hold integer and floating-point operands. It also has several special-purpose registers, including the program counter (PC) and the processor status register. The program counter holds the address of the next instruction to be executed. It is incremented automatically when fetching most instructions; branches work by changing it explicitly. The processor status register contains a variety of bits of importance to the operating system (privilege level, interrupt priority level, trap enable bits) and, on some machines, a few bits of importance to the compiler writer. Principal among these are condition codes, which indicate whether the most recent arithmetic or logical operation resulted in a zero, a negative value, and/or arithmetic overflow. (We will consider condition codes in more detail in Section 5.3.2.)

Because registers can be accessed every cycle, while memory, generally, cannot, good compilers expend a great deal of effort trying to make sure that the data they need most often are in registers, and trying to minimize the amount of time spent moving data back and forth between registers and memory. We will consider algorithms for register management in Section 5.5.2.

Caches are generally smaller but faster than main memory. They are designed to exploit locality: the tendency of most computer programs to access the same or nearby locations in memory repeatedly. By automatically moving the contents of these locations into cache, a memory system can satisfy most accesses at cache speed rather than at the speed of main memory.
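To make the notion of locality concrete, here is a small C sketch of our own (not from the text; the 4096 x 4096 array size and the timing code are assumptions). It sums the same matrix twice: the row-order loop touches consecutive addresses, so most of its accesses hit in the cache, while the column-order loop strides a full row ahead on every access, so far more of its accesses miss and fall through to slower levels of the hierarchy.

    /* Illustrative locality demo: identical work, different traversal order. */
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <time.h>

    #define N 4096
    static double a[N][N];                 /* C stores each row contiguously */

    static double elapsed_ms(struct timespec s, struct timespec e) {
        return (e.tv_sec - s.tv_sec) * 1e3 + (e.tv_nsec - s.tv_nsec) / 1e6;
    }

    int main(void) {
        struct timespec t0, t1;
        double sum;

        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 1.0;

        /* Row order: consecutive addresses, good spatial locality. */
        sum = 0.0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("row order:    %.1f ms (sum=%.0f)\n", elapsed_ms(t0, t1), sum);

        /* Column order: a stride of N * 8 bytes, poor spatial locality. */
        sum = 0.0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("column order: %.1f ms (sum=%.0f)\n", elapsed_ms(t0, t1), sum);

        return 0;
    }

On most machines the column-order pass runs several times slower, even though both loops perform exactly the same arithmetic; this is the behavior that caches reward and that compilers and programmers try not to defeat when laying out data and loops.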

