NYU CSCI-GA 2243 - Instruction Set Architecture & Pipelining


G22.2243-001 High Performance Computer Architecture
Lecture 2: Instruction Set Architecture, Pipelining (2/1/2006)

Recap
• The most accurate measure of performance is the execution time of representative real programs or collections of programs (benchmarks).
• Computer architecture is an iterative process:
– Identify bottlenecks and search the possible design space
– Innovate and make selections
– Implement/simulate the selections made and evaluate their impact
• Make the common case fast (recall Amdahl's Law).
• RISC: Reduced Instruction Set Computers
– Identify the most frequently used instructions and implement them in hardware
– Emulate other instructions (slowly) in software
– Pretty much every technique used in current-day microprocessors

Outline
• Instruction set principles
– What is an instruction set?
– What is a good instruction set?
– Instruction set aspects
– RISC vs. CISC
– Instruction set examples: Appendices C, D, E, & F (online)
• Pipelining
– Why pipelining?
– Basic stages of a pipeline
– Expected improvement
– Complications?
[Hennessy/Patterson CA:AQA (3rd Edition): Chapter 2 & Appendix A]

Instruction Set Architecture (ISA)
"Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine."
— IBM in 1964, when introducing the IBM 360 architecture
• An instruction set is a functional description of the processor:
– What operations can it do?
– What storage mechanisms does it support?
• The ISA defines the hardware/software interface. A good interface:
– Lasts through many implementations
– Can be used in many different ways
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels

Instruction Set Design Issues
• What operations are supported? – add, sub, mul, move, compare, . . .
• Where are operands stored? – registers (and how many of them), memory, stack, accumulator
• How many explicit operands are there? – 0, 1, 2, or 3
• How is the operand location specified? – register, immediate, indirect, . . .
• What type and size of operands are supported? – byte, int, float, double, string, vector, . . .

A "Typical" RISC
• 32-bit fixed-format instructions (3 formats)
• Memory access only through load/store operations
• 32 32-bit general-purpose registers
– R0 contains zero
– Double-precision operations take a register pair (floating-point registers may be separate)
• 3-address (src1, src2, dst) register-register arithmetic instructions
• Single addressing mode for load/store: base + displacement
– no indirection
• Simple branch conditions
• Delayed branch
• Examples: Sun SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

Example: MIPS Instruction Formats (field bit ranges shown as [high:low])
Register-Register:                      Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | Opx [10:0]
Register-Immediate (e.g., load/store):  Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]
Branch:                                 Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
Jump / Call:                            Op [31:26] | PC-region target address [25:0]

ISA Metrics
• Orthogonality – no special registers, few special cases, all operand modes available with any data type or instruction type
• Completeness – support for a wide range of operations and target applications
• Regularity – no overloading of the meanings of instruction fields
• Streamlined – resource needs easily determined
• Ease of compilation (or assembly-language programming)
• Ease of implementation

Closer Look at ISA Aspects
• Operand location
• Addressing modes
• Types of instructions

ISA Aspect (1): Operand Location
• Accumulator (before 1960), 1 address:
  add A            acc ← acc + mem[A]
• Stack (1960s to 1970s), 0 address:
  add              tos ← tos + next
• Memory-Memory (1970s to 1980s), 2 or 3 address:
  add A, B         mem[A] ← mem[A] + mem[B]
  add A, B, C      mem[A] ← mem[B] + mem[C]
• Register-Memory (1970s to present), 2 address:
  add R1, A        R1 ← R1 + mem[A]
  load R1, A       R1 ← mem[A]
• Register-Register, also called Load/Store (1960s to present), 3 address:
  add R1, R2, R3   R1 ← R2 + R3
  load R1, R2      R1 ← mem[R2]
  store R1, R2     mem[R1] ← R2

Choices for Operand Location
Running example: C := A + B
• Accumulator
  load A      accum = M[A];
  add B       accum += M[B];
  store C     M[C] = accum;
+ Less hardware, code density
– Memory bottleneck
• Stack
  push A      S[++tos] = M[A];
  push B      S[++tos] = M[B];
  add         t1 = S[tos--]; t2 = S[tos--]; S[++tos] = t1 + t2;
  pop C       M[C] = S[tos--];
+ Less hardware, code density
– Memory and pipelining bottlenecks
– x86 uses the stack model for floating-point computations
• Memory-Memory
  add C, A, B    M[C] = M[A] + M[B];
+ Code density (most compact)
– Memory bottleneck
– No current machines support memory-memory (the VAX did)
• Memory-Register
  load R1, A     R1 = M[A];
  add R1, B      R1 += M[B];
  store C, R1    M[C] = R1;
+ Like several explicit (extended) accumulators
+ Code density, easy to decode
– Asymmetric operands, different amounts of work per instruction
– Examples: IBM 360/370, x86, Motorola 68K
• Register-Register (Load/Store)
  load R1, A       R1 = M[A];
  load R2, B       R2 = M[B];
  add R3, R1, R2   R3 = R1 + R2;
  store C, R3      M[C] = R3;
+ Easy decoding, operand symmetry
+ Deterministic cost for ALU operations (simple cost model)
+ Scheduling opportunities
– Code density

Operand Location: Registers vs. Memory
• Pros and cons of registers:
+ Faster, direct access
+ Simple cost model (fixed latency, no misses)
+ Short identifiers
– Must save/restore on procedure calls and context switches
– Fixed size (larger structures must live in memory)
• Pros and cons of more registers:
+ Possible to keep more operands in faster memory for longer: shorter operand access time, lower memory traffic
– Longer specifiers
– Larger cost for saving CPU state
• Trend towards more registers: 8 (x86) → 32 (MIPS/Alpha/PPC) → 128 (IA-64), driven by increasing compiler involvement in scheduling

ISA Aspect (2): Addressing
• Endianness: the order of bytes within words
– Big-endian: the byte at the lowest address has the most significance (the "big end"); e.g., IBM, Sun SPARC
– Little-endian: bytes at lower addresses have lower significance (the "little end"); e.g., x86
– Some processors allow the mode to be selected; e.g., PowerPC, MIPS (new implementations of . . .

