NYU CSCI-GA 2243 - Instruction Set Architecture - D2694636

Course Csci-Ga 2243- High Performance Computer Arch

Pages 48

G22.2243-001
High Performance Computer Architecture
Lecture 2
Instruction Set Architecture
Pipelining
9/18/2007

From Last Week: Amdahl's Law
• Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{Execution time without } E}{\text{Execution time with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$

• Suppose that enhancement E accelerates a fraction f of the task by a factor s, and the remainder of the task is unaffected
• New execution time and the overall speedup?

$$\text{Exec. time}_{new} = \text{Exec. time}_{old} \times \left[ (1 - f) + \frac{f}{s} \right]$$

$$\text{Speedup}(E) = \frac{\text{Exec. time}_{old}}{\text{Exec. time}_{new}} = \frac{1}{(1 - f) + f/s} \le \frac{1}{1 - f}$$

Example of Amdahl's Law
• Floating point instructions improved to run 2x faster, but only 10% of the time was spent on these instructions
• How much improvement in performance should one expect?

$$\text{Exec. time}_{new} = \text{Exec. time}_{old} \times \left[ (1 - 0.1) + \frac{0.1}{2} \right] = \text{Exec. time}_{old} \times 0.95$$

$$\text{Speedup}(E) = \frac{\text{Exec. time}_{old}}{\text{Exec. time}_{new}} = \frac{1}{0.95} = 1.053$$

• The new machine is 5.3% faster for this mix of instructions

Recap
• The most accurate measure of performance is the execution time of representative real programs or collections of programs (benchmarks)
• Computer Architecture is an iterative process:
  – Identify bottlenecks and search the possible design space
  – Innovate and make selections
  – Implement/simulate selections made and evaluate the impact
• Make the common case fast (recall Amdahl's Law)
• RISC: Reduced Instruction Set Computers
• Identify most frequently-used instructions
  – Implement them in hardware
• Emulate other instructions (slowly) in software
  – Pretty much every technique used in current-day microprocessors

Outline
• Instruction set principles
  – What is an instruction set?
  – What is a good instruction set?
  – Instruction set aspects
  – RISC vs. CISC
  – Instruction set examples: Appendix B
• Pipelining
  – Why pipelining?
  – Basic stages of a pipeline
  – Expected improvement
  – Complications?
[Hennessy/Patterson CA:AQA (3rd Edition): Appendix B & Appendix A]

Instruction Set Principles
[Hennessy/Patterson CA:AQA (4th Edition): Appendix B]

Instruction Set Architecture (ISA)
“Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine.”
Source: IBM in 1964 when introducing the IBM 360 architecture
• An instruction set is a functional description of the processor
  – What operations can it do
  – What storage mechanisms does it support
• ISA defines the hardware/software interface

A good interface:
  – Lasts through many implementations
  – Can be used in many different ways
  – Provides convenient functionality to higher levels
  – Permits an efficient implementation at lower levels

Instruction Set Design Issues
• What operations are supported?
  – add, sub, mul, move, compare . . .
• Where are operands stored?
  – Registers (how many of them are there), memory, stack, accumulator
• How many explicit operands are there?
  – 0, 1, 2, or 3
• How is the operand location specified?
  – register, immediate, indirect, . . .
• What type and size of operands are supported?
  – byte, int, float, double, string, vector, . . .

A "Typical" RISC
• 32-bit fixed format instruction (3 formats)
• Memory access only through load/store operations
• 32 32-bit general-purpose registers
  – R0 contains zero
  – Double precision operations take pair (floating point registers may be separate)
• 3-address (src1, src2, dst), register-register arithmetic instructions
• Single address mode for load/store: base + displacement
  – no indirection
• Simple branch conditions
• Delayed branch
• Examples: SUN SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

Example: MIPS
Register-Register
  Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | Opx [10:0] (bit positions 6 and 5 are also marked)
Register-Immediate (e.g., load/store)
  Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]
Branch
  Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
Jump / Call
  Op [31:26] | PC-region target address [25:0]

ISA Metrics
• Orthogonality
  – No special registers, few special cases, all operand modes available with any data type or instruction type
• Completeness
  – Support for a wide range of operations and target applications
• Regularity
  – No overloading for the meanings of instruction fields
• Streamlined
  – Resource needs easily determined
• Ease of compilation (or assembly language programming)
• Ease of implementation

Closer look at ISA Aspects
• Operand location
• Addressing modes
• Types of instructions

ISA Aspect (1): Operand Location
Accumulator (before 1960):
  1 address    add A             acc ← acc + mem[A]
Stack (1960s to 1970s):
  0 address    add               tos ← tos + next
Memory-Memory (1970s to 1980s):
  2 address    add A, B          mem[A] ← mem[A] + mem[B]
  3 address    add A, B, C       mem[A] ← mem[B] + mem[C]
Register-Memory (1970s to present):
  2 address    add R1, A         R1 ← R1 + mem[A]
               load R1, A        R1 ← mem[A]
Register-Register, also called Load/Store (1960s to present):
  3 address    add R1, R2, R3    R1 ← R2 + R3
               load R1, R2       R1 ← mem[R2]
               store R1, R2      mem[R1] ← R2
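
To make the fixed 32-bit register-register format from the MIPS example more concrete, the following is a minimal Python sketch that packs and unpacks the five fields. The field boundaries (Op in bits 31 to 26, Rs1 in 25 to 21, Rs2 in 20 to 16, Rd in 15 to 11, Opx in 10 to 0) are read off the slide's bit positions and should be treated as an assumption. The names `FIELDS`, `encode_rr`, and `decode_rr`, and the field values in the usage line, are hypothetical and chosen only for illustration.

```python
# Assumed layout of the register-register format: (name, low bit, width).
FIELDS = (
    ("op", 26, 6),
    ("rs1", 21, 5),
    ("rs2", 16, 5),
    ("rd", 11, 5),
    ("opx", 0, 11),
)


def encode_rr(**values):
    """Pack the named fields into one 32-bit instruction word."""
    word = 0
    for name, shift, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode_rr(word):
    """Split a 32-bit instruction word back into its named fields."""
    return {
        name: (word >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


# Illustrative field values only; they are not taken from the lecture.
example = encode_rr(op=0, rs1=2, rs2=3, rd=1, opx=0x20)
print(f"{example:#010x}", decode_rr(example))
```

Because every field sits at a fixed position, decoding needs only shifts and masks, which is one reason fixed-format instructions are easy to implement in hardware.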
Choices for Operand Location
• Running example: C := A + B
• Accumulator
    load A       accum = M[A];
    add B        accum += M[B];
    store C      M[C] = accum;
  + Less hardware, code density
  – Memory bottleneck
• Stack
    push A       S[++tos] = M[A];
    push B       S[++tos] = M[B];
    add          t1 = S[tos--]; t2 = S[tos--]; S[++tos] = t1 + t2;
    pop C        M[C] = S[tos--];
  + Less hardware, code density
  – Memory, pipelining bottlenecks
  – x86 uses stack model for floating point computations

• Running example: C := A + B
• Memory-Memory
    add C, A, B  M[C] = M[A] + M[B];
  + Code density (most compact)
  – Memory bottleneck
  – No current machines support memory-memory (VAX did)
• Memory-Register
    load R1, A   R1 = M[A];
    add R1, B    R1 += M[B];
    store C, R1  M[C] = R1;
  + Like several explicit (extended) accumulators
  + Code density, easy to decode
  – Asymmetric operands, different amount of work per instruction
  – Examples: IBM
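
The instruction sequences for the running example C := A + B can be traced with a small Python sketch. Each function mirrors one operand-location model from the slides, with one statement per instruction. The dictionary standing in for memory, the Python list standing in for the hardware stack, and the helper name `run` are assumptions made for illustration, not part of the lecture.

```python
def accumulator(mem):
    acc = mem["A"]           # load A
    acc += mem["B"]          # add B
    mem["C"] = acc           # store C


def stack(mem):
    s = []
    s.append(mem["A"])       # push A
    s.append(mem["B"])       # push B
    t1 = s.pop()             # add: pop the two top entries ...
    t2 = s.pop()
    s.append(t1 + t2)        # ... and push their sum
    mem["C"] = s.pop()       # pop C


def memory_memory(mem):
    mem["C"] = mem["A"] + mem["B"]   # add C, A, B


def register_memory(mem):
    r1 = mem["A"]            # load R1, A
    r1 += mem["B"]           # add R1, B
    mem["C"] = r1            # store C, R1


def run(model, a, b):
    """Run one model on a fresh memory image and return the value left in C."""
    mem = {"A": a, "B": b, "C": 0}
    model(mem)
    return mem["C"]


for model in (accumulator, stack, memory_memory, register_memory):
    print(model.__name__, run(model, 2, 3))
```

All four models leave the same value in C. They differ in how many instructions are needed and in how many of those instructions touch memory, which is the trade-off the plus and minus points on the slides summarize.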
