CS 152 Computer Architecture and Engineering
Lecture 3 - From CISC to RISC

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152

1/31/2008  CS152 Spring '08


Last Time in Lecture 2

- Stack machines popular to simplify High-Level Language (HLL) implementation
  - Algol-68 and Burroughs B5000, Forth machines, Occam and Transputers, Java VMs and Java interpreters
- General-purpose register machines provide greater efficiency with better compiler technology (or assembly coding)
  - Compilers can explicitly manage the fastest level of the memory hierarchy (registers)
- Microcoding was a straightforward way to implement simple machines with low gate count
  - But it also allowed arbitrary instruction complexity as microcode stores grew
  - Makes most sense when fast read-only memory (ROM) is significantly faster than read-write memory (RAM)


Microprogramming thrived in the Seventies

- Significantly faster ROMs than DRAMs/core were available
- For complex instruction sets (CISC), datapath and controller were cheaper and simpler
- New instructions, e.g., floating point, could be supported without datapath modifications
- Fixing bugs in the controller was easier
- ISA compatibility across various models could be achieved easily and cheaply

Except for the cheapest and fastest machines, all computers were microprogrammed.


Writable Control Store (WCS)

- Implement control store in RAM, not ROM
  - MOS SRAM memories now became almost as fast as control store (core memories/DRAMs were 2-10x slower)
  - Bug-free microprograms difficult to write
- User-WCS provided as an option on several minicomputers
  - Allowed users to change microcode for each processor
- User-WCS failed
  - Little or no programming tools support
  - Difficult to fit software into small space
  - Microcode control tailored to original ISA, less useful for others
  - Large WCS part of processor state: expensive context switches
  - Protection difficult if user can change microcode
  - Virtual memory required
    restartable microcode


Microprogramming: early Eighties

- Evolution bred more complex micro-machines
  - CISC ISAs led to need for subroutine and call stacks in microcode
  - Need for fixing bugs in control programs was in conflict with the read-only nature of the microcode ROM
  - -> WCS (B1700, QMachine, Intel i432, ...)
- With the advent of VLSI technology, assumptions about ROM and RAM speed became invalid -> more complexity
- Better compilers made complex instructions less important
- Use of numerous micro-architectural innovations, e.g., pipelining, caches, and buffers, made multiple-cycle execution of reg-reg instructions unattractive


Microprogramming in Modern Usage

- Microprogramming is far from extinct
- Played a crucial role in micros of the Eighties: DEC uVAX, Motorola 68K series, Intel 386 and 486
- Microcode plays an assisting role in most modern micros (AMD Athlon, Intel Core 2 Duo, IBM PowerPC)
  - Most instructions are executed directly, i.e., with hard-wired control
  - Infrequently used and/or complicated instructions invoke the microcode engine
- Patchable microcode common for post-fabrication bug fixes, e.g., Intel Pentiums load microcode patches at bootup


From CISC to RISC

- Use fast RAM to build a fast instruction cache of user-visible instructions, not fixed hardware microroutines
  - Can change contents of fast instruction memory to fit what the application needs right now
- Use simple ISA to enable hardwired pipelined implementation
  - Most compiled code only used a few of the available CISC instructions
  - Simpler encoding allowed pipelined implementations
- Further benefit with integration
  - In the early '80s, can fit a 32-bit datapath + small caches on a single chip
  - No chip crossings in the common case allows faster operation


Horizontal vs Vertical Microcode

[Figure: plot with axes "# microinstructions" and "bits per microinstruction"]

- Horizontal microcode has wider microinstructions
  - Multiple parallel operations per microinstruction
  - Fewer steps per macroinstruction
  - Sparser encoding -> more bits
- Vertical microcode has narrower microinstructions
  - Typically a single datapath operation per microinstruction
    - separate microinstruction for branches
  - More steps per macroinstruction
  - More compact -> fewer bits
- Nanocoding
  - Tries to combine the best of horizontal and vertical microcode


Nanocoding

[Figure: block diagram. Labels: microcode PC (state), microcode ROM, next-state microaddress, nanoaddress, nanoinstruction ROM, data; shown alongside User PC, Inst. Cache, and Hardwired Decode.]

- Exploits recurring control signal patterns in microcode, e.g.,
    ALU0   A <- Reg[rs]
    ...
    ALUi0  A <- Reg[rs]
- MC68000 had 17-bit microcode containing either a 10-bit microjump or a 9-bit nanoinstruction pointer
  - Nanoinstructions were 68 bits wide, decoded to give 196 control signals


CDC 6600 (Seymour Cray, 1964)

- A fast pipelined machine with 60-bit words
- Ten functional units
  - Floating point: adder, multiplier, divider
  - Integer: adder, multiplier
- Hardwired control (no microcoding)
- Dynamic scheduling of instructions using a scoreboard
- Ten Peripheral Processors for Input/Output
  - a fast time-shared 12-bit integer ALU
- Very fast clock, 10 MHz
- Novel freon-based technology for cooling


CDC 6600: Datapath

[Figure: datapath diagram. Components and labels:]
- Central Memory: 128K words, 32 banks, 1 µs cycle
- Operand Regs: 8 x 60-bit (operand and result paths to the 10 Functional Units)
- Address Regs: 8 x 18-bit
- Index Regs: 8 x 18-bit
- operand addr and result addr paths to Central Memory
- IR and Inst. Stack: 8 x 60-bit


CDC 6600: A Load/Store Architecture

- Separate instructions to manipulate three types of register:
    8 x 60-bit data registers (X)
    8 x 18-bit address registers (A)
    8 x 18-bit index registers (B)
- All arithmetic and logic instructions are reg-to-reg

    opcode (6) | i (3) | j (3) | k (3)          Ri <- (Rj) op (Rk)

- Only Load and Store instructions refer to memory!

    opcode (6) | i (3) | j (3) | disp (18)      Ri <- M[(Rj) + disp]

  Touching address registers 1 to 5 initiates a load, 6 to 7 initiates a store
  - very useful for vector operations


CDC6600: Vector Addition

          B0 <- -n
    loop: JZE B0, exit
          A0 <- B0 + a0       load X0
          A1 <- B0 + b0       load X1
          X6 <- X0 + X1
          A6 <- B0 + c0       store X6
          B0 <- B0 + 1
          jump loop

    Ai = address register
    Bi = index register
    Xi = data register
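The side effect that drives the vector addition loop (writing an address register moves data between memory and the matching X register) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of the real hardware: the class `ToyCDC6600`, the method `set_a`, and the word-addressed list used as memory are all hypothetical names for illustration. The slide's loop uses A0/X0 and B0; the sketch uses A1/A2 and B1 instead so that it follows the "registers 1 to 5 load, 6 to 7 store" rule stated on the load/store slide. As in the slide's loop, the index register counts up from -n to zero, so the base addresses point one past the end of each array.

```python
class ToyCDC6600:
    """Toy register model (hypothetical): A, B, and X register files plus memory."""

    def __init__(self, memory):
        self.mem = memory      # word-addressed memory, modeled as a Python list
        self.A = [0] * 8       # address registers
        self.B = [0] * 8       # index registers
        self.X = [0] * 8       # data registers

    def set_a(self, i, address):
        """Write address register Ai and apply the load/store side effect."""
        self.A[i] = address
        if 1 <= i <= 5:        # A1..A5: load Xi from memory
            self.X[i] = self.mem[address]
        elif 6 <= i <= 7:      # A6..A7: store Xi to memory
            self.mem[address] = self.X[i]


def vector_add(m, a_end, b_end, c_end, n):
    """c[k] = a[k] + b[k] for k in 0..n-1, following the shape of the slide's loop."""
    m.B[1] = -n                          # B1 <- -n
    while m.B[1] != 0:                   # JZE B1, exit
        m.set_a(1, m.B[1] + a_end)       # A1 <- B1 + a_end   (loads X1)
        m.set_a(2, m.B[1] + b_end)       # A2 <- B1 + b_end   (loads X2)
        m.X[6] = m.X[1] + m.X[2]         # X6 <- X1 + X2
        m.set_a(6, m.B[1] + c_end)       # A6 <- B1 + c_end   (stores X6)
        m.B[1] += 1                      # B1 <- B1 + 1


# Assumed layout: a at words 0..3, b at words 4..7, c at words 8..11.
memory = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
machine = ToyCDC6600(memory)
vector_add(machine, a_end=4, b_end=8, c_end=12, n=4)
print(memory[8:12])   # [11, 22, 33, 44]
```

Note that the loop body contains no explicit load or store opcode: every memory access is a consequence of an address-register write, which is what makes the idiom compact for vector code.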


CDC6600 ISA designed to simplify high-performance implementation

- Use of three-address, register-register ALU instructions simplifies pipelined implementation
  - No implicit dependencies between inputs and outputs
- Decoupling setting of address register (Ar) from retrieving value from data register (Xr) simplifies providing multiple outstanding memory accesses
  - Software can schedule load of address register before use of value
  - Can interleave independent instructions in between
- CDC6600 has multiple
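
The benefit of decoupling the address-register write (Ar) from the later use of the data register (Xr) can be illustrated with a very small timing model. Everything here is an assumption made for illustration: the function `run`, the tuple format of a schedule, in-order issue of one instruction per cycle, and the value of `LOAD_LATENCY` are hypothetical and are not taken from the CDC 6600. The sketch only shows why starting both loads before using either value shortens the schedule.

```python
LOAD_LATENCY = 5  # assumed cycles from setting Ar until Xr is usable (illustrative)


def run(schedule):
    """Issue instructions in order, one per cycle, stalling on unready sources.

    Each entry is (op, dest, sources). For op == "setA", dest names the X
    register that the triggered load will fill. Returns the total cycle count.
    """
    ready_at = {}  # register name -> first cycle at which its value is available
    cycle = 0
    for op, dest, sources in schedule:
        # An instruction cannot start before all of its sources are ready.
        start = max([cycle] + [ready_at.get(s, 0) for s in sources])
        if op == "setA":
            ready_at[dest] = start + LOAD_LATENCY  # load completes later
        else:
            ready_at[dest] = start + 1             # ALU result next cycle
        cycle = start + 1
    return cycle


# Each value is used immediately after its load is started.
back_to_back = [
    ("setA", "X1", ["B1"]),
    ("alu", "X3", ["X1", "X1"]),
    ("setA", "X2", ["B1"]),
    ("alu", "X4", ["X2", "X2"]),
]

# Both loads are started first, so their latencies overlap.
interleaved = [
    ("setA", "X1", ["B1"]),
    ("setA", "X2", ["B1"]),
    ("alu", "X3", ["X1", "X1"]),
    ("alu", "X4", ["X2", "X2"]),
]

print(run(back_to_back))  # 12
print(run(interleaved))   # 7
```

With the assumed latency, the reordered schedule finishes in 7 cycles instead of 12 because two memory accesses are outstanding at once.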