CS 250: VLSI System Design
Lecture 10: Design Verification
2010-10-11
John Wawrzynek and Krste Asanovic, with John Lazzaro
TA: Yunsup Lee
www-inst.eecs.berkeley.edu/~cs250/
CS 250 L10: Design Verification. UC Regents, Fall 2010, UCB.

IBM Power 4: 174 Million Transistors

A complex design.
First silicon booted AIX and Linux on a 16-die system.
96% of all bugs were caught before first tape-out.
How?
[Figures 1 and 2: labels Macro, Unit, Chip, FP, IFU.]

Three main components

1. Specify chip behavior at the RTL level, and comprehensively simulate it.
2. Use formal verification to show equivalence between Verilog RTL and circuit schematic RTL.
3. Technology layer: do the electrons implement the RTL, at speed and power?

Today, we focus on (1).

Lecture Focus: Functional Design Test

Testing goal: the processor design correctly executes programs written in the Instruction Set Architecture.
These are not manufacturing tests.
"Correct" means: meets the Architect's Contract.
[Figure: Intel XScale ARM pipeline. Fig. 2, "Microprocessor pipeline organization", IEEE Journal of Solid-State Circuits 36(11), November 2001.]

Architect's Contract with the Programmer

To the program, it appears that instructions execute in the correct order defined by the ISA.
As each instruction completes, the architected machine state appears to the program to obey the ISA.
What the machine actually does is up to the hardware designers, as long as the contract is kept.
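One way to picture the contract is as a check made each time an instruction completes. The sketch below is not from the lecture: the three-opcode toy ISA, `isa_step`, and `BufferedMachine` are all hypothetical names invented for illustration. The "hardware" delays every register-file write by one instruction and hides that with a bypass, so its internal state differs from the ISA model while the architected state still matches.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit registers (illustrative choice)

def alu(op, x, y):
    if op == "add":
        return (x + y) & MASK
    if op == "sub":
        return (x - y) & MASK
    raise ValueError(f"unknown opcode: {op}")

def isa_step(regs, instr):
    """Reference ISA semantics: return the next architected register state."""
    op, rd, a, b = instr
    new = list(regs)
    new[rd] = a & MASK if op == "li" else alu(op, regs[a], regs[b])
    return new

class BufferedMachine:
    """Hypothetical implementation: register writes land one instruction late."""
    def __init__(self):
        self.regfile = [0, 0, 0, 0]
        self.pending = None  # (rd, value) computed but not yet written back

    def read(self, r):
        # Bypass: the newest value may still sit in the pending slot.
        if self.pending is not None and self.pending[0] == r:
            return self.pending[1]
        return self.regfile[r]

    def step(self, instr):
        op, rd, a, b = instr
        value = a & MASK if op == "li" else alu(op, self.read(a), self.read(b))
        if self.pending is not None:  # drain the previous instruction's write
            prev_rd, prev_value = self.pending
            self.regfile[prev_rd] = prev_value
        self.pending = (rd, value)

    def architected_state(self):
        # What the program is allowed to observe.
        return [self.read(r) for r in range(4)]

def check_contract(program):
    """Compare architected state with the ISA model after every instruction."""
    ref, machine = [0, 0, 0, 0], BufferedMachine()
    for instr in program:
        ref = isa_step(ref, instr)
        machine.step(instr)
        assert machine.architected_state() == ref, f"contract broken at {instr}"
    return machine

PROGRAM = [("li", 0, 5, 0), ("li", 1, 7, 0), ("add", 2, 0, 1),
           ("sub", 3, 2, 0), ("add", 0, 3, 3)]
```

After `check_contract(PROGRAM)`, the raw register file of the machine still holds a stale value for register 0, yet every architected-state comparison has passed. That is the sense in which what the machine actually does is left to the designers.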

Three models (at least) to cross-check

The contract specification: "the answer" (correct, we hope). Simulates the ISA model in C. Fast. Better: two models coded independently.
The Verilog RTL model: the logical semantics of the Verilog model we will use to create gates. Runs on a software simulator or FPGA hardware.
Chip-level schematic RTL: catches synthesis bugs. Formally verify the netlist against the Verilog RTL. Also used for timing and power.

Where do bugs come from?

Where bugs come from (a partial list)

The contract is wrong: you understand the contract, create a design that correctly implements it, and write correct Verilog for the design, but the contract itself is in error.
The contract is misread: your design is a correct implementation of what you think the contract means, but you misunderstand the contract.
Conceptual error in design: you understand the contract, but devise an incorrect implementation of it.
Verilog coding errors: you express your correct design idea in Verilog, with incorrect Verilog semantics.
Also: Verilog name misspellings, latch implication, combinational loops.
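The idea of cross-checking independently coded models can be sketched in a few lines. Everything here is an assumption made for illustration: the 8-bit toy ISA, the two model functions, and the injected `buggy` flag are hypothetical and do not come from the course infrastructure. Two models written in different styles run the same random programs, and any disagreement returns the program that exposed it.

```python
import random

MASK = 0xFF  # assumed 8-bit toy machine with 4 registers

def model_a(program):
    """First model: straight-line if/elif interpreter."""
    regs = [0] * 4
    for op, rd, x, y in program:
        if op == "li":
            regs[rd] = x & MASK
        elif op == "add":
            regs[rd] = (regs[x] + regs[y]) & MASK
        elif op == "sub":
            regs[rd] = (regs[x] - regs[y]) & MASK
    return regs

def model_b(program, buggy=False):
    """Second model, coded differently: table-driven, dictionary registers.
    With buggy=True, 'sub' wrongly returns a magnitude instead of wrapping."""
    sub = (lambda p, q: abs(p - q) % 256) if buggy else (lambda p, q: (p - q) % 256)
    ops = {"add": lambda p, q: (p + q) % 256, "sub": sub}
    regs = {i: 0 for i in range(4)}
    for op, rd, x, y in program:
        regs[rd] = x % 256 if op == "li" else ops[op](regs[x], regs[y])
    return [regs[i] for i in range(4)]

def random_program(rng, length):
    program = []
    for _ in range(length):
        op = rng.choice(["li", "add", "sub"])
        rd = rng.randrange(4)
        if op == "li":
            program.append((op, rd, rng.randrange(256), 0))
        else:
            program.append((op, rd, rng.randrange(4), rng.randrange(4)))
    return program

def cross_check(model_x, model_y, trials=200, seed=0):
    """Return the first random program on which the models disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        program = random_program(rng, rng.randint(1, 12))
        if model_x(program) != model_y(program):
            return program
    return None
```

Calling `cross_check(model_a, model_b)` compares the two correct models, while `cross_check(model_a, lambda p: model_b(p, buggy=True))` hunts for a program that exposes the injected bug. A disagreement only says that one of the models is wrong; deciding which one still requires reading the contract.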

Four Types of Testing

Big Bang: Complete Processor Testing

How it works:
Assemble the complete processor.
Execute the test program suite on the processor.
Check results.

Checks the contract model against the Verilog RTL.
The test suite runs the gamut from "1-line programs" to "boot the OS".
[Diagram: testing spectrum from top-down testing (complete processor testing) to bottom-up testing.]

Methodical Approach: Unit Testing

How it works:
Remove a block from the design.
Test it in isolation against its specification.
[Diagram: testing spectrum from top-down testing (complete processor testing) to bottom-up testing, with unit testing marked.]
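Unit testing a block in isolation can be illustrated with a small stand-in. The sketch below is an assumption-laden example, not course material: `ripple_add` plays the role of a block pulled out of the design (a bit-level ripple-carry adder), and `spec_add` plays the role of its specification. Because the block is small, the test can be exhaustive, which is rarely possible for a complete processor.

```python
def ripple_add(a, b, width=8):
    """Block under test: ripple-carry adder built from per-bit logic."""
    carry = 0
    total = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                    # sum bit
        carry = (x & y) | (carry & (x ^ y))  # carry out of this bit
        total |= s << i
    return total, carry

def spec_add(a, b, width=8):
    """Specification: sum modulo 2**width, plus the carry-out bit."""
    full = a + b
    return full & ((1 << width) - 1), full >> width

def unit_test(block, spec, width=8):
    """Apply every input pair to the block and collect mismatches with the spec."""
    failures = []
    for a in range(1 << width):
        for b in range(1 << width):
            if block(a, b, width) != spec(a, b, width):
                failures.append((a, b))
    return failures
```

An empty list from `unit_test(ripple_add, spec_add)` means all 65,536 input pairs agree with the specification. Any failing pair it returns is a ready-made directed test for debugging the block.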