CS152 Computer Architecture and Engineering
Lecture 4: Timing
2004-09-07

Dave Patterson (www.cs.berkeley.edu/~patterson)
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
www-inst.eecs.berkeley.edu/~cs152/

CS 152 L03 Testing Processors, UC Regents Fall 2004 UCB

Last Time: Test plan for your project

Top-down testing
  complete processor testing
  processor testing with self checks
  multi-unit testing
  unit testing
Bottom-up testing

Which testing types are good for each epoch?

[Figure: testing types by epoch along a time axis. Epoch 1: unit testing (early), multi-unit testing (later). Epochs 2 to 4: processor testing with self checks, complete processor testing, multi-unit testing, unit testing, verification, diagnostics. Milestones on the time axis: processor assembly complete; correctly executes single instructions; correctly executes short programs.]

Outline: Timing
  A clocked logic circuit primer
  Team networking break

Architects draw blocks
Circuit designers draw

Excerpt shown on the slides, from IEEE Journal of Solid-State Circuits, vol. 36, no. 11, November 2001, p. 1600:

Fig. 1. Process SEM cross section.

The process was raised from 1 to limit standby power. Circuit design and architectural pipelining ensure low-voltage performance and functionality. To further limit standby current in handheld ASSPs, a longer poly target takes advantage of the versus dependence, and source-to-body bias is used to electrically limit transistor in standby mode. All core nMOS and pMOS transistors utilize separate source and bulk connections to support this. The process includes cobalt disilicide gates and diffusions. Low source and drain capacitance, as well as 3-nm gate oxide thickness, allow high performance and low-voltage operation.

III. ARCHITECTURE

The microprocessor contains 32-kB instruction and data caches as well as an eight-entry coalescing writeback buffer. The instruction and data cache fill buffers have two and four entries, respectively. The data cache supports hit-under-miss operation, and lines may be locked to allow SRAM-like operation. Thirty-two-entry fully associative translation lookaside buffers (TLBs) that support multiple page sizes are provided for both caches. TLB entries may also be locked. A 128-entry branch target buffer improves branch performance in a pipeline deeper than earlier high-performance ARM designs [2], [3].

A. Pipeline Organization

To obtain high performance, the microprocessor core utilizes a simple scalar pipeline and a high-frequency clock. In addition to avoiding the potential power waste of a superscalar approach, functional design and validation complexity is decreased at the expense of circuit design effort. To avoid circuit design issues, the pipeline partitioning balances the workload and ensures that no one pipeline stage is tight. The main integer pipeline is seven stages, memory operations follow an eight-stage pipeline, and when operating in thumb mode an extra pipe stage is inserted after the last fetch stage to convert thumb instructions into ARM instructions. Since thumb mode instructions are 16 b, two instructions are fetched in parallel while executing thumb instructions. A simplified diagram of the processor pipeline is shown in Fig. 2, where the state boundaries are indicated by gray. Features that allow the microarchitecture to achieve high speed are as follows.

Fig. 2. Microprocessor pipeline organization.

The shifter and ALU reside in separate stages. The ARM instruction set allows a shift followed by an ALU operation in a single instruction. Previous implementations limited frequency by having the shift and ALU in a single stage. Splitting this operation reduces the critical ALU bypass path by approximately 1/3. The extra pipeline hazard introduced when an instruction is immediately followed by one requiring that the result be shifted is infrequent.

Decoupled Instruction Fetch. A two-instruction-deep queue is implemented between the second fetch and instruction decode pipe stages. This allows stalls generated later in the pipe to be deferred by one or more cycles in the earlier pipe stages, thereby allowing instruction fetches to proceed when the pipe is stalled, and also relieves stall speed paths in the instruction fetch and branch prediction units.

Deferred register dependency stalls. While register dependencies are checked in the RF stage, stalls due to these hazards are deferred until the X1 stage. All the necessary operands are then captured from result-forwarding busses as the results are returned to the register file.

One of the major goals of the design was to minimize the energy consumed to complete a given task. Conventional wisdom has been that shorter pipelines are more efficient due to re

For some definition of performance.

Architects reach logic top-down

[Figure: state machine block diagram. Inputs Rst and Change feed the Next State Combinational Logic, which drives next_R, next_Y, and next_G into the state registers R, Y, and G.]

  wire next_R, next_Y, next_G;
  assign next_R = rst ? 1'b1 : (change ? Y : R);
  assign next_Y = rst ? 1'b0 : (change ? G : Y);
  assign next_G = rst ? 1'b0 : (change ? R : G);

Is this structural Verilog?

Logic is where they meet

EEs reach logic bottom-up

Basic Components: CMOS Logic Gates

  NAND Gate          NOR Gate
  A  B  Out          A  B  Out
  0  0  1            0  0  1
  0  1  1            0  1  0
  1  0  1            1  0  0
  1  1  0            1  1  0

Small number of high-performance logic circuits.

Can you build a processor entirely out of NAND gates?

(CS 152 L02 Design as a Team Sport; 1/28/04, UCB Spring 2004, CS152 Kubiatowicz Lec3)

More clocked logic circuits

Ideal versus Reality

When input 0 -> 1, output 1 -> 0, but NOT instantly.

Design Refinement

Refinement means an increasing level of detail at each step:
  Informal System Requirement
  Initial Specification
  Intermediate Specification
  Final Architectural Description
  Intermediate Specification of Implementation
  Final Internal Specification

Logic Synthesis bridges the gap between the assign statements above and Logic Components.

Administrivia

Team Networking Break
Mini Lab 2 this Friday, 9/10. Remember to do the pre-lab.
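
The next-state assign statements for R, Y, and G describe only the combinational half of the traffic-light circuit; a clocked circuit also needs state registers that capture those values on a clock edge. The following is a minimal sketch of how the two halves might fit together. The module name, the port list, the clk input, and the choice of one flip-flop per light are assumptions for illustration and do not come from the slides.

```verilog
// Sketch only: names (traffic_light, clk) and the register style are
// assumptions, not taken from the lecture slides.
module traffic_light (
    input  wire clk,     // assumed clock input
    input  wire rst,     // synchronous reset: forces the light to red
    input  wire change,  // when 1, advance to the next light
    output reg  R,
    output reg  Y,
    output reg  G
);
    wire next_R, next_Y, next_G;

    // Next-state combinational logic, following the slide's assign statements.
    assign next_R = rst ? 1'b1 : (change ? Y : R);
    assign next_Y = rst ? 1'b0 : (change ? G : Y);
    assign next_G = rst ? 1'b0 : (change ? R : G);

    // State registers: capture the next state on each rising clock edge.
    always @(posedge clk) begin
        R <= next_R;
        Y <= next_Y;
        G <= next_G;
    end
endmodule
```

With rst held high for one clock edge the state becomes R=1, Y=0, G=0. Each later clock edge with change high rotates the single 1 from R to G, then to Y, then back to R; with change low the state holds. The timing question raised under "Ideal versus Reality" applies to the assign logic: its outputs must settle before the next clock edge for the registers to capture the intended values.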


Berkeley COMPSCI 152 - Lecture 4 – Timing
