Berkeley COMPSCI 152 - Lecture 5 – Timing, Xilinx


CS 152 Computer Architecture and Engineering
Lecture 5: Timing, Xilinx, Design Notebook
2004-09-14
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
Dave Patterson (www.cs.berkeley.edu/~patterson)
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L05: Timing, Xilinx, Design Notebook. UC Regents Fall 2004, UCB.

Last Time: Timing Analysis
(Recap slides: 1/28/04, UCB Spring 2004, CS152 / Kubiatowicz, Lec3.9 to Lec3.12)

Storage element timing

[Figure: timing diagram of Clk, D, and Q, marking the setup and hold windows around the clock edge, "don't care" regions on D, "unknown" on Q, and the clock-to-Q delay]

Setup Time: Input must be stable BEFORE the trigger clock edge.
Hold Time: Input must REMAIN stable after the trigger clock edge.
Clock-to-Q time: Output cannot change instantaneously at the trigger clock edge. Similar to delay in logic gates, it has two components: internal Clock-to-Q and load-dependent Clock-to-Q.

Critical Path & Cycle Time

Critical path: the slowest path between any two storage devices.
Cycle time is a function of the critical path. It must be greater than:
Clock-to-Q + Longest Path through Combination Logic + Setup

Clocking Methodology

All storage elements are clocked by the same clock edge.
The combination logic blocks: inputs are updated at each clock tick, and all outputs MUST be stable before the next clock tick.

What is the smallest T that produces correct operation? Worst-case CL delay limits T.

Today's Lecture: More Project Topics

Clocked logic timing wrap-up
Field Programmable Gate Arrays
Design Notebook

Clocked Logic Timing

Excerpt shown in the slides, from IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001, p. 1600:

Fig. 1. Process SEM cross section.

The process ... was raised from [1] to limit standby power. Circuit design and architectural pipelining ensure low-voltage performance and functionality. To further limit standby current in handheld ASSPs, a longer poly target takes advantage of the ... versus ... dependence, and source-to-body bias is used to electrically limit transistor ... in standby mode. All core nMOS and pMOS transistors utilize separate source and bulk connections to support this. The process includes cobalt disilicide gates and diffusions. Low source and drain capacitance, as well as 3-nm gate oxide thickness, allow high performance and low-voltage operation.

III. ARCHITECTURE

The microprocessor contains 32-kB instruction and data caches as well as an eight-entry coalescing writeback buffer. The instruction and data cache fill buffers have two and four entries, respectively. The data cache supports hit-under-miss operation, and lines may be locked to allow SRAM-like operation. Thirty-two-entry fully associative translation lookaside buffers (TLBs) that support multiple page sizes are provided for both caches. TLB entries may also be locked. A 128-entry branch target buffer improves branch performance in a pipeline deeper than earlier high-performance ARM designs [2], [3].

A. Pipeline Organization

To obtain high performance, the microprocessor core utilizes a simple scalar pipeline and a high-frequency clock. In addition to avoiding the potential power waste of a superscalar approach, functional design and validation complexity is decreased at the expense of circuit design effort. To avoid circuit design issues, the pipeline partitioning balances the workload and ensures that no one pipeline stage is tight. The main integer pipeline is seven stages, memory operations follow an eight-stage pipeline, and when operating in thumb mode an extra pipe stage is inserted after the last fetch stage to convert thumb instructions into ARM instructions. Since thumb mode instructions [11] are 16 b, two instructions are fetched in parallel while executing thumb instructions. A simplified diagram of the processor pipeline is shown in Fig. 2, where the state boundaries are indicated by gray. Features that allow the microarchitecture to achieve high speed are as follows.

The shifter and ALU reside in separate stages. The ARM instruction set allows a shift followed by an ALU operation in a single instruction. Previous implementations limited frequency by having the shift and ALU in a single stage. Splitting this operation reduces the critical ALU bypass path by approximately 1/3. The extra pipeline hazard introduced when an instruction is immediately followed by one requiring that the result be shifted is infrequent.

Decoupled Instruction Fetch. A two-instruction-deep queue is implemented between the second fetch and instruction decode pipe stages. This allows stalls generated later in the pipe to be deferred by one or more cycles in the earlier pipe stages, thereby allowing instruction fetches to proceed when the pipe is stalled, and also relieves stall speed paths in the instruction fetch and branch prediction units.

Deferred register dependency stalls. While register dependencies are checked in the RF stage, stalls due to these hazards are deferred until the X1 stage. All the necessary operands are then captured from result forwarding busses as the results are returned to the register file.

One of the major goals of the design was to minimize the energy consumed to complete a given task. Conventional wisdom has been that shorter pipelines are more efficient due to re...

Fig. 2. Microprocessor pipeline organization.

General C.L. Cell Delay Model

[Figure: a combinational logic cell with inputs A, B, ..., X driving output load Cout, and a plot of delay (Va to Vout) versus Cout showing the internal delay, the delay per unit load, and Ccritical]

A combinational cell (symbol) is fully specified by:
functional (input -> output) behavior: truth table, logic equation, or VHDL;
the input load factor of each input;
the propagation delay from each input to each output for each transition:
THL(A, o) = Fixed Internal Delay + Load-dependent delay x load

The linear model composes.
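
As a rough illustration of how the linear cell delay model and the cycle-time rule fit together, the sketch below computes per-cell delay as fixed internal delay plus delay per unit load times load, sums cell delays along each register-to-register path, and then takes the minimum cycle time as clock-to-Q plus the longest combinational path plus setup. The class and function names (Cell, path_delay, min_cycle_time) and every numeric value are made up for this example; none of them come from the lecture or from any Xilinx datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Linear delay model for one combinational cell (hypothetical values)."""
    internal_delay_ns: float       # fixed internal delay
    delay_per_unit_load_ns: float  # load-dependent delay per unit of output load

    def delay(self, load: float) -> float:
        # delay = fixed internal delay + (delay per unit load) x load
        return self.internal_delay_ns + self.delay_per_unit_load_ns * load


def path_delay(stages):
    """Sum cell delays along one path; stages is a list of (cell, load) pairs."""
    return sum(cell.delay(load) for cell, load in stages)


def min_cycle_time(clk_to_q_ns, paths, setup_ns):
    """Smallest T allowed by: T >= clock-to-Q + longest CL path + setup."""
    critical = max(path_delay(p) for p in paths)
    return clk_to_q_ns + critical + setup_ns


# Two made-up cells and two made-up register-to-register paths.
nand = Cell(internal_delay_ns=1.0, delay_per_unit_load_ns=0.5)
inv = Cell(internal_delay_ns=0.5, delay_per_unit_load_ns=0.25)

path_a = [(nand, 2.0), (inv, 4.0)]  # 2.0 ns + 1.5 ns = 3.5 ns
path_b = [(inv, 2.0)]               # 1.0 ns

t_min = min_cycle_time(clk_to_q_ns=0.5, paths=[path_a, path_b], setup_ns=0.25)
print(t_min)  # 4.25
```

Here path_a is the critical path, so it alone sets the cycle time; shortening path_b would not allow a faster clock. The sketch leaves out hold-time checks and clock skew.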

