Berkeley COMPSCI 152 - Lecture 5 – Timing, Xilinx

School: University of California, Berkeley

Course: COMPSCI 152 - Computer Architecture and Engineering

Pages: 4

CS 152 Computer Architecture and Engineering
Lecture 5: Timing, Xilinx
2004-09-14

John Lazzaro (www.cs.berkeley.edu/~lazzaro)
Dave Patterson (www.cs.berkeley.edu/~patterson)
www-inst.eecs.berkeley.edu/~cs152/

CS 152 L05: Timing, Xilinx, Design Notebook. UC Regents Fall 2004, UCB.

Today's Lecture: More Project Topics

Clocked logic timing (wrap-up)
Field Programmable Gate Arrays
Design Notebook

Last Time: Timing Analysis
(Slides Lec3.9 through Lec3.12 from CS152 / Kubiatowicz, 1/28/04, UCB Spring 2004.)

General C/L Cell Delay Model

[Figure: combinational logic cell driving a load Cout; plot of delay (Va -> Vout) versus Cout, labeled with the internal delay, the delay per unit load, and Ccritical.]

A combinational cell (symbol) is fully specified by:
  functional (input -> output) behavior: truth table, logic equation, VHDL
  input load factor of each input
  propagation delay from each input to each output for each transition:
    THL(A, o) = Fixed Internal Delay + Load-dependent delay x load

The linear model composes.

Storage element timing

[Figure: timing diagram for Clk, D, and Q, labeled Setup, Hold, Clock-to-Q, Don't Care, and Unknown.]

Setup Time: input must be stable BEFORE the trigger clock edge.
Hold Time: input must REMAIN stable after the trigger clock edge.
Clock-to-Q time: output cannot change instantaneously at the trigger clock edge. Similar to delay in logic gates, it has two components:
  Internal Clock-to-Q
  Load-dependent Clock-to-Q

Clocking Methodology

All storage elements are clocked by the same clock edge.
The combination logic blocks:
  Inputs are updated at each clock tick.
  All outputs MUST be stable before the next clock tick.

Critical Path & Cycle Time

Critical path: the slowest path between any two storage devices.
Cycle time is a function of the critical path. It must be greater than:
  Clock-to-Q + Longest Path through Combination Logic + Setup

Excerpt shown on the slides: IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001, p. 1600

Fig. 1. Process SEM cross section.

The process [...] was raised from [1] to limit standby power. Circuit design and architectural pipelining ensure low-voltage performance and functionality. To further limit standby current in handheld ASSPs, a longer poly target takes advantage of the [...] versus [...] dependence, and source-to-body bias is used to electrically limit transistor [...] in standby mode. All core nMOS and pMOS transistors utilize separate source and bulk connections to support this. The process includes cobalt disilicide gates and diffusions. Low source and drain capacitance, as well as 3-nm gate oxide thickness, allow high performance and low-voltage operation.

Fig. 2. Microprocessor pipeline organization.

III. ARCHITECTURE

The microprocessor contains 32-kB instruction and data caches as well as an eight-entry coalescing writeback buffer. The instruction and data cache fill buffers have two and four entries, respectively. The data cache supports hit-under-miss operation, and lines may be locked to allow SRAM-like operation. Thirty-two-entry fully associative translation lookaside buffers (TLBs) that support multiple page sizes are provided for both caches. TLB entries may also be locked. A 128-entry branch target buffer improves branch performance [in] a pipeline deeper than earlier high-performance ARM designs [2], [3].

A. Pipeline Organization

To obtain high performance, the microprocessor core utilizes a simple scalar pipeline and a high-frequency clock. In addition to avoiding the potential power waste of a superscalar approach, functional design and validation complexity is decreased at the expense of circuit design effort. To avoid circuit design issues, the pipeline partitioning balances the workload and ensures that no one pipeline stage is tight. The main integer pipeline is seven stages, memory operations follow an eight-stage pipeline, and when operating in thumb mode an extra pipe stage is inserted after the last fetch stage to convert thumb instructions into ARM instructions. Since thumb mode instructions are 16 b, two instructions are fetched in parallel while executing thumb instructions. A simplified diagram of the processor pipeline is shown in Fig. 2, where the state boundaries are indicated by gray. Features that allow the microarchitecture to achieve high speed are as follows.

The shifter and ALU reside in separate stages. The ARM instruction set allows a shift followed by an ALU operation in a single instruction. Previous implementations limited frequency by having the shift and ALU in a single stage. Splitting this operation reduces the critical ALU bypass path by approximately 1/3. The extra pipeline hazard introduced when an instruction is immediately followed by one requiring that the result be shifted is infrequent.

Decoupled instruction fetch. A two-instruction-deep queue is implemented between the second fetch and instruction decode pipe stages. This allows stalls generated later in the pipe to be deferred by one or more cycles in the earlier pipe stages, thereby allowing instruction fetches to proceed when the pipe is stalled, and also relieves stall speed paths in the instruction fetch and branch prediction units.

Deferred register dependency stalls. While register dependencies are checked in the RF stage, stalls due to these hazards are deferred until the X1 stage. All the necessary operands are then captured from result forwarding busses as the results are returned to the register file.

One of the major goals of the design was to minimize the energy consumed to complete a given task. Conventional wisdom has been that shorter pipelines are more efficient due to re[...]

Clocked Logic Timing

What is the smallest T that produces correct operation?
Worst-case CL delay limits T.
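The slides above state that each cell's delay is a fixed internal delay plus a load-dependent delay times the load, that this linear model composes along a path, and that the cycle time must be greater than clock-to-Q plus the longest path through combinational logic plus setup. A minimal Python sketch of that arithmetic follows. The function names and every numeric value are hypothetical, chosen only for illustration; they do not come from the lecture.

```python
# Sketch of the linear cell-delay model and the cycle-time bound from the slides.
# All names and numeric values here are hypothetical illustrations.

def cell_delay(internal_delay, delay_per_unit_load, load):
    """Linear model: fixed internal delay plus load-dependent delay times load."""
    return internal_delay + delay_per_unit_load * load


def path_delay(cells):
    """The linear model composes: a path's delay is the sum of its cell delays."""
    return sum(cell_delay(*cell) for cell in cells)


def min_cycle_time(clk_to_q, paths, setup):
    """Lower bound on T: clock-to-Q + longest combinational path + setup."""
    critical = max(path_delay(p) for p in paths)
    return clk_to_q + critical + setup


# Each cell is (internal_delay, delay_per_unit_load, load); times in picoseconds.
paths = [
    [(100, 20, 3), (80, 10, 2)],  # 160 + 100 = 260 ps (the critical path)
    [(120, 25, 4)],               # 120 + 100 = 220 ps
]
t_min = min_cycle_time(clk_to_q=150, paths=paths, setup=50)
print(t_min)  # 150 + 260 + 50 = 460
```

With these made-up numbers the bound is 460 ps, so any clock period T must be longer than that. The sketch covers only the setup-side constraint quoted on the slides and does not check hold time.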
