Berkeley COMPSCI 150 - Lecture 18 - Circuit Timing

Spring 2010, EECS150 - Lec18-timing(2)

EECS150 - Digital Design
Lecture 18 - Circuit Timing (2)
March 17, 2010
John Wawrzynek

In General ...
For correct operation:
    T ≥ τ_clk→Q + τ_CL + τ_setup
for all paths.
• How do we enumerate all paths?
  – Any circuit input or register output to any register input or circuit output?
• Note:
  – “setup time” for outputs is a function of what it connects to.
  – “clk-to-q” for circuit inputs depends on where it comes from.

Gate Delay is the Result of Cascading
• Cascaded gates:
  [Figure: “transfer curve” for inverter.]

Delay in Flip-flops
• Setup time results from delay through first latch.
• Clock to Q delay results from delay through second latch.
  [Figure: latch-level flip-flop schematic with clk and clk’ signals.]

Wire Delay
• Even in those cases where the transmission line effect is negligible:
  – Wires possess distributed resistance and capacitance.
  – Time constant associated with distributed RC is proportional to the square of the length.
• For short wires on ICs, resistance is insignificant (relative to effective R of transistors), but C is important.
  – Typically around half of C of gate load is in the wires.
• For long wires on ICs:
  – busses, clock lines, global control signals, etc.
  – Resistance is significant, therefore distributed RC effect dominates.
  – signals are typically “rebuffered” to reduce delay:
  [Figure: rebuffered wire with nodes v1, v2, v3, v4, and the waveforms of v1 through v4 versus time.]

Delay and “Fan-out”
• The delay of a gate is proportional to its output capacitance. Connecting the output of a gate to additional gate inputs increases its output capacitance. Therefore, it takes increasingly longer for the output of a gate to reach the switching threshold of the gates it drives as we add more output connections.
• Driving wires also contributes to fan-out delay.
• What can be done to remedy this problem in large fan-out situations?
  [Figure: fan-out example with elements labeled 1, 2, 3.]

“Critical” Path
• Critical Path: the path in the entire design with the maximum delay.
  – This could be from state element to state element, or from input to state element, or state element to output, or from input to output (unregistered paths).
• For example, what is the critical path in this circuit?
• Why do we care about the critical path?

Searching for processor critical path

    1600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

    Fig. 1. Process SEM cross section.

    The process was raised from [1] to limit standby power. Circuit design and architectural pipelining ensure low voltage performance and functionality. To further limit standby current in handheld ASSPs, a longer poly target takes advantage of the versus dependence and source-to-body bias is used to electrically limit transistor in standby mode. All core nMOS and pMOS transistors utilize separate source and bulk connections to support this. The process includes cobalt disilicide gates and diffusions. Low source and drain capacitance, as well as 3-nm gate-oxide thickness, allow high performance and low-voltage operation.

    III. ARCHITECTURE

    The microprocessor contains 32-kB instruction and data caches as well as an eight-entry coalescing writeback buffer. The instruction and data cache fill buffers have two and four entries, respectively. The data cache supports hit-under-miss operation and lines may be locked to allow SRAM-like operation. Thirty-two-entry fully associative translation lookaside buffers (TLBs) that support multiple page sizes are provided for both caches. TLB entries may also be locked. A 128-entry branch target buffer improves branch performance a pipeline deeper than earlier high-performance ARM designs [2], [3].

    A. Pipeline Organization

    To obtain high performance, the microprocessor core utilizes a simple scalar pipeline and a high-frequency clock. In addition to avoiding the potential power waste of a superscalar approach, functional design and validation complexity is decreased at the expense of circuit design effort. To avoid circuit design issues, the pipeline partitioning balances the workload and ensures that no one pipeline stage is tight. The main integer pipeline is seven stages, memory operations follow an eight-stage pipeline, and when operating in thumb mode an extra pipe stage is inserted after the last fetch stage to convert thumb instructions into ARM instructions. Since thumb mode instructions [11] are 16 b, two instructions are fetched in parallel while executing thumb instructions. A simplified diagram of the processor pipeline is shown in Fig. 2, where the state boundaries are indicated by gray. Features that allow the microarchitecture to achieve high speed are as follows.

    Fig. 2. Microprocessor pipeline organization.

    The shifter and ALU reside in separate stages. The ARM instruction set allows a shift followed by an ALU operation in a single instruction. Previous implementations limited frequency by having the shift and ALU in a single stage. Splitting this operation reduces the critical ALU bypass path by approximately 1/3. The extra pipeline hazard introduced when an instruction is immediately followed by one requiring that the result be shifted is infrequent.

    Decoupled Instruction Fetch. A two-instruction deep queue is implemented between the second fetch and instruction decode pipe stages. This allows stalls generated later in the pipe to be deferred by one or more cycles in the earlier pipe stages, thereby allowing instruction fetches to proceed when the pipe is stalled, and also relieves stall speed paths in the instruction fetch and branch prediction units.

    Deferred register dependency stalls. While register dependencies are checked in the RF stage, stalls due to these hazards are deferred until the X1 stage. All the necessary operands are then captured from result-forwarding busses as the results are returned to the register file.

    One of the major goals of the design was to minimize the energy consumed to complete a given task. Conventional wisdom has been that shorter pipelines are more efficient due to re-

Must consider all connected register pairs, paths from input to register, register to output. Don’t forget the controller.
• Design tools help in the search.
  – Synthesis tools report delays on paths,
  – Special static timing analyzers accept a design netlist and report path delays,
  – and, of course, simulators
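
The search for the critical path and the check T ≥ τ_clk→Q + τ_CL + τ_setup can be sketched in a few lines of Python. This is only an illustration of the idea behind a static timing analyzer, not how any particular tool works. The netlist `GATES`, the node names, and every delay value (in picoseconds) are hypothetical and do not come from the lecture.

```python
# Hypothetical flip-flop parameters in picoseconds (assumed values).
T_CLK_TO_Q_PS = 500
T_SETUP_PS = 300

# Hypothetical combinational netlist between register outputs ("*.q")
# and register inputs ("*.d"). Each node maps to
# (its own delay in ps, the list of nodes that drive it).
GATES = {
    "and1": (1000, ["regA.q", "regB.q"]),
    "or1": (1200, ["and1", "regC.q"]),
    "xor1": (1500, ["regA.q"]),
    "regX.d": (0, ["or1"]),
    "regY.d": (0, ["xor1", "and1"]),
}


def arrival_time(node, gates, memo):
    """Worst-case delay from any start point (register output) to `node`."""
    if node not in gates:
        return 0  # start point: a register output or circuit input
    if node not in memo:
        delay, fanin = gates[node]
        memo[node] = delay + max(arrival_time(src, gates, memo) for src in fanin)
    return memo[node]


def critical_path(gates):
    """Return (list of nodes on the slowest path, its combinational delay)."""
    memo = {}
    endpoints = [n for n in gates if n.endswith(".d")]
    end = max(endpoints, key=lambda n: arrival_time(n, gates, memo))
    path = [end]
    while path[-1] in gates:
        _, fanin = gates[path[-1]]
        # Walk backwards through the slowest driver at each step.
        path.append(max(fanin, key=lambda n: arrival_time(n, gates, memo)))
    return list(reversed(path)), arrival_time(end, gates, memo)


def min_clock_period_ps(gates):
    """Smallest T satisfying T >= t_clk_to_q + t_CL + t_setup on every path."""
    _, t_cl = critical_path(gates)
    return T_CLK_TO_Q_PS + t_cl + T_SETUP_PS


if __name__ == "__main__":
    path, t_cl = critical_path(GATES)
    print("critical path:", " -> ".join(path), f"({t_cl} ps)")
    print("minimum clock period:", min_clock_period_ps(GATES), "ps")
```

Because each node's worst-case arrival time is computed once and reused, the sketch never enumerates paths one by one, which is what makes the "all paths" requirement tractable on larger netlists.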

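The Wire Delay slide states that the distributed RC time constant grows with the square of the wire length and that long wires are rebuffered. A rough numeric sketch of why rebuffering helps is given below. It assumes the first-order estimate 0.5·r·c·L² for a distributed RC wire, ignores buffer input capacitance and drive resistance, and uses hypothetical values for `r`, `c`, the wire length, and the buffer delay.

```python
def wire_delay_ps(r_ohm_per_mm, c_pf_per_mm, length_mm):
    """First-order distributed RC delay estimate; ohm * pF gives ps."""
    return 0.5 * r_ohm_per_mm * c_pf_per_mm * length_mm ** 2


def rebuffered_delay_ps(r_ohm_per_mm, c_pf_per_mm, length_mm, n_segments, t_buf_ps):
    """Delay of the same wire split into equal segments with buffers between them.

    Simplifying assumption: each buffer adds a fixed delay t_buf_ps and
    fully isolates the segments from one another.
    """
    segment = wire_delay_ps(r_ohm_per_mm, c_pf_per_mm, length_mm / n_segments)
    return n_segments * segment + (n_segments - 1) * t_buf_ps


if __name__ == "__main__":
    # Hypothetical wire: 100 ohm/mm, 0.2 pF/mm, 10 mm long, 50 ps buffers.
    r, c, length, t_buf = 100.0, 0.2, 10.0, 50.0
    print("unbuffered:", wire_delay_ps(r, c, length), "ps")
    for n in (2, 4, 8):
        print(f"{n} segments:", rebuffered_delay_ps(r, c, length, n, t_buf), "ps")
```

The wire term falls as 1/n while the buffer term rises with n, so under these assumptions there is a segment count beyond which adding buffers stops paying off.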

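The fan-out slide says gate delay is proportional to output capacitance and asks what can be done for large fan-out. The sketch below models delay as drive resistance times total load capacitance and compares a directly driven load with one common remedy, a single level of buffers. The slide itself does not name a remedy, so treat the buffering scheme as an assumption; the constants and the choice that buffers have the same drive strength and input capacitance as the gate are also hypothetical.

```python
import math

# Hypothetical gate parameters (kOhm * fF gives ps).
R_DRIVE_KOHM = 2.0   # effective drive resistance of the gate
C_SELF_FF = 2.0      # the gate's own output capacitance
C_IN_FF = 3.0        # input capacitance of each driven gate


def gate_delay_ps(fanout):
    """Delay grows linearly with the number of gate inputs being driven."""
    return R_DRIVE_KOHM * (C_SELF_FF + fanout * C_IN_FF)


def buffered_delay_ps(fanout, n_buffers):
    """Gate drives n_buffers buffers; each buffer drives a share of the loads."""
    loads_per_buffer = math.ceil(fanout / n_buffers)
    return gate_delay_ps(n_buffers) + gate_delay_ps(loads_per_buffer)


if __name__ == "__main__":
    print("direct, fan-out 64:", gate_delay_ps(64), "ps")
    print("via 8 buffers:", buffered_delay_ps(64, 8), "ps")
```

For small fan-out the extra buffer stage costs more than it saves, so in this model buffering only wins once the directly driven load is large.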