Berkeley COMPSCI 250 - Circuit Timing


CS250, UC Berkeley Fall ’09, Lecture 01, Introduction

Slide 1: CS250 VLSI Systems Design, Circuit Timing
Fall 2009
John Wawrzynek, Krste Asanović, with John Lazzaro

Slide 2: Circuit Delay Is a Consequence of the Physics of Transistors and Interconnections
CS250, UC Berkeley Fall ’09, Lecture 04, Timing
‣ As a designer, you need to understand this physics well enough to make appropriate design decisions.
‣ Fortunately for us, CMOS can be accurately modeled in most cases as simple resistive/capacitive circuits: the charging/discharging time is proportional to R × C.
‣ Circuit timing is part of the larger hierarchy of design decisions regarding performance.

Slide 3: Performance Design Decisions
Abstraction layer            Example choices
functional specification     algorithm or ISA
microarchitecture            function unit multiplexing, pipelining
RTL                          logic organization (factoring)
transistor circuits          transistor sizing, signal buffering
layout                       wire lengths, layer assignment
device & wire engineering    materials, processing
‣ The ultimate goal is to meet the performance / cost (area) / power target for the functional specification.
‣ A subgoal is to meet an upper bound on the clock period.

Slide 4: Synchronous Design Clock Constraint
1. Delay in combinational logic (CL)
2. Delay in state elements
3. Delay in wires (grouped with CL or state)
4. Clock skew
$$T \ge \tau_{clk \to Q} + \tau_{CL} + \tau_{setup} + \tau_{clk\text{-}skew}$$
This must hold for all paths.
Today: focus on the transistor circuit and layout levels. Microarchitecture/RTL come later; device/wire engineering is out of our control.

Slide 5: Parasitics
Slide 6: Resistance
Slide 7: Resistance Values (45nm process)
‣ inner metal layers: ~0.09 Ω/sq
‣ outer metal layers: ~0.028 Ω/sq
‣ vias: ~0.9 Ω
Slide 8: Transistor Resistor Approximation
Slide 9: Resistive Effects
Slide 10: Parasitic Capacitance
Slide 11: Capacitance Between Layers
Slide 12: Capacitance of Diffusion Regions
Slide 13: Transistor Source/Drain Regions
Slide 14: Typical Capacitance Values
Slide 15: Gate Capacitance Calculation
In a 45nm process, a unit inverter delay is 8-10 ps.
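As a rough illustration of the slide 4 clock constraint, the Python sketch below evaluates T ≥ τclk→Q + τCL + τsetup + τclk-skew for a list of register-to-register paths and reports the path that sets the minimum clock period. The names `RegToRegPath` and `min_clock_period` and all delay numbers are hypothetical, and skew is treated as a worst-case penalty added to each path, as in the slide's inequality; a real static timing tool would also model skew that helps a path and would check hold times.

```python
from dataclasses import dataclass


@dataclass
class RegToRegPath:
    """One launch-flop -> logic -> capture-flop path (hypothetical model)."""
    name: str
    t_clk_to_q: float  # clock-to-Q delay of the launching state element (ps)
    t_cl: float        # combinational logic delay, wires included (ps)
    t_setup: float     # setup time of the capturing state element (ps)
    t_skew: float      # worst-case clock skew charged to this path (ps)

    def required_period(self) -> float:
        # T >= tau_clk->Q + tau_CL + tau_setup + tau_clk-skew
        return self.t_clk_to_q + self.t_cl + self.t_setup + self.t_skew


def min_clock_period(paths: list[RegToRegPath]) -> tuple[float, str]:
    """The constraint must hold for all paths, so the slowest one sets T."""
    critical = max(paths, key=lambda p: p.required_period())
    return critical.required_period(), critical.name


if __name__ == "__main__":
    paths = [
        RegToRegPath("alu", t_clk_to_q=40.0, t_cl=250.0, t_setup=30.0, t_skew=15.0),
        RegToRegPath("decode", t_clk_to_q=40.0, t_cl=180.0, t_setup=30.0, t_skew=10.0),
    ]
    period, name = min_clock_period(paths)
    print(f"critical path: {name}, T >= {period:.0f} ps")  # critical path: alu, T >= 335 ps
```

With these made-up numbers, speeding up `alu` only helps until its requirement falls below 260 ps, at which point `decode` becomes the critical path.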
Slide 16: Wire Coupling Capacitance
Slide 17: Node Coupling Effect
Slide 18: Combining R & C
The same effect applies to series connections of transistors.
Slide 19: Combined R & C
0.25 µm process, minimum-width wires
Slide 20: Driving RC Lines
Slide 21: Rise & Fall Times and Propagation Delay
[Figure: nodes n0 and n1]
Slide 22: Timing Optimization
In a 45nm process, τ is 4-5 ps.
Slide 25: Driving Large Capacitive Loads

Slide 30: “Rebuffer” Long Wires
[Figure: voltages v1, v2, v3, v4 along the wire, plotted against time]
Unbuffered wire of length L, with R and C per unit length:
$$\text{Wire Delay} = \tfrac{1}{2} R_{total} C_{total} = \tfrac{1}{2} R C L^2$$
Wire split in half by one buffer:
$$\text{Wire Delay} = 2 \cdot \tfrac{1}{2} \left(\tfrac{1}{2} R_{total} \cdot \tfrac{1}{2} C_{total}\right) = \tfrac{1}{4} R_{total} C_{total} = \tfrac{1}{4} R C L^2$$
Each buffer adds some delay of its own, so with too many splits, buffer delay dominates.

Slides 31-32: Layout of Large Transistors
Modern design rules don’t allow this layout.
Slide 33: Layout of Three-Stage Buffer
Slide 34: Timing Closure: Searching for and Beating Down the Critical Path
[Slide shows a page from IEEE Journal of Solid-State Circuits, vol. 36, no. 11, November 2001, p. 1600:]
Fig. 1. Process SEM cross section.
The process was raised from [1] to limit standby power. Circuit design and architectural pipelining ensure low-voltage performance and functionality.
To further limit standby current in handheld ASSPs, a longer poly target takes advantage of the versus dependence and source-to-body bias is used to electrically limit transistor in standby mode. All core nMOS and pMOS transistors utilize separate source and bulk connections to support this. The process includes cobalt disilicide gates and diffusions. Low source and drain capacitance, as well as 3-nm gate-oxide thickness, allow high performance and low-voltage operation.

III. ARCHITECTURE
The microprocessor contains 32-kB instruction and data caches as well as an eight-entry coalescing writeback buffer. The instruction and data cache fill buffers have two and four entries, respectively. The data cache supports hit-under-miss operation and lines may be locked to allow SRAM-like operation. Thirty-two-entry fully associative translation lookaside buffers (TLBs) that support multiple page sizes are provided for both caches. TLB entries may also be locked. A 128-entry branch target buffer improves branch performance in a pipeline deeper than earlier high-performance ARM designs [2], [3].

A. Pipeline Organization
To obtain high performance, the microprocessor core utilizes a simple scalar pipeline and a high-frequency clock. In addition to avoiding the potential power waste of a superscalar approach, functional design and validation complexity is decreased at the expense of circuit design effort. To avoid circuit design issues, the pipeline partitioning balances the workload and ensures that no one pipeline stage is tight. The main integer pipeline is seven stages, memory operations follow an eight-stage pipeline, and when operating in thumb mode an extra pipe stage is inserted after the last fetch stage to convert thumb instructions into ARM instructions. Since thumb mode instructions [11] are 16 b, two instructions are fetched in parallel while executing thumb
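The slide 30 rebuffering argument generalizes from one split to k equal segments: each segment contributes ½·R·C·(L/k)², so the total wire term falls as 1/k while the k − 1 buffers add delay linearly. The Python sketch below assumes a fixed, hypothetical per-buffer delay `t_buf` and ignores driver resistance and buffer capacitance, so it only illustrates the trade-off stated on the slide rather than sizing real repeaters; the function names and the example wire parameters are illustrative assumptions.

```python
def wire_delay(r_per_m: float, c_per_m: float, length_m: float) -> float:
    """Distributed RC wire delay, 1/2 * R * C * L^2, with R and C per unit length (s)."""
    return 0.5 * r_per_m * c_per_m * length_m ** 2


def rebuffered_delay(r_per_m: float, c_per_m: float, length_m: float,
                     segments: int, t_buf: float) -> float:
    """Wire split into `segments` equal pieces with (segments - 1) buffers between them.

    Assumption: every buffer adds a fixed delay t_buf; driver resistance and
    buffer input/output capacitance are ignored.
    """
    piece = length_m / segments
    return segments * wire_delay(r_per_m, c_per_m, piece) + (segments - 1) * t_buf


def best_segment_count(r_per_m: float, c_per_m: float, length_m: float,
                       t_buf: float, max_segments: int = 64) -> int:
    """Brute-force search for the segment count with the lowest total delay."""
    return min(range(1, max_segments + 1),
               key=lambda k: rebuffered_delay(r_per_m, c_per_m, length_m, k, t_buf))


if __name__ == "__main__":
    # Hypothetical wire: 0.2 ohm/um and 0.2 fF/um, 5 mm long (R_total = 1 kOhm, C_total = 1 pF).
    r, c, length = 2e5, 2e-10, 5e-3  # SI units: ohm/m, F/m, m
    t_buf = 20e-12                   # assumed 20 ps per buffer
    for k in (1, 2, 5, 10):
        print(k, round(rebuffered_delay(r, c, length, k, t_buf) * 1e12, 1), "ps")
    print("best segment count:", best_segment_count(r, c, length, t_buf))
```

For these assumed numbers, the delay drops from 500 ps unbuffered to 270 ps with one buffer and bottoms out at 180 ps with five segments; at ten segments it climbs back to 230 ps because the nine buffers dominate, which is the effect the slide warns about.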

