USC EE 577a - lect.12 - D2899666

School name University of Southern California

Course Ee 577a- VLSI System Design

Pages 15

This preview shows page 1-2-3-4-5 out of 15 pages.

Lecture 12: Skew-Tolerant Domino Clocking

Computer Systems Laboratory, Stanford University
horowitz@stanford.edu
Copyright 2000 Mark Horowitz. Original slides from David Harris.
EE371 Lecture 12, Horowitz


Introduction

- Domino circuits are becoming ubiquitous in high-speed digital ICs.
  - They offer 30% or more speedup over static CMOS (raw gate delay).
  - Dual-rail domino is becoming more common because many functions are nonmonotonic and area is less of an issue.
- Nevertheless, traditional domino pipelines have significant overhead.
  - A latch is required to hold the result while the next stage evaluates and the previous stage precharges.
  - Skew budget, no time borrowing, latch delay.
- Look at several ways to reduce this overhead.
  - Better latches
  - Self-timing
  - Skew-tolerant domino is a powerful new technique.
- Evaluate the performance benefits of skew-tolerant domino.


Domino from a System Perspective

Domino doesn't look so attractive in the context of a traditional pipeline.

[Figure: traditional domino pipeline built from alternating Domino and Static gates with a Latch in each phase, clocked by clk and clk_b.]

Legend:
  Domino: one inverting dynamic gate
  Static: one inverting static gate
  Latch:  inverting tristate latch

1. Pay clock skew twice (each phase).
2. Balancing short phases is hard since there is no time borrowing.
3. Latches become a significant fraction of the cycle time.


Traditional Domino Performance Evaluation

All times are in FO4 delays.

- Let $T$ = cycle time = 20, $t_{skew} = 2$, $t_{setup} = 1$ [1].
- Filling the cycle exactly is difficult (no time borrowing): $t_{imbalance} = 1$.
- $T_{phase\text{-}logic} = T/2 - t_{skew} - t_{setup} - t_{imbalance}$

Baseline design:
- $T_{phase\text{-}logic} = 20/2 - 2 - 1 - 1 = 6$
- 40% of the phase is wasted in overhead. Slower than static!

Optimized design:
- Define clock domains and use $t_{skew\text{-}local} = 1$.
- Work hard to balance logic between phases: $t_{imbalance} = 0$ (optimistic).
- $T_{phase\text{-}logic} = 20/2 - 1 - 1 - 0 = 8$
- Still, 20% of the phase is overhead.

[1] In this situation the setup time must be large enough that the output has settled before the clock arrives, since the output might go into a dynamic gate on the next cycle and might not be monotonic.


Early Enhancements

- Good designers have recognized this problem for years.
- The largest problem is the hard edges set by the latches.
- A variety of latches soften this edge.
  - Gate outputs are already q1, so why use another clock? An SR latch will work instead.
  - Use the monotonic nature of the signal to feed it into a precharged latch stage.

[Figure: three latches driven from domino: SR latch, dual monotonic latch, TSPC latch.]

- There is still a problem if you want to use nonmonotonic logic somewhere, since the logic must settle before the earliest clock while the gate might not evaluate until a late clock.
- But if you only have monotonic gates...


Skew-Tolerant Domino Clocking

- If the inputs are all dual-rail, then as long as the clock arrives before the data, the gate will wait and fire when the data arrives.
- If the next gate fires before the current gate precharges, there is no need for a latch (like the self-timed pipeline).
- These properties can be generated using overlapping clocks.


Skew-Tolerant Domino Circuits

- How much clock skew could we tolerate given $N$ clock phases?
- Divide the logic into $N$ phases of $T/N$ duration each.
- Overlapping clocks eliminate the need for latches.
- Extra overlap accommodates clock skew and time borrowing [1].

[Figure: overlapping clocks φ1 and φ2 driving a chain of alternating Static and Domino gates.]

[1] As with other domino techniques, budget skew on the transition from static to domino.


Skew Tolerance

  $T = t_e + t_p$
  $t_p \ge t_{prech} + t_{skew}$
  $t_e \ge T/N + t_{skew} + t_{hold}$

Hence:

  $t_{skew\text{-}max} = \dfrac{\frac{N-1}{N}\,T - t_{prech} - t_{hold}}{2}$

[Figure: timing diagram of overlapping phases (φ1, φ2; gates 1a, 1b, 2a) showing $t_e$ and $t_p$, the requirement that consecutive phases overlap by $t_{hold}$, and the effective precharge window.]


Numerical Example

- Let $t_{prech} = 4$: long enough to precharge the domino gate and make the subsequent skewed static gate fall below $V_t$.
- $t_{hold}$ is slightly negative for reasonable cell libraries: the next phase can evaluate before the precharge ripples through the static gate. Conservatively bound $t_{hold}$ at 0.

  N    t_skew    t_p
  2    2         6
  3    3.33      7.33
  4    4         8
  6    4.66      8.66
  8    5         9

Sweet spots:
- N = 2: fewest clocks
- N = 4: good tolerance, 50% duty cycle


Global/Local Skew

- This is good, but we can do better.
- Local skew can be more tightly controlled than global skew (1 FO4).
- Require that each phase of logic fit in a local clock domain.

  $t_p \ge t_{prech} + t_{skew\text{-}local}$
  $t_e \ge T/N + t_{skew\text{-}global} + t_{hold}$

Hence:

  $t_{skew\text{-}global\text{-}max} = \frac{N-1}{N}\,T - t_{skew\text{-}local} - t_{prech} - t_{hold}$

- When $t_{skew\text{-}global}$ gets huge, precharge interferes with the subsequent phase.

  N    t_skew-global    t_p
  2    3                5
  3    5.66             5
  4    6                6
  6    6                7.33
  8    6                8


Time Borrowing

- We don't need such a large global skew tolerance.
- Use some of this time instead to allow time borrowing:

  $t_{borrow} = \frac{N-1}{N}\,T - t_{skew\text{-}global} - t_{skew\text{-}local} - t_{prech} - t_{hold}$

- Intentional borrowing helps balance logic between phases.
- Opportunistic time borrowing compensates for uncertainties in models, analysis tools, and processing.
- If the actual $t_{skew\text{-}global} = 2$ and $t_{skew\text{-}local} = 1$:

  N    t_borrow    t_p
  2    1           5
  3    3.66        5
  4    5           5
  6    6.33        5
  8    7           5


Other Design Issues

- State is no longer stored in the latch at the end of a phase.
  - Instead, it is held by the first domino gate in the phase.
  - Use a full keeper to allow stop-clock operation.

[Figure: first domino gate of the φ2 phase with a weak full keeper, driven from the φ1 block.]

- All systems with overlapping clocks require min-delay checks.
  - Domino paths are presumably critical anyway, so there are few min-delay errors.
  - 4-phase has effectively no min-delay risk:
    - The overlap of all four phases is at most very small.
    - A minimum of 8 gates are in the cycle anyway.


Skew-Tolerant Performance Evaluation

- Evaluate the ALU self-bypass of a superscalar processor like the DEC Alpha.
- 3-metal 0.6 μm process; FO4 delay in the TT corner = 138 ps.
- Compare traditional domino to 4-phase skew-tolerant domino.

[Figure: ALU self-bypass path in its skew-tolerant and traditional versions: Add/Sub, 64-bit adder, result mux, and bypass mux built from Domino and Static gates, with 1 mm and 2 mm wires, a 150 fF load from other ALU blocks, and an output to the data cache. The traditional version adds latches clocked by clk and clk_b.]


Simulation Results

No skew:
- Traditional domino: latency 13.0 FO4, cycle time 16.6.
  - Cycles are unbalanced; no time borrowing is available.
- Skew-tolerant domino: latency 11.9 FO4, cycle time 11.9.
  - Remove latches from the critical path, balance pipe stages.

1 FO4 local skew:
- Traditional domino: latency 15.0 FO4, cycle time 17.6.
  - Skew adds to both phases for latency
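
The budget formulas in this lecture can be checked numerically. The sketch below is a minimal illustration, not part of the original slides: the function names are hypothetical, and the cycle time T = 16 FO4 used for the skew-tolerance and time-borrowing tables is an assumption inferred from the tabulated values (the preview text does not state it). The global-skew table is not reproduced because its entries for N ≥ 4 are capped at 6 by the precharge interference the lecture mentions, and the preview gives no formula for that cap.

```python
def traditional_phase_logic(T, t_skew, t_setup, t_imbalance):
    """Logic time per phase in a traditional two-phase domino pipeline (FO4)."""
    return T / 2 - t_skew - t_setup - t_imbalance


def skew_tolerance(T, N, t_prech, t_hold=0.0):
    """Maximum tolerable skew for N overlapping phases, single skew domain."""
    return ((N - 1) / N * T - t_prech - t_hold) / 2


def time_borrowing(T, N, t_skew_global, t_skew_local, t_prech, t_hold=0.0):
    """Time available for borrowing once actual global/local skew is budgeted."""
    return (N - 1) / N * T - t_skew_global - t_skew_local - t_prech - t_hold


if __name__ == "__main__":
    # Traditional domino: baseline and optimized designs from the lecture.
    print("baseline :", traditional_phase_logic(20, 2, 1, 1))
    print("optimized:", traditional_phase_logic(20, 1, 1, 0))

    T = 16.0        # ASSUMPTION: inferred from the tables, not stated in the text
    t_prech = 4.0   # from the numerical example
    t_hold = 0.0    # conservative bound from the numerical example

    print("N  t_skew  t_p   t_borrow")
    for N in (2, 3, 4, 6, 8):
        skew = skew_tolerance(T, N, t_prech, t_hold)
        t_p = t_prech + skew  # precharge width at the maximum skew
        borrow = time_borrowing(T, N, 2.0, 1.0, t_prech, t_hold)
        print(f"{N}  {skew:6.2f}  {t_p:5.2f}  {borrow:6.2f}")
```

Under these assumptions the computed values agree with the lecture's tables, except that the slides appear to truncate repeating decimals (4.66, 8.66, 3.66) where rounding gives 4.67, 8.67, and 3.67.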
