Clock and Power
6.375 Complex Digital Systems
Krste Asanovic
March 7, 2007

Outline
Clock and Power
Digital System Timing Conventions
Large Systems
Clocked Storage Elements
Flip-Flop Timing Parameters
Edge-Triggered Timing Constraints
Clock Distribution
Clock Skew: Spatial Clock Variation
Clock Jitter: Temporal Clock Variation
How do clock skew and jitter arise?
Clock Distribution with Clock Grids: Low skew but high power
Clock Distribution with Clock Trees: More skew but less power
Clock Distribution Example: Active deskewing circuits in Intel Itanium
Reducing Clock Distribution Problems
Clock Tree Synthesis for ASICs
Example of clock tree synthesis using commercial ASIC back-end tools
Slide 17
Power has been increasing rapidly
Power Dissipation Problems
Simple RC model can also yield intuition on energy consumption of inverter
Many other types of power consumption in addition to dynamic power
Dynamic and Static power
Reducing Dynamic Power (1)
Reducing Dynamic Power (2)
Reducing Static Power
Reducing activity with clock gating
Reducing activity with data gating
Voltage Scaling to trade Energy for Delay
Parallelism Reduces Energy
Voltage Scaling Example
Reducing Power in ASIC Designs (1)
Reducing Power in ASIC Designs (2)
Power Distribution
Power Distribution: Possible IR drop across power network
IR drop can be static or dynamic
Power Distribution: Custom Approach. Carefully tailor power network
Power Distribution: ASIC Approach. Strapping and rings for standard cells
Power Distribution: ASIC Approach. Power rings partition the power problem
Example of power distribution network using commercial ASIC back-end tools
Slide 40

6.375 Spring 2007 • L12 Clock and Power • 2

Digital System Timing Conventions
• All digital systems need a convention about when a receiver can sample an incoming data value
  – synchronous systems use a common clock
  – asynchronous systems encode “data ready” signals alongside, or encoded within, data signals
• Also need convention for when it’s safe to send another value
  – synchronous systems, on next clock edge (after hold time)
  – asynchronous systems, acknowledge signal from receiver
[Figure: synchronous link with Data and Clock signals; asynchronous link with Data, DataReady, and Acknowledge signals]

Large Systems
Most large ASICs, and systems built with these ASICs, have several synchronous clock domains connected by asynchronous communication channels
[Figure: Chip A, Chip B, and Chip C containing clock domains 1 through 6, connected by asynchronous channels]
We’ll focus on a single synchronous clock domain in this class

Clocked Storage Elements
Transparent Latch, Level Sensitive
  – data passes through when clock high, latched when clock low
D-Type Register or Flip-Flop, Edge-Triggered
  – data captured on rising edge of clock, held for rest of cycle
(Can also have latch transparent on clock low, or negative-edge triggered flip-flop)
[Figure: latch and flip-flop symbols with D, Q, and Clock waveforms; latch phases labeled Transparent and Latched]

Flip-Flop Timing Parameters
• TCQmin/TCQmax
  – propagation of D→Q at clock edge
• Tsetup/Thold
  – define window around rising clock edge during which data must be steady to be sampled correctly
  – either setup or hold time can be negative
[Figure: Clock, D, and Q waveforms marking TCQmax, TCQmin, Tsetup, Thold, and the interval labeled "Output undefined"]

Edge-Triggered Timing Constraints
Single clock with edge-triggered registers (common in stdcell ASICs)
• Slow path timing constraint
  Tcycle ≥ TCQmax + TPmax + Tsetup
  – can always work around slow path by using slower clock
• Fast path timing constraint
  TCQmin + TPmin ≥ Thold
  – bad fast path cannot be fixed without redesign!
  – might have to add delay into paths to satisfy hold time
[Figure: registers clocked by CLK with combinational logic of delay TPmin/TPmax between them]

Clock Distribution
Cannot really distribute clock instantaneously with a perfectly regular period
[Figure label: Clock]

Clock Skew: Spatial Clock Variation
Clock Skew
Difference in clock arrival time at two spatially distinct points
[Figure: clock waveforms at points A and B with the skew marked; label: Compressed timing path]

Clock Jitter: Temporal Clock Variation
Clock Jitter
Difference in clock period over time
[Figure: clock waveform with Period A and Period B marked; label: Compressed timing path]

How do clock skew and jitter arise?
[Figure labels: Central Clock Driver, Clock Distribution Network, Local Clock Buffers]
• Variations in trace length, metal width and height, coupling caps
• Variations in local clock load, local power supply, local gate length and threshold, local temperature

Clock Distribution with Clock Grids
Low skew but high power
• Clock driver tree spans height of chip
• Internal levels shorted together
• Grid feeds flops directly, no local buffers

Clock Distribution with Clock Trees
More skew but less power
• Recursive pattern to distribute signals uniformly with equal delay over area
• Each branch is individually routed to balance RC delay
[Figure labels: H-Tree, RC-Tree]

Clock Distribution Example
Active deskewing circuits in Intel Itanium
[Figure labels: Active Deskew Circuits (cancels out systematic skew), Phase Locked Loop (PLL), Regional Grid]

Reducing Clock Distribution Problems
• Use latch-based design
  – Time borrowing helps reduce impact of clock uncertainty
  – Timing analysis is more difficult
  – Rarely used in fully synthesized ASICs, but sometimes in datapaths of otherwise synthesized ASICs
• Make logical partitioning match physical partitioning
  – Limits global communication where skew is usually the worst
  – Helps break distribution problem into smaller subproblems
• Use globally asynchronous, locally synchronous design
  – Divides design into synchronous regions which communicate through asynchronous channels
  – Requires overhead for inter-domain communication
• Use asynchronous design
  – Avoids clocks altogether
  – Incurs its own forms of control overhead

Clock Tree Synthesis for ASICs
• Modern back-end tools include clock tree synthesis
  – Creates balanced RC-trees
  – Uses special clock buffer standard cells
  – Can add clock shielding
  – Can exploit useful clock skew
• Automatic clock tree generation still results in significantly worse clock uncertainties as compared to hand-crafted custom clock trees
  – Modern high-performance processors have clock distribution with <10ps skew
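
Worked example (not part of the original slides): the two edge-triggered timing constraints from the earlier slide, Tcycle ≥ TCQmax + TPmax + Tsetup and TCQmin + TPmin ≥ Thold, can be checked with a few lines of Python. This is a minimal sketch only. The class and function names, all numeric values, and the optional skew term (subtracted from both margins as a simple worst-case budget) are assumptions made for illustration, not something the lecture specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlopTiming:
    """Flip-flop parameters named as on the slides; units are picoseconds."""
    t_cq_min: float
    t_cq_max: float
    t_setup: float
    t_hold: float


def setup_slack(ff, t_cycle, tp_max, skew=0.0):
    """Slow-path margin: Tcycle - (TCQmax + TPmax + Tsetup), minus an assumed skew budget."""
    return t_cycle - (ff.t_cq_max + tp_max + ff.t_setup) - skew


def hold_slack(ff, tp_min, skew=0.0):
    """Fast-path margin: (TCQmin + TPmin) - Thold, minus an assumed skew budget."""
    return (ff.t_cq_min + tp_min) - ff.t_hold - skew


def min_cycle(ff, tp_max, skew=0.0):
    """Shortest clock period that still meets the slow-path constraint."""
    return ff.t_cq_max + tp_max + ff.t_setup + skew


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
ff = FlopTiming(t_cq_min=40.0, t_cq_max=90.0, t_setup=60.0, t_hold=30.0)

print(setup_slack(ff, t_cycle=1000.0, tp_max=800.0))             # 50.0
print(hold_slack(ff, tp_min=20.0))                               # 30.0
print(setup_slack(ff, t_cycle=1000.0, tp_max=800.0, skew=40.0))  # 10.0
print(hold_slack(ff, tp_min=20.0, skew=40.0))                    # -10.0
print(min_cycle(ff, tp_max=800.0))                               # 950.0
```

With these hypothetical numbers, a shrinking setup margin could always be recovered by lengthening Tcycle, while the negative hold margin under 40 ps of assumed skew does not depend on Tcycle at all. That matches the slide's point that a bad fast path needs redesign or added path delay.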