1. EE241 - Spring 2005
Advanced Digital Integrated Circuits
Lecture 21: Asynchronous Design, Synchronization, Clock Distribution

2. Self-Timed Pipelined Datapath
[Figure: registers R1-R3 and logic blocks F1-F3 (delays t_pF1, t_pF2, t_pF3, each with Start/Done signals) between In and Out, linked by handshake (HS) blocks that exchange Req and Ack.]

3. Hand-Shaking Protocol
Two-Phase Handshake
[Figure: (a) Sender-receiver configuration with Req, Ack and Data; (b) Timing diagram of Req, Ack and Data over cycle 1 and cycle 2, with numbered events marked as sender's or receiver's actions.]

4. Event Logic – The Muller-C Element
[Figure: (a) Schematic; (b) Truth table.]
A  B  | F(n+1)
0  0  | 0
0  1  | F(n)
1  0  | F(n)
1  1  | 1
[Figure: implementations. (a) Logic (SR flip-flop based); (b) Majority function; (c) Dynamic.]

5. 2-Phase Handshake Protocol
• Advantage: FAST - minimal number of signaling events (important for global interconnect)
• Disadvantage: edge-sensitive, has state
[Figure: sender logic and receiver logic exchanging Data; handshake logic built around a C-element with signals Data ready, Data accepted, Req, Ack.]

6. Example: Self-timed FIFO
• All 1s or 0s -> pipeline empty
• Alternating 1s and 0s -> pipeline full
[Figure: registers R1-R3 between In and Out, enabled (En) by a chain of C-elements; signals Req_i, Ack_i, Req_o, Ack_o, Done.]

7. 2-Phase Protocol
[Figure only.]

8. Example
From [Horowitz]

9. Example
[Figure only.]

10. Example
[Figure only.]

11. Example
[Figure only.]

12. 4-Phase Handshake Protocol
• Slower, but unambiguous
• Also known as RTZ (return-to-zero)
[Figure: timing diagram of Req, Ack and Data over cycle 1 and cycle 2, events numbered 1-5, marked as sender's or receiver's actions.]

13. 4-Phase Handshake Protocol
Implementation using Muller-C elements
[Figure: sender logic and receiver logic exchanging Data; handshake logic with two C-elements and signals Data ready, Data accepted, Req, Ack, S.]

14. Self-Resetting Logic
[Figure: precharged logic blocks L1, L2, L3, each with its own completion detection; circuit detail with VDD, inputs A, B, C, nodes int and out, and post-charge logic.]

15. Asynchronous-Synchronous Interface
[Figure: asynchronous system (f_in) -> synchronization -> synchronous system (f_CLK).]

16. Synchronizers and Arbiters
• Arbiter: circuit to decide which of 2 events occurred first
• Synchronizer: arbiter with clock φ as one of the inputs
• Problem: the circuit HAS to make a decision in limited time; which decision is not important
• Caveat: it is impossible to ensure correct operation
• But we can decrease the error probability at the expense of delay

17. A Simple Synchronizer
• Data sampled on rising edge of the clock
• Latch will eventually resolve the signal value, but ... this might take infinite time!
[Figure: latch (D, Q) clocked by CLK, with nodes in, int, I1, I2.]

18. Synchronizer: Output Trajectories
Single-pole model for a flip-flop
[Figure: V_out (0.0 to 2.0) versus time [ps] (0 to 300).]

19. Mean Time to Failure
[Equations not recovered.]

20. Example
T_φ = 10 nsec = T
T_signal = 50 nsec
t_r = 1 nsec
τ = 310 psec
V_IH - V_IL = 1 V (V_DD = 5 V)
N(T) = 3.9 × 10^-9 errors/sec
MTF(T) = 2.6 × 10^8 sec = 8.3 years
MTF(0) = 2.5 µsec

21. Influence of Noise
[Figure: initial distribution p(v), uniform around V_M between V_IL and V_IH; after time T the distribution is still uniform, with logarithmic reduction.]
Low amplitude noise does not influence synchronization behavior

22. Typical Synchronizers
[Figure: 2-phase clocking circuit (φ1, φ2, Q); synchronizer using a delay line.]

23. Cascaded Synchronizers Increase MTF
[Figure: In -> Sync -> O1 -> Sync -> O2 -> Sync -> Out, all clocked by φ.]

24. Arbiters
[Figure: (a) Schematic symbol with Req1, Req2, Ack1, Ack2; (b) Implementation with internal nodes A and B; (c) Timing diagram of Req1, Req2, A, B, Ack1 versus t, showing the metastable interval and the V_T gap.]

25. PLL-Based Synchronization
f_system = N × f_crystal
f_crystal < 200 MHz
[Figure: crystal oscillator supplying a reference clock; Chip 1 with PLL, divider, clock buffer and digital system; Chip 2 with PLL and digital system; Data passed between the chips.]

26. Clock Distribution
Goal: Minimization of uncertainty
• Clock skew (spatial uncertainty): systematic
• Clock jitter (temporal uncertainty): random cycle-to-cycle changes

27. Reading
• Chapter 13 (Chandrakasan et al.), Clock Distribution by Bailey
• Chapter 12 (Chandrakasan et al.), PLLs and DLLs by Maneatis
• Chapter 10, Rabaey et al.

28. Clock Distribution
• Tree » Common, e.g. IBM S/390
• Clock grid » DEC Alpha
• Length-matched serpentines » Intel P6

29. Clock Distribution
H-Tree Network
Observe: Only relative skew is important
Example: PowerPC 603, Gerosa, JSSC 12/94
[Figure: H-tree driven from CLOCK.]

30. Clock Network with Distributed Buffering
• Reduces absolute delay, and makes power-down easier
• Sensitive to variations in buffer delay
[Figure: CLOCK feeding main clock driver and secondary clock drivers, which serve modules in each local area.]

31. [Figure: predriver; binary tree; H-tree; X-tree; arbitrary matched tree.]

32. Example: IBM S/390
Clock skew
Webb, JSSC 11/97

33. Clock Tree Delays
Restle, VLSI'98

34. Impact of clock network sizing
[Figure only.]

35. Impact of clock network sizing
[Figure only.]

36. Final Stage: Tree vs. Grid
[Figure: RC-matched tree; grid.]
Courtesy of IEEE Press, New York. © 2000

37. IR Emission Images
[Figure: central buffer; clock repeaters; sector buffers; local clocks.]
Sanda, ISSCC'99

38. Example: DEC Alpha 21164
Clock drivers

39. Clock Skew in Alpha Processor
[Figure only.]

40. DEC Alpha Evolution
Clock driver placements: 21064, 21164, 21264
Gronowski, JSSC 5/98

41. Clock Skews
[Figure: 21064, 21164, 21264.]

42. Hybrid Grid
DEC Alpha 21264
Bailey, JSSC 11/98

43. Alpha 21264
[Figure only.]

44. Alpha 21264 Grids
[Figure: global clock; major clock grids.]

45. Data-Dependent Gate Loading
[Figure only.]

46. Multi-GHz Clock Networks
http://www.research.ibm.com/people/r/restle/MGHz.html
http://www.research.ibm.com/people/r/restle/Animations/DAC01top.html
Phillip Restle, IBM Research
IEEE SSCTC Workshop on Design for Multi-GigaHertz Processors, San Francisco, Feb. 7, 2000

47. Clock Generation
• Delay-Locked Loop (delay line based): f_REF -> phase detector (U/D) -> charge pump -> filter -> delay line (DL) -> f_O
• Phase-Locked Loop (VCO based): f_REF -> PD (U/D) -> CP -> filter -> VCO -> f_O, with ÷N in the feedback path

48. Phase-Locked Loop Based Clock Generator
Acts also as clock multiplier
[Figure: reference clock -> phase detector (Up/Down) -> charge pump -> loop filter (V_contr) -> VCO -> clock decode & buffer (φ1, φ2, ...); local clock fed back through divide-by-N.]

49. Loop Components
• Phase comparator: produces UP/DN pulses corresponding to phase difference
• Charge pump: sources/sinks current for duration of UP/DN pulses
• Loop filter: integrates current to produce control voltage
• Voltage-controlled delay line: changes delay proportionally to voltage
• Voltage-controlled oscillator: generates frequency proportional to control voltage

50. PLL Jitter
[Figure only.]

51. DLL Locking
Courtesy of IEEE Press, New York. © 2000

52. Clock Deskewing
Geannopoulos, ISSCC'98
Two clock spines, two DLLs, and a PD that controls them

53. Clock Ring
Shibayama, ISSCC'98
• Clocks routed in parallel, opposite directions
• LCG aligns to the middle

54. Synchronous Distributed Oscillators
Mizuno, ISSCC'98
[Figure: VCOs; # of nearest neighbors.]

55. Distributed PLLs
Gutnik, ISSCC'2000

56. Intel Itanium™
Rusu, ISSCC'2000

57. Intel
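
Worked example for slides 4 and 6: a behavioral sketch of the Muller C-element truth table and of the full/empty patterns quoted for the self-timed FIFO. It assumes a plain chain of C-elements in which each stage sees the previous stage's output and the inverted output of the next stage (a Muller pipeline) under two-phase signaling. The registers, En and Done signals of the slide's circuit are not modeled, and the names c_element and settle are hypothetical.

```python
def c_element(a, b, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return a if a == b else prev


def settle(state, req_in, ack_out):
    """Relax a chain of C-elements until no output changes.

    Assumed wiring: stage i combines the previous stage's output
    (req_in for stage 0) with the inverted output of the next stage
    (ack_out for the last stage).
    """
    state = list(state)
    changed = True
    while changed:
        changed = False
        for i in range(len(state)):
            prev_out = req_in if i == 0 else state[i - 1]
            next_out = ack_out if i == len(state) - 1 else state[i + 1]
            new = c_element(prev_out, 1 - next_out, state[i])
            if new != state[i]:
                state[i] = new
                changed = True
    return state


# Empty three-stage chain: every C-element output is 0.
empty = settle([0, 0, 0], req_in=0, ack_out=0)

# The sender issues three two-phase events (Req toggles 1, 0, 1) while the
# receiver never acknowledges (ack_out stays 0).
state = empty
for req in (1, 0, 1):
    state = settle(state, req_in=req, ack_out=0)
full = state

# A fourth event cannot enter: the first stage holds its value.
blocked = settle(full, req_in=0, ack_out=0)

print("empty:", empty)
print("full: ", full)
print("blocked after 4th request:", blocked)
```

In this sketch the chain ends up with alternating outputs once three events are stored and refuses a fourth, which matches the "alternating 1s and 0s -> pipeline full" observation on the slide.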
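
Worked example for slide 20: the mean-time-to-failure numbers can be checked with a short calculation. The formula is an assumption, since the equation on slide 19 did not survive extraction. It is the usual single-pole synchronizer model, N(T) = (V_IH - V_IL)/V_swing × t_r/T_signal × e^(-T/τ) / T_φ, with V_swing taken equal to V_DD. The function and parameter names are hypothetical.

```python
import math


def sync_error_rate(wait_time, tau, v_undefined, v_swing, t_rise, t_signal, t_clk):
    """Average synchronization errors per second after waiting wait_time.

    Assumed model: probability of sampling inside the undefined region,
    reduced exponentially with the waiting time, per clock period.
    """
    # Probability that a sample lands between V_IL and V_IH.
    p_init = (v_undefined / v_swing) * (t_rise / t_signal)
    # Exponential resolution with time constant tau, one sample every t_clk.
    return p_init * math.exp(-wait_time / tau) / t_clk


# Values from the slide; V_swing = V_DD = 5 V is an assumption.
params = dict(
    tau=310e-12,       # regeneration time constant, 310 psec
    v_undefined=1.0,   # V_IH - V_IL, 1 V
    v_swing=5.0,       # signal swing, assumed equal to V_DD
    t_rise=1e-9,       # t_r, 1 nsec
    t_signal=50e-9,    # T_signal, 50 nsec
    t_clk=10e-9,       # T_phi, 10 nsec
)

n_t = sync_error_rate(10e-9, **params)   # wait one clock period, T = 10 nsec
n_0 = sync_error_rate(0.0, **params)     # no waiting time
mtf_t = 1.0 / n_t
mtf_0 = 1.0 / n_0

seconds_per_year = 365.25 * 24 * 3600
print(f"N(T)   = {n_t:.2e} errors/sec")
print(f"MTF(T) = {mtf_t:.2e} sec = {mtf_t / seconds_per_year:.1f} years")
print(f"MTF(0) = {mtf_0 * 1e6:.2f} usec")
```

Under these assumptions the script gives an error rate close to 3.9 × 10^-9 errors/sec and an MTF of roughly eight years after waiting one clock period, against 2.5 µsec with no waiting, in line with the rounded figures on the slide.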