9/12/01 ©UCB Fall 2001 CS152 / Kubiatowicz Lec4.1

September 12, 2001
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
CS152 Computer Architecture and Engineering
Lecture 4
Cost and Design

Review: Performance and Technology Trends
[Figure: Performance (log scale, 0.1 to 1000) vs. Year (1965 to 2000) for supercomputers, mainframes, minicomputers, and microprocessors]
° Technology Power: 1.2 x 1.2 x 1.2 = 1.7x / year
  • Feature Size: shrinks 10% / yr. => Switching speed improves 1.2x / yr.
  • Density: improves 1.2x / yr.
  • Die Area: 1.2x / yr.
° RISC lesson is to keep the ISA as simple as possible:
  • Shorter design cycle => fully exploit the advancing technology (~3yr)
  • Advanced branch prediction and pipeline techniques
  • Bigger and more sophisticated on-chip caches

Review: Characterize a Gate
° Input capacitance for each input
° For each input-to-output path:
  • For each output transition type (H->L, L->H, H->Z, L->Z, etc.)
    - Internal delay (ns)
    - Load dependent delay (ns / fF)
° Example: 2-input NAND Gate (inputs A and B, output Out)
  • For A and B: Input Load (I.L.) = 61 fF
  • For either A -> Out or B -> Out:
    - Tlh = 0.5 ns, Tlhf = 0.0021 ns / fF
    - Thl = 0.1 ns, Thlf = 0.0020 ns / fF
[Figure: Delay A -> Out (Out: Low -> High) vs. Cout; a line with intercept 0.5 ns and slope 0.0021 ns / fF]

Review: General C/L Cell Delay Model
° Combinational Cell (symbol) is fully specified by:
  • functional (input -> output) behavior
    - truth-table, logic equation, VHDL
  • load factor of each input
  • critical propagation delay from each input to each output for each transition
    - THL(A, o) = Fixed Internal Delay + Load-dependent-delay x load
° Linear model composes
[Figure: combinational logic cell with inputs A, B, ..., X and output load Cout; plot of Delay Va -> Vout vs. Cout, with the internal delay as intercept, the delay per unit load as slope, and Ccritical marked on the Cout axis]

Review: More complicated gates
[Figure: 2 x 1 Mux with inputs A, B, select S, and output Y]
° Input Load: A = 61 fF, B = 61 fF, S = 111 fF
° Load Dependent Delay:
  • TAYlhf = 0.0021 ns / fF, TAYhlf = 0.0020 ns / fF
  • TBYlhf = 0.0021 ns / fF, TBYhlf = 0.0020 ns / fF
  • TSYlhf = 0.0021 ns / fF, TSYhlf = 0.0020 ns / fF
° Internal Delay:
  • TAYlh = 0.844 ns, TBYlh = 0.844 ns
  • Fun Exercises: TAYhl, TBYhl, TSYlh, TSYhl
° How do we compute these numbers?
° Three Components:
  • Input Load
  • Load Dependent Delay
  • Internal Delays
    - One for each input path and output transition

2 to 1 MUX: Input Load and Load Dependent Delay
[Figure: 2 x 1 Mux built from Gate 1 (inputs A and !S), Gate 2 (inputs B and S), and Gate 3 (output Y); Wire 0 carries !S to Gate 1, Wire 1 connects Gate 1 to Gate 3, Wire 2 connects Gate 2 to Gate 3]
Y = (A and !S) or (B and S)
° Input Load (I.L.)
  • A, B: I.L. (NAND) = 61 fF
  • S: I.L. (INV) + I.L. (NAND) = 50 fF + 61 fF = 111 fF
° Load Dependent Delay (L.D.D.): Same as Gate 3
  • TAYlhf = 0.0021 ns / fF, TAYhlf = 0.0020 ns / fF
  • TBYlhf = 0.0021 ns / fF, TBYhlf = 0.0020 ns / fF
  • TSYlhf = 0.0021 ns / fF, TSYhlf = 0.0020 ns / fF

2 to 1 MUX: Internal Delay Calculation
Y = (A and !S) or (B and S)
° Internal Delay (I.D.):
  • A to Y: I.D. G1 + (Wire 1 C + G3 Input C) * L.D.D. G1 + I.D. G3
  • B to Y: I.D. G2 + (Wire 2 C + G3 Input C) * L.D.D. G2 + I.D. G3
  • S to Y (Worst Case): I.D. Inv + (Wire 0 C + G1 Input C) * L.D.D. Inv + Internal Delay A to Y
° We can approximate the effect of “Wire 1 C” by:
  • Assume Wire 1 has the same C as all the gate C attached to it.
° Specific Example:
  • TAYlh = TPhl G1 + (2.0 * 61 fF) * TPhlf G1 + TPlh G3
          = 0.1 ns + 122 fF * 0.0020 ns/fF + 0.5 ns = 0.844 ns
  • (A rising makes the G1 NAND output fall, which makes the G3 output rise; hence Thl for G1 and Tlh for G3.)

CS152 Logic Elements
° NAND2, NAND3, NAND4
° NOR2, NOR3, NOR4
° INV1x (normal inverter)
° INV4x (inverter with large output drive)
° Negative-edge-triggered D flip-flop
° XOR2
° XNOR2
° PWR: Source of 1’s
° GND: Source of 0’s
° fast MUXes

Storage Element’s Timing Model
[Figure: D flip-flop timing diagram showing Clk, D (“Don’t Care” outside the setup/hold window), and Q (“Unknown” until Clock-to-Q has elapsed), with the Setup, Hold, and Clock-to-Q intervals marked]
° Setup Time: Input must be stable BEFORE the trigger clock edge
° Hold Time: Input must REMAIN stable after the trigger clock edge
° Clock-to-Q time:
  • Output cannot change instantaneously at the trigger clock edge
  • Similar to delay in logic gates, two components:
    - Internal Clock-to-Q
    - Load dependent Clock-to-Q
° Typical for class: 1 ns Setup, 0.5 ns Hold

Clocking Methodology
[Figure: combinational logic block between two banks of registers clocked by Clk]
° All storage elements are clocked by the same clock edge
° The combinational logic block’s:
  • Inputs are updated at each clock tick
  • All outputs MUST be stable before the next clock tick

Critical Path & Cycle Time
[Figure: registers clocked by Clk with a combinational path between them]
° Critical path: the slowest path between any two storage devices
° Cycle time is a function of the critical path; it must be greater than:
  • Clock-to-Q + Longest Path through Combinational Logic + Setup

Clock Skew’s Effect on Cycle Time
° The worst case scenario for cycle time consideration:
  • The input register sees CLK1
  • The output register sees CLK2
° Cycle Time - Clock Skew >= CLK-to-Q + Longest Delay + Setup
  => Cycle Time >= CLK-to-Q + Longest Delay + Setup + Clock Skew
[Figure: Clk1 and Clk2 waveforms offset by the clock skew]
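The linear delay model and the cycle-time bound with clock skew can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the gate numbers are the 2-input NAND values from the slides, and the external load, clock-to-Q time, and skew in the usage lines are made-up values, since the lecture does not give them. Only the 1 ns setup time is the class-typical figure from the slides.

```python
# 2-input NAND characterization, taken from the "Characterize a Gate" slide.
NAND_INPUT_LOAD_FF = 61.0      # input capacitance per input (fF)
NAND_TLH_NS = 0.5              # internal delay, output low -> high (ns)
NAND_TLHF_NS_PER_FF = 0.0021   # load-dependent delay, low -> high (ns/fF)
NAND_THL_NS = 0.1              # internal delay, output high -> low (ns)
NAND_THLF_NS_PER_FF = 0.0020   # load-dependent delay, high -> low (ns/fF)


def cell_delay(internal_ns, per_ff_ns, load_ff):
    """Linear model: fixed internal delay plus load-dependent delay times load."""
    return internal_ns + per_ff_ns * load_ff


def mux_a_to_y_internal_lh():
    """Internal delay TAYlh of the 2 x 1 mux (Y rising because A rose).

    G1's output falls while driving Wire 1 plus one G3 input; the wire is
    approximated as having the same capacitance as the gate input on it.
    G3's output then rises, contributing only its internal delay here.
    """
    g1_load_ff = 2.0 * NAND_INPUT_LOAD_FF
    g1_fall_ns = cell_delay(NAND_THL_NS, NAND_THLF_NS_PER_FF, g1_load_ff)
    return g1_fall_ns + NAND_TLH_NS


def min_cycle_time(clk_to_q_ns, longest_delay_ns, setup_ns, skew_ns=0.0):
    """Lower bound: CLK-to-Q + longest delay + setup + clock skew."""
    return clk_to_q_ns + longest_delay_ns + setup_ns + skew_ns


tay_lh = mux_a_to_y_internal_lh()
print(f"TAYlh = {tay_lh:.3f} ns")  # prints "TAYlh = 0.844 ns"

# Hypothetical path: the mux output drives an assumed 100 fF load.
path_ns = cell_delay(tay_lh, NAND_TLHF_NS_PER_FF, 100.0)
# Assumed clock-to-Q = 1.0 ns and skew = 0.25 ns; setup = 1 ns as in class.
cycle_ns = min_cycle_time(clk_to_q_ns=1.0, longest_delay_ns=path_ns,
                          setup_ns=1.0, skew_ns=0.25)
print(f"minimum cycle time = {cycle_ns:.3f} ns")
```

Because the model is linear, the mux can itself be treated as a cell with an internal delay (0.844 ns) and a per-fF slope, which is why `cell_delay` is reused for the whole path.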

Tricks to Reduce Cycle Time
[Figure: a gate with inputs A, B, C, D driving a large load Clarge, first directly and then through INV4x buffer stages]
° Reduce the number of gate levels
° Use esoteric/dynamic timing methods
° Pay attention to loading
° One gate driving many gates is a bad idea
° Avoid using a small gate to drive a long wire
° Use multiple stages to drive a large load

How to Avoid Hold Time Violation?
[Figure: combinational logic block between registers clocked by Clk]
° Hold time requirement:
  • Input to register must NOT change immediately after the clock tick
° This is usually easy to meet in the “edge trigger” clocking scheme
° Hold time of most FFs is <= 0 ns
° CLK-to-Q + Shortest Delay Path must be greater than Hold Time

Clock Skew’s Effect on Hold Time
° The worst case scenario for hold time consideration:
  • The input register sees CLK2
  • The output register sees CLK1
  • A fast FF2 output must not change the input to FF1 for the same clock edge
° (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
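The hold-time condition with clock skew lends itself to a small numeric check. The sketch below is illustrative only: the function names are hypothetical, the 0.5 ns hold time is the class-typical value from the storage element slide, and the clock-to-Q and shortest-path delays are assumed values chosen to show one passing and one failing case.

```python
def hold_slack_ns(clk_to_q_ns, shortest_delay_ns, hold_ns, skew_ns=0.0):
    """Margin by which (CLK-to-Q + shortest path - skew) exceeds the hold time."""
    return clk_to_q_ns + shortest_delay_ns - skew_ns - hold_ns


def meets_hold(clk_to_q_ns, shortest_delay_ns, hold_ns, skew_ns=0.0):
    """True when the strict inequality from the slide is satisfied."""
    return hold_slack_ns(clk_to_q_ns, shortest_delay_ns, hold_ns, skew_ns) > 0.0


def max_tolerable_skew_ns(clk_to_q_ns, shortest_delay_ns, hold_ns):
    """Skew must stay strictly below this value to avoid a hold violation."""
    return clk_to_q_ns + shortest_delay_ns - hold_ns


HOLD_NS = 0.5       # class-typical hold time from the slides
CLK_TO_Q_NS = 0.4   # assumed value, not from the lecture
SHORTEST_NS = 0.2   # assumed shortest combinational path, not from the lecture

# Without skew the new data arrives after the hold window closes.
print(meets_hold(CLK_TO_Q_NS, SHORTEST_NS, HOLD_NS))                # True
# With 0.3 ns of skew the same path violates hold.
print(meets_hold(CLK_TO_Q_NS, SHORTEST_NS, HOLD_NS, skew_ns=0.3))   # False
```

Unlike the cycle-time bound, this check does not depend on the clock period, so a hold violation cannot be fixed by slowing the clock; only the skew, the clock-to-Q time, or the shortest path delay can change the outcome.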