6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 1. © Krste Asanović

VLSI for Architects

Krste Asanović
Computer Architecture Group
MIT Laboratory for Computer Science
[email protected]
www.cag.lcs.mit.edu/6.893-f2000/


Future Computing Infrastructure

[Figure labels: MegaWatt Data Centers; Internet; Routers; Wireless Internet; Base Stations; 0.1-10 Watt Clients (PDAs, Cameras, Cellphones, Laptops, GPS, Set-tops); µWatt Wireless Sensor Networks]


Semiconductor Trends

- Non-Recurring Engineering (NRE) costs are increasing rapidly for new designs
    - >$1M for masks to spin a new design
    - Engineers cost ~$200K/year (salary + benefits + overhead)
    - Pentium Pro design verification took around 350 engineer-years, or ~$70M
  => Tremendous economies of scale
     (Can't sell <1,000,000 parts for <$100 each)
- CMOS following Moore's Law until (at least) 2011-2014
    - ITRS'99* roadmap for 2011, 50 nm technology
        - 64 Gb DRAMs (8 GB/chip)
        - 7 billion transistor CPUs
        - 10 GHz clocks (100 ps cycle time)
  => Smallest viable chips have huge capacity
     (~10 million transistors/mm², 10 million transistors per person per day)

[* International Technology Roadmap for Semiconductors]


Programmable Silicon Replaces Custom Hardware

[Figure labels: Programmable Silicon at the center, with DRAM; Position Sensor/Accelerometer; Stereo Video; Stereo Audio I/O; Other Sensors/Effectors; Display + Touchscreen; Universal Wireless; Flash Storage]

Programmable silicon replaces ASICs, or collections of DSPs, microprocessors and glue logic
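The round numbers on the Semiconductor Trends slide can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check of the quoted figures; the variable names are illustrative and do not come from the lecture.

```python
# Cross-check of the figures quoted on the Semiconductor Trends slide.
# All inputs are the slide's own round numbers; names are illustrative only.

ENGINEER_COST_PER_YEAR = 200_000      # ~$200K/year, salary + benefits + overhead
VERIFICATION_ENGINEER_YEARS = 350     # Pentium Pro design verification effort

verification_cost = ENGINEER_COST_PER_YEAR * VERIFICATION_ENGINEER_YEARS
print(f"Verification cost: ${verification_cost:,}")   # $70,000,000

# 64 gigabit DRAM expressed in gigabytes (8 bits per byte).
dram_gigabytes = 64 // 8
print(f"64 Gb DRAM = {dram_gigabytes} GB/chip")

# Cycle time of a 10 GHz clock, in picoseconds (1 s = 10**12 ps).
clock_hz = 10 * 10**9
cycle_ps = 10**12 // clock_hz
print(f"10 GHz clock = {cycle_ps} ps cycle time")

# Revenue from 1,000,000 parts at $100 each, for comparison with the
# verification cost alone (mask and other NRE costs come on top).
revenue = 1_000_000 * 100
print(f"Revenue at 1M parts x $100: ${revenue:,}")
```

The last figure shows why the slide treats a million parts at $100 as a floor: verification alone would consume most of that revenue.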
Benchmarks & Metrics

- Application space wider than desktop processors
    - Benchmark as many applications as possible
    - Include apps done with special hardware now (graphics, audio, crypto)
    - Whole system measures
    - Real-time important
- Primary metrics
    - Cost (related to die area but also whole system cost)
    - Execution Time (latency and throughput, average and worst-case)
    - Energy (also peak power and peak switching current)
- Compare against best possible solution for each application
    - How much worse than application-specific circuitry?
    - Moore's law perhaps makes area the most forgiving dimension
        - try to keep energy and delay competitive, possibly at expense of area


VLSI for Architects

Two types of question architects ask:
- How will this change affect area/delay/energy in current technology?
- How will this design scale to future technologies?
- For next 10-15 years, the technology is CMOS


Transistors

[Figure panels: a) Circuit Symbol; b) Physical Realization; c) Layout View; d) Simple RC Model. Labels: Gate, Source, Drain, Bulk, Width, Length; Minimum Length = 2λ, Width = 4λ; Cgate, Cdrain, Csource, Ron]


Transistors

IBM SOI Technology
[Image © IBM]


Method of Logical Effort
(Sutherland and Sproull)

- Easy way to estimate delays in CMOS process
- Indicates correct number of logic stages to use and transistor sizes
- Characterize process speed with single delay parameter: τ
    τ = delay of inverter driving same-sized inverter (no parasitics)
    τ in range 10-15 ps for 0.18 µm processes


Gate Delay Components

- Split delay of logic gate into three components
    Delay = Logical Effort x Electrical Effort + Parasitic Delay
- Logical Effort
    - Complexity of logic function (Invert, NAND, NOR, etc.)
    - Define inverter to have logical effort = 1
    - Depends only on topology, not transistor sizing
- Electrical Effort
    - Ratio of output capacitance to input capacitance, Cout/Cin
- Parasitic Delay
    - Intrinsic self-loading of gate
    - Independent of transistor sizes and output load

[Figure: Logic Gate with input capacitance Cin and load capacitance Cout]


Logical Effort for Simple Gates

- Define Logical Effort of Inverter = 1
- For other gates, size to give same current drive as inverter
- Logical Effort is ratio of logic gate's input cap. to inverter's input cap.

Gate        Relative Transistor Widths      Input Cap    L.E.
Inverter    PMOS 2, NMOS 1                  3 units      1 (definition)
NAND        PMOS 2 and 2, NMOS 2 and 2      4 units      4/3
NOR         PMOS 4 and 4, NMOS 1 and 1      5 units      5/3


Electrical Effort

- Ratio of output load capacitance over input capacitance:
    E.E. = Cout/Cin
- Usually, transistors have minimum length
- Input and output capacitances can be measured in units of transistor gate widths

[Figure: Logic Gate with Cin and Cout]


Parasitic Delay

- Main cause is drain capacitances
- These scale with transistor width, so P.D. is independent of transistor sizes
- Useful approximation: Cgate ~= Cdrain
- For inverter: Parasitic Delay ~= 1.0 τ

[Figure labels: CgateN, RonN, CdrainN; CgateP, RonP, CdrainP]


Inverter Chain Delay

- For each stage:
    Delay = Logical Effort x Electrical Effort + Parasitic Delay
          = 1.0 (definition) x 1.0 (in = out) + 1.0 (drain caps)
          = 2.0 units
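The single-gate delay model above can be written as a small Python function. This is a minimal sketch, not code from the lecture: the function and dictionary names are invented for illustration, and only the inverter's parasitic delay (about 1.0 τ) comes from the slides, so it is used as the default and must be supplied explicitly for any other gate.

```python
from fractions import Fraction

# Logical efforts from the "Logical Effort for Simple Gates" slide.
# The dictionary keys are illustrative names, not notation from the lecture.
LOGICAL_EFFORT = {
    "inverter": Fraction(1),      # by definition
    "nand": Fraction(4, 3),       # input cap 4 units vs. inverter's 3
    "nor": Fraction(5, 3),        # input cap 5 units vs. inverter's 3
}


def gate_delay(logical_effort, c_out, c_in, parasitic=1.0):
    """Return gate delay in units of tau.

    Delay = Logical Effort x Electrical Effort + Parasitic Delay,
    where Electrical Effort = c_out / c_in. The default parasitic delay
    of 1.0 is the inverter value from the slides; other gates need their
    own value passed in.
    """
    electrical_effort = c_out / c_in
    return logical_effort * electrical_effort + parasitic


# Inverter driving an identical inverter (Cout = Cin = 3 units).
chain_stage = gate_delay(LOGICAL_EFFORT["inverter"], c_out=3, c_in=3)
print(chain_stage)   # 2.0

# Inverter driving four times its own input capacitance.
fo4 = gate_delay(LOGICAL_EFFORT["inverter"], c_out=12, c_in=3)
print(fo4)           # 5.0
```

The second result reproduces the roughly 5 τ figure the lecture quotes for a fan-out-of-four inverter; with τ in the 10-15 ps range given for 0.18 µm processes, that is about 50-75 ps.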


Optimizing Circuit Paths

- Path logical effort, G = Π g_i (g_i = L.E. of stage i)
- Path electrical effort, H = Cout/Cin (h_i = E.E. of stage i)
- Parasitic delay, P = Σ p_i (p_i = P.D. of stage i)
- Path effort, F = GH
- Minimum delay when each of N stages has equal effort
    Min. D = N F^(1/N) + P
    i.e. g_i h_i = F^(1/N)

[Figure: path driven from Cin into load Cout]


Optimal Number of Stages

- Minimum delay when:
    stage effort = logical effort x electrical effort ~= 3.4-3.8
    - Some derivations have e = 2.718... as best stage effort; this ignores parasitics
    - Broad optimum, stage efforts of 2.4-6.0 within 15-20% of minimum
- Fan-out-of-four (FO4) is convenient design size (~5τ)

[Figure: inverter with Cin and Cout]

FO4 delay: Delay of inverter driving four
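The path formulas on the last two slides can be tried out numerically. The sketch below is illustrative only: the function names are invented, the example load ratio of 64 is an arbitrary choice, and the stage-count search considers plain inverter chains with the slides' inverter parasitic delay of 1.0 τ, ignoring whether the chain ends up inverting. It requires Python 3.8 or later for `math.prod`.

```python
import math


def min_path_delay(logical_efforts, c_out, c_in, parasitics):
    """Return (minimum delay, per-stage effort) in units of tau.

    Implements Min. D = N * F**(1/N) + P with F = G * H,
    G = product of stage logical efforts, H = c_out / c_in,
    P = sum of stage parasitic delays.
    """
    n = len(logical_efforts)
    path_effort = math.prod(logical_efforts) * (c_out / c_in)
    stage_effort = path_effort ** (1.0 / n)
    return n * stage_effort + sum(parasitics), stage_effort


def best_inverter_chain(c_out, c_in, p_inv=1.0, max_stages=20):
    """Try 1..max_stages inverters; return (stages, delay, stage effort)
    for the chain with the smallest delay."""
    best = None
    for n in range(1, max_stages + 1):
        delay, effort = min_path_delay([1.0] * n, c_out, c_in, [p_inv] * n)
        if best is None or delay < best[1]:
            best = (n, delay, effort)
    return best


# Hypothetical example: drive a load 64 times the input capacitance.
stages, delay, effort = best_inverter_chain(c_out=64, c_in=1)
print(f"{stages} stages, delay ~{delay:.2f} tau, stage effort ~{effort:.2f}")
```

For this load the search settles on three stages with a stage effort of about 4 and a delay of about 15 τ, while two stages cost 18 τ and four about 15.3 τ. That shallow minimum is the "broad optimum" the slide describes, and it falls near the fan-out-of-four design point.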