Berkeley ELENG 241B - Optimization for Performance

1. EE241 - Spring 2005
Advanced Digital Integrated Circuits
Lecture 6: Optimization for Performance

2. Admin
- Project proposals due by Fr 5pm (by e-mail to Huifang and myself)
  - Title
  - Short abstract of 10-15 lines describing the problem you are trying to address
- Special office hours today right after class (3:30-4:30pm)
- Some feedback on ISSCC? What caught your eye?

3. Today's lecture
- Using the models we have created so far to create an environment for optimization
- Reading:
  - ICCAD paper by Stojanovic et al.
  - Chapters 2 and 3 in the text by K. Bernstein (High Speed CMOS Design Styles)
  - Background material from Rabaey, 2nd ed, Chapters 5, 6.

4. Static Timing Analysis
- Computing critical (longest) path delay
  - Longest path algorithm on DAG [Kirkpatrick, IBM Jo. R&D, 1966]
- Used in most ASIC designs today
- Limitations
  - False paths
  - Simultaneous arrival times

5. Signal Arrival Times
[Figure: NAND gate]

6. Signal Arrival Times
[Figure: NAND gate]

7. Simultaneous Arrival Times
[Figure: NAND gate]

8. Impact of Arrival Times
[Figure: delay versus $t_A - t_B$ for inputs A and B, with regions "A arrives early" and "B arrives early"; annotation "Up to 25%"]

9. Optimization for Performance
- Performance critical blocks
- Start with a synthesized design
  - Easier to explore architectures
  - Easy to verify
  - Provides some level of performance optimization
- Understand the limits of synthesized designs

10. Performance Optimization
[Figure: power versus delay]
Increasing the performance increases power!

11. Performance Optimization
[Figure: power versus delay, curves labeled Microarchitecture A and Microarchitecture B]

12. Performance Optimization
[Figure: power versus delay, labels: Synthesized, Microarchitecture A, Microarchitecture B, Custom, Microarchitecture A]

13. How to Increase Performance?
- Scale technology
- Circuit level:
  - Transistor sizing, buffering
  - Wire optimization, repeaters
  - Supply and threshold voltage
  - Logic styles
  - Timing, latches
- Microarchitecture
  - Block topologies (adders, multipliers)
  - Pipelining
  - Parallelism

14. Sizing Logic Paths for Speed
- Frequently, the input capacitance of a logic path is constrained
- Logic has to drive some capacitance
- Example: ALU load in an Intel microprocessor is > 0.5pF
- How do we size the ALU datapath to achieve maximum speed?
- Review the method of logical effort

15. Inverter Chain
[Figure: inverter chain from In to Out driving load $C_L$]
If $C_L$ and $C_{in}$ are given:
- How many stages are needed to minimize the delay?
- How to size the inverters?
May need some additional constraints.

16. Delay Formula
$$\text{Delay} \sim R_W \left( C_{int} + C_L \right)$$
$$t_p = k R_W C_{int} \left( 1 + C_L / C_{int} \right) = t_{p0} \left( 1 + f/\gamma \right)$$
- $C_{int} = \gamma C_{gin}$ with $\gamma \approx 1$
- $f = C_L / C_{gin}$: effective fanout
- $R = R_{unit}/W$; $C_{int} = W C_{unit}$
- $t_{p0} = 0.7 R_{unit} C_{unit}$

17. Apply to Inverter Chain
[Figure: chain of N inverters (1, 2, ..., N) from In to Out driving $C_L$]
$$t_p = t_{p1} + t_{p2} + \dots + t_{pN}$$
$$t_{pj} \sim R_{unit} C_{unit} \left( 1 + \frac{C_{gin,j+1}}{\gamma C_{gin,j}} \right)$$
$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N} \left( 1 + \frac{C_{gin,j+1}}{\gamma C_{gin,j}} \right), \quad C_{gin,N+1} = C_L$$

18. Apply to Inverter Chain
The same expressions with $\gamma = 1$:
$$t_{pj} \sim R_{unit} C_{unit} \left( 1 + \frac{C_{gin,j+1}}{C_{gin,j}} \right)$$
$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N} \left( 1 + \frac{C_{gin,j+1}}{C_{gin,j}} \right), \quad C_{gin,N+1} = C_L$$

19. Optimal Tapering for Given N
- Delay equation has N - 1 unknowns, $C_{gin,2}, \dots, C_{gin,N}$
- Minimize the delay, find N - 1 partial derivatives
- Result: $C_{gin,j+1}/C_{gin,j} = C_{gin,j}/C_{gin,j-1}$
- Size of each stage is the geometric mean of its two neighbors: $C_{gin,j} = \sqrt{C_{gin,j-1} \, C_{gin,j+1}}$
  - each stage has the same effective fanout ($C_{out}/C_{in}$)
  - each stage has the same delay

20. Optimum Delay and Number of Stages
When each stage is sized by f and has the same effective fanout f:
$$f^N = F = C_L / C_{gin,1}$$
Effective fanout of each stage:
$$f = \sqrt[N]{F}$$
Minimum path delay:
$$t_p = N t_{p0} \left( 1 + \sqrt[N]{F}/\gamma \right)$$

21. Example
[Figure: three inverters from In to Out with sizes 1, f, $f^2$; input capacitance $C_1$, load $C_L = 8 C_1$]
$C_L/C_1$ has to be evenly distributed across N = 3 stages:
$$f = \sqrt[3]{8} = 2$$

22. Optimum Number of Stages
For a given load $C_L$ and given input capacitance $C_{in}$, find the optimal number of stages, N, and the optimal sizing, f.
$$C_L = F \cdot C_{in} = f^N C_{in} \quad \text{with} \quad N = \frac{\ln F}{\ln f}$$
$$t_p = N t_{p0} \left( F^{1/N}/\gamma + 1 \right) = \frac{t_{p0} \ln F}{\gamma} \left( \frac{f}{\ln f} + \frac{\gamma}{\ln f} \right)$$
$$\frac{\partial t_p}{\partial f} = \frac{t_{p0} \ln F}{\gamma} \cdot \frac{\ln f - 1 - \gamma/f}{\ln^2 f} = 0$$
$$f = e^{\,1 + \gamma/f}$$
For $\gamma = 0$: $f = e$, $N = \ln F$.

23. Optimum Effective Fanout f
Optimum f for a given process is defined by $\gamma$:
$$f = e^{\,1 + \gamma/f}$$
$f_{opt} = 3.6$ for $\gamma = 1$
[Figure: $f_{opt}$ (2.5 to 5) versus $\gamma$ (0 to 3)]

24. Impact of Loading on $t_p$
With self-loading $\gamma = 1$
[Figure: normalized delay (0 to 7) versus f (1 to 5)]

25. Extending the Model
- For given N: $C_{i+1}/C_i = C_i/C_{i-1}$
- To find N: $C_{i+1}/C_i \sim 4$
- Method of logical effort generalizes this to any logic path
[Figure: chain of N stages (1, 2, ..., N) from In to Out driving $C_L$]
$$\text{Delay} = \sum_{i=1}^{N} \left( p_i + g_i \cdot f_i \right) \quad \text{(in units of } \tau_{inv}\text{)}$$

26. Logical Effort
$$\text{Delay} = k \cdot R_{unit} C_{unit} \left( 1 + \frac{C_L}{\gamma C_{in}} \right) = \tau \left( p + g \cdot f \right)$$
- p: intrinsic delay, a gate parameter $\neq f(W)$
- g: logical effort, a gate parameter $\neq f(W)$
- f: electrical effort (fanout)
Normalize everything to an inverter: $g_{inv} = 1$, $p_{inv} = 1$
Divide everything by $\tau_{inv}$ (everything is measured in unit delays $\tau_{inv}$)
Assume $\gamma = 1$.

27. Delay in a Logic Gate
Gate delay: $d = h + p$ (effort delay plus intrinsic delay)
Effort delay: $h = g f$ (logical effort times effective fanout, $f = C_{out}/C_{in}$)
- Logical effort is a function of topology, independent of sizing
- Effective fanout (electrical effort) is a function of load/gate size

28. Logical Effort
- The inverter has the smallest logical effort and intrinsic delay of all static CMOS gates
- The logical effort of a gate is the ratio of its input capacitance to the inverter capacitance when sized to deliver the same current
- Logical effort increases with the gate complexity

29. Logical Effort
Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter with the same output current.
[Figure: three gate schematics annotated g = 1, g = , g = ; size factors 1.8 and 1.5]

30. Logical Effort of Gates
[Figure: normalized delay (d) versus fan-out (f, 1 to 7) for $t_{pINV}$ and $t_{pNAND}$, F(Fan-in); annotations g = , p = , d = left blank for each gate]

31. Logical Effort of Gates
[Figure: normalized delay (d) versus fan-out (f, 1 to 7) for $t_{pINV}$ and $t_{pNAND}$, F(Fan-in)]
- Inverter: g = 1, p = 1, d = f + 1
- NAND: g = 3.5/3, p = 5.5/3, d = (3.5/3) f + 1.8

32. Add Branching Effort
Branching effort:
$$b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$$

33. Multistage Networks
$$\text{Delay} = \sum_{i=1}^{N} \left( p_i + g_i \cdot f_i \right)$$
- Stage effort: $h_i = g_i f_i$
- Path electrical effort: $F = C_{out}/C_{in}$
- Path logical effort: $G = g_1 g_2 \dots g_N$
- Branching effort: $B = b_1 b_2 \dots b_N$
- Path effort: $H = GFB$
- Path delay: $D = \sum d_i = \sum p_i + \sum h_i$

34. Optimum Effort per Stage
When each stage bears the same effort:
$$h^N = H, \quad h = \sqrt[N]{H}$$
Stage efforts: $g_1 f_1 = g_2 f_2 = \dots = g_N f_N$
Effective fanout of each stage:
$$f_i = h / g_i$$
Minimum path delay:
$$\hat{D} = \sum \left( g_i f_i + p_i \right) = N H^{1/N} + P$$

35. Optimal Number of Stages
For a given load and given input capacitance of the first gate, find the optimal number of stages and the optimal sizing.
$$D = N H^{1/N} + P, \quad \text{with } P = N p \text{ for equal per-stage intrinsic delay } p$$
$$\frac{\partial D}{\partial N} = -H^{1/N} \ln\left( H^{1/N} \right) + H^{1/N} + p = 0$$
Substitute the "best stage effort" $\hat{h} = H^{1/\hat{N}}$, which gives $p + \hat{h} \left( 1 - \ln \hat{h} \right) = 0$.

36. Logical Effort Optimization Methodology
- For smaller problems, it is easy to translate the path into a set of analytical expressions
- Feed them into a Matlab optimizer
- With some careful manipulations, the problem can be turned into a convex optimization problem (Stojanovic)
- Easily extended to add power/energy

37. Optimization for Performance
Options
- Technology choice
  - CMOS, bipolar, BiCMOS, GaAs, Superconducting
- Logic level
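
The sizing results from the inverter-chain and logical-effort slides can be checked numerically. The following is a minimal Python sketch, not the lecture's own tooling (the slides mention a Matlab optimizer). The function names, the fixed-point iteration, and the brute-force search over N are illustrative assumptions; the formulas themselves are the ones quoted in the slides: f = exp(1 + γ/f), t_p/t_p0 = N(1 + F^(1/N)/γ), H = GFB, and equal stage effort h = H^(1/N).

```python
import math


def optimal_fanout(gamma, tol=1e-12, max_iter=200):
    """Solve f = exp(1 + gamma / f) by fixed-point iteration (assumed method)."""
    f = math.e  # exact solution when gamma = 0
    for _ in range(max_iter):
        f_next = math.exp(1.0 + gamma / f)
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    return f


def chain_delay(F, N, gamma=1.0):
    """Normalized inverter-chain delay t_p / t_p0 = N * (1 + F**(1/N) / gamma).

    Requires gamma > 0.
    """
    return N * (1.0 + F ** (1.0 / N) / gamma)


def best_stage_count(F, gamma=1.0, max_stages=20):
    """Brute-force search for the integer N that minimizes chain_delay."""
    return min(range(1, max_stages + 1), key=lambda n: chain_delay(F, n, gamma))


def size_path(g, b, c_in, c_load):
    """Equal-effort sizing of an N-stage path.

    g, b: per-stage logical and branching efforts (lists of equal length).
    Returns the stage effort h = H**(1/N) and the input capacitance of
    every stage, found by walking backward from the load.
    """
    n = len(g)
    H = math.prod(g) * math.prod(b) * c_load / c_in  # path effort H = G*F*B
    h = H ** (1.0 / n)
    caps = [0.0] * n
    c_out = c_load
    for i in reversed(range(n)):
        # Stage effort h = g_i * b_i * C_out / C_in,i, solved for C_in,i.
        caps[i] = g[i] * b[i] * c_out / h
        c_out = caps[i]
    return h, caps


if __name__ == "__main__":
    print("f_opt for gamma = 1:", optimal_fanout(1.0))
    print("best N for F = 64:", best_stage_count(64.0))
    # Slide 21 example: three inverters, C_L = 8 * C_1.
    print(size_path([1, 1, 1], [1, 1, 1], c_in=1.0, c_load=8.0))
```

For the three-inverter example with C_L = 8 C_1, `size_path` returns a stage effort of 2 and input capacitances of 1, 2, and 4 (in units of C_1), matching f = 2 on the example slide. The first capacitance always comes back equal to `c_in`, which is a quick consistency check on the efforts passed in.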

