UCLA EE 116B - Architecture Issues in VLSI Systems

Architecture Issues in VLSI Systems
EE116B (Winter 2001): Lecture #8
Mani Srivastava, UCLA - EE [email protected]
3/20/2001
Copyright 2001 Mani Srivastava

Architecture Techniques in VLSI
- Big impact of architectural level techniques and optimizations
  - Performance
  - Power

Pipelining and Retiming
[Figure: pipelining and retiming examples]
- D(a) * D(b) = D(a * b), where D = register

Retiming
[Figure not recovered]

How does Retiming help?
[Figure: the same graph of multipliers 1-4 and adders 5-8, before and after retiming]

Schedule before retiming:
  Cycle   Multipliers   Adders
  1       1, 3          -
  2       2, 4          5
  3       -             6, 8
  4       -             7

Schedule after retiming:
  Cycle   Multipliers   Adders
  1       2             8
  2       3             6
  3       1             7
  4       4             5

Clock Period Minimization
- Example: 100 stage lattice filter
- Assume add and multiply take 1 and 2 units of time respectively
- Critical path shown as a dotted line
- Minimum sample period = 105

Clock Period Minimization (contd.)
- 2-slowdown version

Clock Period Minimization (contd.)
- Retimed version of the 2-slowdown model
- Critical path = 6
- Minimum sample period = 12

Retiming for Register Minimization
[Figure not recovered]

Example of Retiming for Register Minimization
[Figure not recovered]

Loop Bound
- y(n) = a.y(n-1) + x(n)
- Critical path: longest path with zero delays
  - A -> B has length 6
- Intra-iteration precedence vs. inter-iteration precedence
  - A → B and B ⇒ A

Loop Bound
- Two critical paths of length 6
  - 6 → 3 → 2 → 1
  - 5 → 3 → 2 → 1
- But what is the fundamental limit on how fast the underlying circuit can be implemented?

Iteration Bound $T_\infty$
- Iteration bound: the maximum loop bound over all loops
- Example: three loops with loop bounds of 4/2, 5/3, 5/4
  - $T_\infty = 2$

Iteration Bound $T_\infty$ (contd.)
- $T_\infty = 3$
- $T_\infty = \max(6/2, 11/1) = 11$

Another Trick: Unfolding
[Figure not recovered]

Unfolding (contd.)
[Figure not recovered]

How to do Unfolding?
[Figure not recovered]

Another Example
[Figure not recovered]

How does Unfolding help?
- Unfolding a circuit with iteration bound $T_\infty$ results in a J-unfolded circuit with iteration bound $J T_\infty$
- Seems like no win, but can help with the actual sample period in two scenarios
  - Case 1: sample period could not be made equal to iteration period because of some node with computation time greater than $T_\infty$
  - Case 2: sample period could not be made equal to iteration period because $T_\infty$ is not an integer

Example of Case 1
- Before unfolding: $T_\infty = 3$, minimum sample period = 4
- After unfolding: $T_\infty = 6$, minimum sample period = 6/2 = 3

Example of Case 2
- Before unfolding: $T_\infty = 4/3$, minimum sample period = 2
- After unfolding: $T_\infty = 4$, minimum sample period = 4/3

Power Consumption in CMOS Digital Logic
- Dynamic power consumption
  - charging and discharging capacitors
- Short circuit currents
  - short circuit path between supply rails during switching
- Leakage
  - leaking diodes and transistors
  - problem even when in standby!

Power Consumption in CMOS Digital Logic (contd.)
$P = A \cdot C \cdot V^2 \cdot f + A \cdot I_{sw} \cdot V \cdot f + I_{leak} \cdot V$
where
  $A$ = activity factor (probability of a 0 → 1 transition)
  $C$ = total chip capacitance
  $V$ = total voltage swing, usually near the power supply voltage
  $f$ = clock frequency
  $I_{sw}$ = short circuit current when logic level changes
  $I_{leak}$ = leakage current in diodes and transistors

Why not simply lower V?
- Total P can be minimized by lowering V
  - lower V is a natural result of smaller feature sizes
- But… transistor speeds decrease dramatically as V is reduced to close to the "threshold voltage"
  - performance goals may not be met
  - $t_d = \dfrac{C V}{k (V - V_t)^{\alpha}}$, where $\alpha$ is between 1 and 2
- Why not lower this "threshold voltage"?
  - makes noise margin and $I_{leak}$ worse!
- Need to do smarter voltage scaling!

Speed vs. Voltage
[Figure: normalized delay versus supply voltage V]

Example: Reference Datapath
from "Digital Integrated Circuits" by Rabaey
- Critical path delay: $T_{adder} + T_{comparator} = 25$ ns
- Frequency: $f_{ref} = 40$ MHz
- Total switched capacitance = $C_{ref}$
- $V_{dd} = V_{ref} = 5$ V
- Power for reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$

Parallel Datapath
from "Digital Integrated Circuits" by Rabaey
- The clock rate can be reduced by x2 with the same throughput: $f_{par} = f_{ref}/2 = 20$ MHz
- Total switched capacitance: $C_{par} = 2.15\,C_{ref}$
- $V_{par} = V_{ref}/1.7$
- $P_{par} = (2.15\,C_{ref})(V_{ref}/1.7)^2 (f_{ref}/2) = 0.36\,P_{ref}$

Pipelined Datapath
from "Digital Integrated Circuits" by Rabaey
- $f_{pipe} = f_{ref}$, $C_{pipe} = 1.1\,C_{ref}$, $V_{pipe} = V_{ref}/1.7$
- Voltage can be dropped while maintaining the original throughput
- $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1\,C_{ref})(V_{ref}/1.7)^2 f_{ref} = 0.37\,P_{ref}$

Datapath Architecture-Power Trade-off Summary
  Datapath Architecture   Voltage   Area   Power
  Original                5 V       1      1
  Pipelined               2.9 V     1.3    0.37
  Parallel                2.9 V     3.4    0.34
  Pipeline-Parallel       2.0 V     3.7    0.18

Example of Voltage Scaling
[Figures: % communication overhead, ideal vs. actual speedup, supply voltage (fixed throughput), and normalized power, each versus number of processors N]

Shutdown for Energy Saving
- Subsystems may have small duty factors
  - CPU, disk, wireless interface are often idle
- Huge difference between "on" & "off" power
  - Some low-power CPUs:
    - Ruby II: 150 mW (on-line) / 40 mW / 7.5 mW (sleep) / 750 mW (stop)
    - Hobbit: 250 mW (active) / 50 mW (doze)
  - 2.5" hard disk [Harris95]: 1.35 W (idle spinning) / 0.4 W (standby) / 0.2 W (sleep) / 4.7 W (start-up)
[Figure: alternating Blocked ("Off") and Active ("On") intervals of length $T_{block}$ and $T_{active}$]
- ideal improvement = $1 + T_{block}/T_{active}$

Potential CPU Power Reduction in a Wireless X Terminal
- 96-98% of the time is spent in the blocked state
- Average time in the blocked state is short (<< a second)
[Figure: state of the X server process (ON/OFF) versus time in seconds]

                         Trace 1    Trace 2    Trace 3
  Trace Length (sec)     5182.48    26859.9    995.16
  Toff (sec)             5047.47    26427.4    960.82
  Ton (sec)              135.01     432.5      34.34
  Toff/(Toff+Ton)        0.9739     0.9839     0.9655
  Max Energy Reduction   x38.4      x62.1      x29.0
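The shutdown slides give the ideal improvement as 1 + Tblock/Tactive, and the trace table reports the resulting maximum energy reduction. The sketch below recomputes those rows from the Toff and Ton values in the table. It assumes zero power while off and no cost for switching states, which is what "ideal" means here; the function and variable names are illustrative, not from the lecture.

```python
# Ideal energy reduction from shutting a subsystem down while it is blocked,
# using the slide's formula: improvement = 1 + T_block / T_active.
# Assumption: zero power while "off" and no cost to change state.

def ideal_improvement(t_off: float, t_on: float) -> float:
    """Return the ideal energy reduction factor 1 + t_off / t_on."""
    if t_on <= 0:
        raise ValueError("t_on must be positive")
    return 1.0 + t_off / t_on

# (Toff, Ton) in seconds, copied from the trace table.
TRACES = {
    "Trace 1": (5047.47, 135.01),
    "Trace 2": (26427.4, 432.5),
    "Trace 3": (960.82, 34.34),
}

for name, (t_off, t_on) in TRACES.items():
    blocked_fraction = t_off / (t_off + t_on)
    reduction = ideal_improvement(t_off, t_on)
    print(f"{name}: blocked fraction {blocked_fraction:.4f}, "
          f"ideal reduction x{reduction:.1f}")
```

Rounded to one decimal place, the three factors match the "Max Energy Reduction" row of the table. A real system would fall short of these figures because start-up energy and wake-up latency are not free, as the 4.7 W disk start-up figure suggests.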
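For the reference, parallel, and pipelined datapaths, the power ratios follow from the dynamic term P = C V^2 f alone. The sketch below evaluates that ratio from the capacitance and frequency factors on the slides. It assumes the scaled supply is the 2.9 V listed in the summary table; using exactly Vref/1.7 (about 2.94 V) gives slightly larger ratios. The helper name is hypothetical.

```python
# Dynamic power of a datapath relative to the reference design,
# from P = C * V^2 * f. Each argument is a ratio to the reference value.

def relative_power(c_ratio: float, v_ratio: float, f_ratio: float) -> float:
    """Return P / P_ref for the given capacitance, voltage, and clock ratios."""
    return c_ratio * v_ratio ** 2 * f_ratio

V_REF = 5.0     # reference supply voltage from the slides, in volts
V_SCALED = 2.9  # assumption: the summary table's voltage, roughly V_REF / 1.7

# Parallel: 2.15x switched capacitance, half the clock rate.
parallel = relative_power(2.15, V_SCALED / V_REF, 0.5)

# Pipelined: 1.1x switched capacitance, same clock rate.
pipelined = relative_power(1.1, V_SCALED / V_REF, 1.0)

print(f"parallel:  {parallel:.2f} x P_ref")
print(f"pipelined: {pipelined:.2f} x P_ref")
```

With these inputs the ratios round to the 0.36 and 0.37 quoted on the parallel and pipelined slides. Most of the saving comes from the squared voltage term, which is why both architectures tolerate extra capacitance.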
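The iteration bound examples earlier in the lecture can be checked numerically. The sketch below takes each loop as a (computation time, number of delays) pair, treats the iteration bound as the largest loop bound, and then models only Case 2 of the unfolding discussion. It assumes the clock period must be a whole number of time units, so a J-unfolded circuit achieves a per-sample period of ceil(J * T∞) / J. Function names are illustrative.

```python
from fractions import Fraction
from math import ceil

def iteration_bound(loops):
    """Largest loop bound t/w over (computation_time, delay_count) pairs.

    Every loop must contain at least one delay (delay_count > 0).
    """
    return max(Fraction(t, w) for t, w in loops)

def per_sample_period(t_inf, j):
    """Per-sample period after J-unfolding, assuming an integer clock period.

    The unfolded circuit has iteration bound j * t_inf and produces j samples
    per iteration. This models Case 2 only (non-integer bound).
    """
    return Fraction(ceil(j * t_inf), j)

# Loop bounds 4/2, 5/3, 5/4 from the iteration bound slide.
print(iteration_bound([(4, 2), (5, 3), (5, 4)]))

# Two loops with bounds 6/2 and 11/1 from the continuation slide.
print(iteration_bound([(6, 2), (11, 1)]))

# Case 2: a bound of 4/3, before unfolding and after 3-unfolding.
t_inf = Fraction(4, 3)
print(per_sample_period(t_inf, 1), per_sample_period(t_inf, 3))
```

Exact fractions are used so that a bound such as 4/3 is not rounded before the ceiling is taken.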

