MIT 6.893 - Issue Logic and Power/Performance Tradeoffs

Edwin Olson, Andrew Menard
December 5, 2000

• Low performance: PIMs
• High performance: video decoding, MP3 playback
• And increasingly, both.
  – How do you design an architecture that can do both?

• High-performance processor that can be lobotomized
  – Modify issue logic
  – Change structure sizes
• Two separate cores
  – A high-performance, high-power core
  – A low-performance, low-power core
• Voltage scaling
  – Huge power savings
  – There's a limit, and high-performance designs are already pushing towards low voltage, which doesn't leave much room for throttling.
• Burn & Coast
  – Compute at full speed, then go into a sleep mode.
  – Simple linear power/performance throttling.

• SimpleScalar/Wattch
  – Widely used, but with little or no verification. Several power models are available, but with very large margins of error.
  – Still, the size of structures is correlated to power consumption.
• Industry survey
  – Look at real-world processors with the range of characteristics of interest.
• SpecInt95
  – Substantially reduced input sets to make simulation feasible.

• Popular idea: it's a highly active chip structure. The window is responsible for 20% of non-clock power (Alpha 21264 and Wattch agree).
• Does it work?
  – Let's look at RUU usage.
• What's an upper bound on the useful size?
• How do smaller sizes impact performance and power?
• Modified SimpleScalar, let RUU be arbitrarily big.

[Figure, two panels (4-issue and 8-issue): RUU occupancy (x-axis, 0 to 64) vs. fraction of cycles (y-axis) for li, perl, compress, and m88ksim]

• The RUU's occupancy "saturates," as one would expect.
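One way to read the occupancy curves, assuming they are cumulative (the fraction of cycles in which the RUU holds at most a given number of entries), is sketched below in Python. The trace values, the function names, and the coverage threshold are all hypothetical placeholders, not data from the simulations.

```python
from collections import Counter

def occupancy_cdf(trace, max_size):
    """Fraction of cycles with occupancy <= n, for n = 0..max_size."""
    counts = Counter(trace)
    total = len(trace)
    cdf = []
    running = 0
    for n in range(max_size + 1):
        running += counts.get(n, 0)
        cdf.append(running / total)
    return cdf

def smallest_covering_size(cdf, coverage):
    """Smallest occupancy n whose cumulative fraction reaches `coverage`."""
    for n, frac in enumerate(cdf):
        if frac >= coverage:
            return n
    return None  # the trace never reaches the requested coverage

# Hypothetical per-cycle occupancy samples (NOT measured data).
trace = [2, 5, 9, 12, 12, 14, 15, 16, 16, 18,
         20, 22, 22, 24, 25, 27, 28, 30, 31, 40]
cdf = occupancy_cdf(trace, max_size=64)
print(smallest_covering_size(cdf, coverage=0.92))
```

With a real per-cycle trace from an unlimited-RUU run, the point where the curve flattens gives a rough upper bound on the useful RUU size.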
[Figure: RUU Usage - li; RUU occupancy (x-axis, 0 to 32) vs. cycles (y-axis); series: 16-entry RUU, unlimited RUU]

[Figure: m88ksim on 4-issue; RUU size (x-axis) vs. fraction of cycles (y-axis); series: 4, 8, 16, 32, 64]

[Figure: m88ksim on 8-issue; RUU size (x-axis) vs. fraction of cycles (y-axis); series: 8, 16, 32, 64]

• Performance rapidly approaches maximum.
• 8-issue needs a slightly larger RUU, as expected.

[Figure: IPC vs RUU size for 4-issue; RUU capacity (x-axis, 0 to 64) vs. IPC (y-axis); series: li, perl, compress, m88ksim]

[Figure: IPC vs RUU size for 8-issue; RUU capacity (x-axis, 0 to 64) vs. IPC (y-axis); series: li, perl, compress, m88ksim]

• Power consumption in the RUU increases as its size increases.

[Figure: Power Consumption Breakdowns for 4 issue on li; configuration (x-axis: 4x4 li, 4x8 li, 4x16 li, 4x32 li, 4x64 li) vs. power in W (y-axis); components: clock, resultbus, alu, dcache2, dcache, icache, regfile, lsq, window, bpred, rename]

• There's a minimum! And it's pretty much where maximum performance is. Hmmm.

Structure                8x8    8x16   8x32   8x64
Energy/Inst (li)         13.8   12.5   13.4   14.9
Energy/Inst (perl)       15.1   14.7   15.8   17.6
Energy/Inst (compress)   12.4   11.4   11.9   13.3
Energy/Inst (m88ksim)    13.0   12.1   12.9   14.4

• Some groups have advocated a variable 16-32 capacity RUU. Even if scaling is perfect, there's little to be gained.
• A power-conscious architect is likely to be cornered into just one reasonable RUU size.

• If we can't lobotomize, perhaps we can add a completely separate CPU.
• Sounds like a good idea
  – Intuition: a simple in-order processor should have lower energy/instruction than a complex out-of-order one.
  – Small area overhead, around 1 mm^2.
• Opportunity for more energy savings
  – Smaller register file
  – No issue window
  – Separate low-power caches (though this increases area)
• SimpleScalar/Wattch is all but useless
  – Only one parameterizable power model (Wattch) is available, and we don't know what trade-offs the designer made.
  – Wattch doesn't support sim-inorder.
  – E.g., the Cacti cache model uses 10x greater energy than Krste.

• Industry survey
• PPC440 is 2-issue, out-of-order
• PPC405 is single-issue, in-order
• Both use the same technology
• The 440 is twice as fast, but uses only 1.66 times the power!
• 5x86 is in-order
• K6 is out-of-order, 6-issue, 24-entry window
• K6 has slightly better power/performance
  – But it's on a newer process (0.25 um rather than 0.35 um)
• CPUs available today, even the "low power" ones, are still after speed.
  – Low-power IA32 is just a slower, high-power IA32.
• If you designed your simple core for super-low power (with very little regard for speed), how might this change?

• Smaller issue windows are not a win on power; they lower the amount of ILP found by too much.
• Multiple cores are not a win on power; the faster core tends to be more energy
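
The energy-per-instruction comparisons above can be checked with a few lines of arithmetic. The Python sketch below uses only numbers quoted in the slides (the 8-issue energy/instruction table and the PPC440/PPC405 ratios). The function and variable names are illustrative, and it assumes that "twice as fast" means twice the instruction throughput, so that energy per instruction scales as power divided by throughput.

```python
# Energy per instruction for the 8-issue configurations, copied from the
# table in the slides. Columns are RUU sizes 8, 16, 32, 64 (units as given).
RUU_SIZES = [8, 16, 32, 64]
ENERGY_PER_INST = {
    "li":       [13.8, 12.5, 13.4, 14.9],
    "perl":     [15.1, 14.7, 15.8, 17.6],
    "compress": [12.4, 11.4, 11.9, 13.3],
    "m88ksim":  [13.0, 12.1, 12.9, 14.4],
}

def best_ruu_size(energies):
    """Return (RUU size, energy/inst) at the lowest energy per instruction."""
    idx = min(range(len(energies)), key=energies.__getitem__)
    return RUU_SIZES[idx], energies[idx]

def gap_16_vs_32(energies):
    """Fractional energy/inst difference between the 32- and 16-entry points."""
    e16 = energies[RUU_SIZES.index(16)]
    e32 = energies[RUU_SIZES.index(32)]
    return (e32 - e16) / e32

def relative_energy_per_inst(power_ratio, speed_ratio):
    """Energy/inst ratio, ASSUMING energy/inst = power / throughput."""
    return power_ratio / speed_ratio

for bench, energies in ENERGY_PER_INST.items():
    size, energy = best_ruu_size(energies)
    print(f"{bench}: minimum {energy} at RUU size {size}, "
          f"16 vs 32 gap {gap_16_vs_32(energies):.1%}")

# PPC440 vs PPC405: 1.66x the power at 2x the speed.
print(f"PPC440 / PPC405 energy per instruction: "
      f"{relative_energy_per_inst(1.66, 2.0):.2f}")
```

Under these assumptions every benchmark reaches its minimum at 16 entries, the 16-versus-32 gap stays under 7%, and the PPC440 comes out at roughly 0.83 times the PPC405's energy per instruction.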

