Berkeley COMPSCI 152 - Low Power Design, Advanced Intel Processors

Slide titles:
CS152 Computer Architecture and Engineering Lecture 25: Low Power Design, Advanced Intel Processors
Recap: I/O Summary
Slides Borrowed from Bob Broderson
Back to original goal: Processor Usage Model
Typical Usage
Another approach: Reduce Frequency
Alternative: Dynamic Voltage Scaling
What about bus transitions?
Reasoning
Huffman-based Compression
Context-based encoder
Just the Shift-register: “window-based”
Administrivia
7 Talk Commandments for a Bad Talk
Following all the commandments
Alternatives to a Bad Talk
Include in your final presentation
Review: Road to Faster Processors
Dynamic Scheduling in Pentium Pro, II, III
Dynamic Scheduling in P6 (Pentium Pro, II, III)
P6 Pipeline
P6 Block Diagram
Dynamic Scheduling in P6
Pentium III Die Photo
P6 Performance: uops/x86 instr 200 MHz, 8KI$/8KD$/256KL2$, 66 MHz bus
P6 Performance: Speculation rate (% instructions issued that do not commit)
P6 Performance: µops commit/clock
P6 Dynamic Benefit? Sum of parts CPI vs. Actual CPI
Pentium 4 features
Pentium 4 features (Continued)
Registers
SIMD: Single Instruction Multiple Data
Pentium 4 Cache
Pentium 4 basic block diagram
Pentium 4 Trace Cache 1/4
Trace Cache Example
Full Block diagram (Intel)
Out-of-Order Execution -- Pipeline
Comparison of two architectures
Register Renaming: Pentium III vs NetBurst
Staggered ALU Add
Pentium 4 Speeds & Feeds
Pentium 4 Basic Features
Performance Comparison
SPEC 2000 Performance 3/2001 Source: Microprocessor Report,
Conclusion: Power
Conclusion: Intel

CS152 Computer Architecture and Engineering
Lecture 25
Low Power Design, Advanced Intel Processors
May 3, 2004
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
5/03/04 ©UCB Spring 2004 CS152 / Kubiatowicz

Recap: I/O Summary
° I/O performance limited by weakest link in chain between OS and device
° Queueing theory is important
  • 100% utilization means very large latency
  • Remember, for M/M/1 queue (exponential source of requests/service)
    - queue size goes as u/(1-u)
    - latency goes as Tser × u/(1-u)
  • For M/G/1 queue (more general server, exponential sources)
    - latency goes as m1(z) × u/(1-u) = Tser × {1/2 × (1+C)} × u/(1-u)
° Three Components of Disk Access Time:
  • Seek Time: advertised to be 8 to 12 ms. May be lower in real life.
  • Rotational Latency: 4.1 ms at 7200 RPM and 8.3 ms at 3600 RPM
  • Transfer Time: 2 to 50 MB per second
° I/O device notifying the operating system:
  • Polling: it can waste a lot of processor time
  • I/O interrupt: similar to exception except it is asynchronous
° Delegating I/O responsibility from the CPU: DMA, or even IOP

Slides Borrowed from Bob Broderson
Low Power Design
[Slides 4 to 25: figures only, no text recovered. The one surviving fragment, on slide 13, reads: 3/4 × 1/4 = 3/16]

Back to original goal: Processor Usage Model
[Figure: desired throughput vs. time for a single-user system, not always computing]
Ceiling: set by top speed of the processor
Compute-intensive and low-latency processes
Background and high-latency processes
System Optimizations:
  • Maximize Peak Throughput
  • Minimize Average Energy/operation (maximize computation per battery life)

Typical Usage
[Figure: delivered throughput vs. time, with the peak level and excess throughput marked]
Wake up → Compute ASAP → Go to idle/sleep mode
Always high throughput
Always high energy/operation

Another approach: Reduce Frequency
[Figure: delivered throughput vs. time, peak level with fCLK reduced; PowerBook Control Panel with a Slow/Fast setting, frequency set by user]
Energy/operation remains unchanged...
while throughput scales down with fCLK
Problems:
  • Circuits designed to be fast are now “wasted”.
  • Demand for peak throughput not met.

Alternative: Dynamic Voltage Scaling
Dynamically scale energy/operation with throughput
Extend battery life by up to 10x with the same hardware!
[Figure: delivered throughput vs. time, with the peak level marked]
Reduce throughput & fCLK, reduce energy/operation
Key: Process scheduler determines operating point.

What about bus transitions?
° Can we reduce total number of transitions on buses by sophisticated bus drivers?
° Can we encode information in a way that takes less power?
  • Do this on chip?!
  • Trying to reduce total number of transitions
[Figure: Input, Encoder, Encoded Version, Decode, Output]

Reasoning
° Increasing importance of wires relative to transistors
  • Spend transistors to drive wires more efficiently?
  • Try to reduce transitions over wires
° Orthogonal to other power-saving techniques
  • I.e. voltage reduction, low-swing drive
  • clock gating
  • Parallelism (like vectors!)

Huffman-based Compression
° Variable bit length – problem!
° Possible soln: macro clock
° Less bits != less transitions…
[Figure: Input, Encoder, Decode, Output]

Context-based encoder
° Context-based encoder
  • Detecting of repeated values going
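
The remark on the Huffman-based Compression slide that fewer bits do not imply fewer transitions can be checked with a small Python sketch. It is an illustration only, not code from the lecture: the helper names `count_transitions` and `bus_transitions` and the sample bit patterns are assumptions chosen for this example. The first function counts level changes on a single wire; the second counts bit flips between successive words on a parallel bus (the Hamming distance of each adjacent pair).

```python
def count_transitions(bits: str) -> int:
    """Count 0->1 and 1->0 changes between consecutive bits on one wire."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


def bus_transitions(words, width: int) -> int:
    """Count bit flips between successive words on a `width`-wire parallel bus."""
    mask = (1 << width) - 1
    total = 0
    for prev, cur in zip(words, words[1:]):
        # XOR marks the wires that changed; count the set bits.
        total += bin((prev ^ cur) & mask).count("1")
    return total


# Hypothetical sample data: a longer stream and a shorter one.
long_stream = "00001111"   # 8 bits
short_stream = "0101"      # 4 bits

if __name__ == "__main__":
    print(len(long_stream), count_transitions(long_stream))    # 8 1
    print(len(short_stream), count_transitions(short_stream))  # 4 3
    print(bus_transitions([0b0000, 0b1111, 0b1111, 0b0000], 4))  # 8
```

Here the 4-bit stream toggles three times while the 8-bit stream toggles once, so a shorter encoding can still cost more switching activity on the wire.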

