CMU CS 15740 - Lecture


Superscalar Processing CS 740 September 25-27, 2000
Intel x86 Processors
Other Processors
Architectural Performance
x86 ISA Characteristics
i486 Pipeline
Pipeline Stage Details
Stage Details (Cont.)
Data Hazards
Control Hazards
Control Hazards (Cont.)
Comparison with Our pAlpha Pipeline
Comparison to 386
Pentium Block Diagram
Pentium Pipeline
Superscalar Execution
Branch Prediction
Superscalar Terminology
Superscalar Execution Example
Adding Advanced Features
Pentium Pro (P6)
PentiumPro Block Diagram
PentiumPro Operation
Slide 24
Limitations of x86 Instruction Set
PPC 604
604 Block Diagram
General Principles
Processing Stages
Fetching Instructions
Slide 31
Dispatch
Dispatching Actions
Hazard Handling with Renaming
Read-after-Write (RAW) Dependences
Write-after-Read (WAR) Dependences
Write-after-Write (WAW) Dependences
Moving Instructions Around
Execution Resources
Retiring Instructions
604 Chip
Execution Example
Execution Example Cycle 1
Execution Example Cycle 2
Cycle 3
Execution Example Cycle 4
Execution Example Cycle 5
Execution Example Cycle 6
Execution Example Cycle 7
Living with Expensive Branches
Branch Prediction Example
Some Interesting Patterns
Loop Performance (FP)
Loop 1 Surprises
P6 Branch Prediction
Branch Prediction Comparisons
Effect of Loop Unrolling
MIPS R10000
DEC Alpha 21264
21264 Block Diagram
21264 Pipeline
21264 Branch Prediction Logic
Processor Comparisons
Challenges Ahead
New Era for Performance Optimization

Superscalar Processing
CS 740
September 25-27, 2000

Intel Processors
• 486, Pentium, Pentium Pro
Superscalar Processor Design
• Use PowerPC 604 as case study
• Speculative Execution, Register Renaming, Branch Prediction
More Superscalar Examples
• MIPS R10000
• DEC Alpha 21264

CS 740 F’00 – 2 –

Intel x86 Processors

Processor    Year   Transistors  MHz   Spec92 (Int/FP)  Spec95 (Int/FP)
8086         ’78    29K          4     Basis of IBM PC & PC-XT
i286         ’83    134K         8     Basis of IBM PC-AT
i386         ’86    275K         16
             ’88                 33    6 / 3
i486         ’89    1.2M         20
                                 50    28 / 13
Pentium      ’93    3.1M         66    78 / 64
                                 150   181 / 125        4.3 / 3.0
PentiumPro   ’95    5.5M         150   245 / 220        6.1 / 4.8
                                 200   320 / 283        8.2 / 6.0
Pentium II   ’97    7.5M         300                    11.6 / 6.8
Merced       ’00?   14M?         ?                      ?

Other Processors

Processor      Year   Transistors  MHz   Spec92         Spec95
MIPS R3000     ’88                 25    16.1 / 21.7                   (DecStation 5000/120)
MIPS R5000            3.6M         180                  4.1 / 4.4      (Wean Hall SGIs)
MIPS R10000    ’95    5.9M         200   300 / 600      8.9 / 17.2     (Most Advanced MIPS)
Alpha 21164a   ’96    9.3M         417   500 / 750      11 / 17
                                   500                  12.6 / 18.3    (Fastest Available)
Alpha 21264    ’97    15M          500                  30 / 60        (Fastest Announced)

Architectural Performance

Metric
• SpecX92/MHz: Normalizes with respect to clock speed
• But … one measure of good arch. is how fast can run clock
Sampling

Processor      MHz   SpecInt92  IntAP  SpecFP92  FltAP
i386/387       33    6          0.2    3         0.1
i486DX         50    28         0.6    13        0.3
Pentium        150   181        1.2    125       0.8
PentiumPro     200   320        1.6    283       1.4
MIPS R3000A    25    16.1       0.6    21.7      0.9
MIPS R10000    200   300        1.5    600       3.0
Alpha 21164a   417   500        1.2    750       1.8

x86 ISA Characteristics

Multiple Data Sizes and Addressing Methods
• Recent generations optimized for 32-bit mode
Limited Number of Registers
• Stack-oriented procedure call and FP instructions
• Programs reference memory heavily (41%)
Variable Length Instructions
• First few bytes describe operation and operands
• Remaining ones give immediate data & address displacements
• Average is 2.5 bytes

i486 Pipeline

Fetch
• Load 16 bytes of instruction into prefetch buffer
Decode1
• Determine instruction length, instruction type
Decode2
• Compute memory address
• Generate immediate operands
Execute
• Register Read
• ALU operation
• Memory read/write
Write-Back
• Update register file

Pipeline Stage Details

Fetch
• Moves 16 bytes of instruction stream into code queue
• Not required every time
  – About 5 instructions fetched at once
  – Only useful if don’t branch
• Avoids need for separate instruction cache
D1
• Determine total instruction length
  – Signals code queue aligner where next instruction begins
• May require two cycles
  – When multiple operands must be decoded
  – About 6% of “typical” DOS program

Stage Details (Cont.)

D2
• Extract memory displacements and immediate operands
• Compute memory addresses
  – Add base register, and possibly scaled index register
• May require two cycles
  – If index register involved, or both address & immediate operand
  – Approx. 5% of executed instructions
EX
• Read register operands
• Compute ALU function
• Read or write memory (data cache)
WB
• Update register result

Data Hazards

Generated  Used          Handling
ALU        ALU           EX–EX Forwarding
Load       ALU           EX–EX Forwarding
ALU        Store         EX–EX Forwarding
ALU        Eff. Address  (Stall) + EX–ID2 Forwarding

Control Hazards

Jump Instruction Processing
• Continue pipeline assuming branch not taken
• Resolve branch condition in EX stage
• Also speculatively fetch at target during EX stage
[Pipeline diagram: Jump Instr. in ID1, ID2, EX; Jump +1 in ID1, ID2; Jump +2 in ID1; Target in Fetch]

Control Hazards (Cont.)

Branch Taken
• Flush instructions in pipe
• Begin ID1 at target
• Total of 3 cycles for instruction
Branch Not Taken
• Allow pipeline to continue
• Total of 1 cycle for instruction
[Pipeline diagrams for the two cases: rows Jump Instr., Jump +1, Jump +2, Jump +3, Target; stages Fetch, ID1, ID2, EX; discarded work marked (Flushed)]

Comparison with Our pAlpha Pipeline

Two Decoding Stages
• Harder to decode CISC instructions
• Effective address calculation in D2
Multicycle Decoding Stages
• For more difficult decodings
• Stalls incoming instructions
Combined Mem/EX Stage
• Avoids load stall without load delay slot
  – But introduces stall for address computation

Comparison to 386

Cycles Per Instruction

Instruction Type  386 Cycles  486 Cycles
Load              4           1
Store             2           1
ALU               2           1
Jump taken        9           3
Jump not taken    3           1
Call              9           3

Reasons for Improvement
• On chip cache
  – Faster loads & stores
• More pipelining

Pentium Block Diagram

(Microprocessor Report 10/28/92)
[Block diagram; surviving label: Memory Data Bus]

Pentium Pipeline

Shared stages
• Fetch & Align Instruction
• Decode Instr. / Generate Control Word
U-Pipe and V-Pipe (each)
• Decode Control Word / Generate Memory Address
• Access data cache or calculate ALU result
• Write register result

Superscalar Execution

Can Execute Instructions I1 & I2 in Parallel if:
• Both are “simple” instructions
  – Don’t require microcode sequencing
  – Some operations require U-pipe
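
The SpecX92/MHz metric from the Architectural Performance slide can be recomputed from the Sampling table. The sketch below assumes that IntAP and FltAP are simply the Spec92 score divided by the clock rate and rounded to one decimal place; the function and variable names are illustrative and do not come from the lecture.

```python
# Rows copied from the Sampling table: (processor, MHz, SpecInt92, SpecFP92).
SAMPLES = [
    ("i386/387", 33, 6, 3),
    ("i486DX", 50, 28, 13),
    ("Pentium", 150, 181, 125),
    ("PentiumPro", 200, 320, 283),
    ("MIPS R3000A", 25, 16.1, 21.7),
    ("MIPS R10000", 200, 300, 600),
    ("Alpha 21164a", 417, 500, 750),
]


def arch_perf(spec_score, mhz):
    """Spec92 score per MHz, rounded to one decimal place (assumed rounding)."""
    return round(spec_score / mhz, 1)


def arch_perf_table(samples=SAMPLES):
    """Return (processor, IntAP, FltAP) for every row of the table."""
    return [
        (name, arch_perf(spec_int, mhz), arch_perf(spec_fp, mhz))
        for name, mhz, spec_int, spec_fp in samples
    ]


if __name__ == "__main__":
    for name, int_ap, flt_ap in arch_perf_table():
        print(f"{name:<14}{int_ap:>5}{flt_ap:>5}")
```

Dividing by MHz removes the clock-rate advantage, which is why the slide adds the caveat that a high achievable clock rate is itself a sign of a good architecture.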
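
The per-instruction cycle counts from the Comparison to 386 slide can be combined into an average CPI for a given instruction mix. This is a rough sketch only: the mix below is hypothetical (the lecture gives none), and the calculation ignores the extra D1/D2 decode cycles and address-computation stalls described on the earlier slides.

```python
# Cycle counts from the "Comparison to 386" table: (386 cycles, 486 cycles).
CYCLES = {
    "load": (4, 1),
    "store": (2, 1),
    "alu": (2, 1),
    "jump_taken": (9, 3),
    "jump_not_taken": (3, 1),
    "call": (9, 3),
}
I386, I486 = 0, 1  # column indices into CYCLES

# Hypothetical instruction mix, chosen only for illustration.
HYPOTHETICAL_MIX = {
    "load": 0.25,
    "store": 0.15,
    "alu": 0.40,
    "jump_taken": 0.10,
    "jump_not_taken": 0.05,
    "call": 0.05,
}


def average_cpi(mix, column):
    """Weighted average cycles per instruction for one processor column."""
    total = sum(mix.values())
    weighted = sum(frac * CYCLES[kind][column] for kind, frac in mix.items())
    return weighted / total


if __name__ == "__main__":
    cpi_386 = average_cpi(HYPOTHETICAL_MIX, I386)
    cpi_486 = average_cpi(HYPOTHETICAL_MIX, I486)
    print(f"386 CPI: {cpi_386:.2f}")
    print(f"486 CPI: {cpi_486:.2f}")
    print(f"Cycle-count ratio: {cpi_386 / cpi_486:.2f}")
```

Because the 486 column is 1 for everything except taken jumps and calls, its average CPI under any mix is driven almost entirely by how often control flow changes.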

