CMU CS 15740 - lecture


Superscalar Processing
CS 740
September 25-27, 2000

Intel Processors
• 486, Pentium, Pentium Pro
Superscalar Processor Design
• Use PowerPC 604 as case study
• Speculative Execution, Register Renaming, Branch Prediction
More Superscalar Examples
• MIPS R10000
• DEC Alpha 21264

CS 740 F’00


Intel x86 Processors

Processor    Year   Transistors  MHz   Spec92 (Int/FP)  Spec95 (Int/FP)
8086         ’78    29K          4     Basis of IBM PC & PC-XT
i286         ’83    134K         8     Basis of IBM PC-AT
i386         ’86    275K         16
             ’88                 33    6 / 3
i486         ’89    1.2M         20
                                 50    28 / 13
Pentium      ’93    3.1M         66    78 / 64
                                 150   181 / 125        4.3 / 3.0
PentiumPro   ’95    5.5M         150   245 / 220        6.1 / 4.8
                                 200   320 / 283        8.2 / 6.0
Pentium II   ’97    7.5M         300                    11.6 / 6.8
Merced       ’00?   14M          ?     ?                ?


Other Processors

Processor      Year  Transistors  MHz  Spec92       Spec95       Notes
MIPS R3000     ’88                25   16.1 / 21.7               DecStation 5000/120
MIPS R5000           3.6M         180               4.1 / 4.4    Wean Hall SGIs
MIPS R10000    ’95   5.9M         200  300 / 600    8.9 / 17.2   Most Advanced MIPS
Alpha 21164a   ’96   9.3M         417  500 / 750    11 / 17
                                  500               12.6 / 18.3  Fastest Available
Alpha 21264    ’97   15M          500               30 / 60      Fastest Announced


Architectural Performance

Metric
• SpecX92/MHz: Normalizes with respect to clock speed
• But … one measure of good arch. is how fast it can run the clock

Sampling

Processor      MHz   SpecInt92  IntAP  SpecFP92  FltAP
i386/387       33    6          0.2    3         0.1
i486DX         50    28         0.6    13        0.3
Pentium        150   181        1.2    125       0.8
PentiumPro     200   320        1.6    283       1.4
MIPS R3000A    25    16.1       0.6    21.7      0.9
MIPS R10000    200   300        1.5    600       3.0
Alpha 21164a   417   500        1.2    750       1.8


x86 ISA Characteristics

Multiple Data Sizes and Addressing Methods
• Recent generations optimized for 32-bit mode
Limited Number of Registers
• Stack-oriented procedure call and FP instructions
• Programs reference memory heavily (41%)
Variable Length Instructions
• First few bytes describe operation and operands
• Remaining ones give immediate data & address displacements
• Average is 2.5 bytes


i486 Pipeline

Fetch
• Load 16 bytes of instruction into prefetch buffer
Decode1
• Determine instruction length, instruction type
Decode2
• Compute memory address
• Generate immediate operands
Execute
• Register Read
• ALU operation
• Memory read/write
Write-Back
• Update register file


Pipeline Stage Details

Fetch
• Moves 16 bytes of instruction stream into code queue
• Not required every time
  – About 5 instructions fetched at once
  – Only useful if don’t branch
• Avoids need for separate instruction cache
D1
• Determine total instruction length
  – Signals code queue aligner where next instruction begins
• May require two cycles
  – When multiple operands must be decoded
  – About 6% of “typical” DOS program


Stage Details (Cont.)

D2
• Extract memory displacements and immediate operands
• Compute memory addresses
  – Add base register, and possibly scaled index register
• May require two cycles
  – If index register involved, or both address & immediate operand
  – Approx. 5% of executed instructions
EX
• Read register operands
• Compute ALU function
• Read or write memory (data cache)
WB
• Update register result


Data Hazards

Generated  Used          Handling
ALU        ALU           EX–EX Forwarding
Load       ALU           EX–EX Forwarding
ALU        Store         EX–EX Forwarding
ALU        Eff. Address  (Stall) + EX–ID2 Forwarding


Control Hazards

Jump Instruction Processing
• Continue pipeline assuming branch not taken
• Resolve branch condition in EX stage
• Also speculatively fetch at target during EX stage

Jump Instr.   ID1    ID2    EX
Jump +1              ID1    ID2
Jump +2                     ID1
Target                      Fetch


Control Hazards (Cont.)

Branch Taken
• Flush instructions in pipe
• Begin ID1 at target.
• Total of 3 cycles for instruction

Jump Instr.   ID1    ID2    EX
Jump +1              ID1    ID2    (Flushed)
Jump +2                     ID1    (Flushed)
Target                      Fetch  ID1

Branch Not Taken
• Allow pipeline to continue.
• Total of 1 cycle for instruction

Jump Instr.   ID1    ID2    EX
Jump +1              ID1    ID2    EX
Jump +2                     ID1    ID2
Target                      Fetch  (Flushed)
Jump +3                            ID1


Comparison with Our pAlpha Pipeline

Two Decoding Stages
• Harder to decode CISC instructions
• Effective address calculation in D2
Multicycle Decoding Stages
• For more difficult decodings
• Stalls incoming instructions
Combined Mem/EX Stage
• Avoids load stall without load delay slot
  – But introduces stall for address computation


Comparison to 386

Cycles Per Instruction

Instruction Type  386 Cycles  486 Cycles
Load              4           1
Store             2           1
ALU               2           1
Jump taken        9           3
Jump not taken    3           1
Call              9           3

Reasons for Improvement
• On chip cache
  – Faster loads & stores
• More pipelining


Pentium Block Diagram
(Microprocessor Report 10/28/92)

[Figure: Pentium block diagram; only the label text "Memory Data Bus" survives]


Pentium Pipeline

Shared stages
• Fetch & Align Instruction
• Decode Instr. / Generate Control Word
Then, in each of the U-Pipe and V-Pipe
• Decode Control Word / Generate Memory Address
• Access data cache or calculate ALU result
• Write register result


Superscalar Execution

Can Execute Instructions I1 & I2 in Parallel if:
• Both are “simple” instructions
  – Don’t require microcode sequencing
  – Some operations require U-pipe resources
  – 90% of SpecInt instructions
• I1 is not a jump
• Destination of I1 not source of I2
  – But can handle I1 setting CC and I2 being cond. jump
• Destination of I1 not destination of I2
If Conditions Don’t Hold
• Issue I1 to U Pipe
• I2 issued on next cycle
  – Possibly paired with following instruction


Branch Prediction

Branch Target Buffer
• Stores information about previously executed branches
  – Indexed by instruction address
  – Specifies branch destination + whether or not taken
• 256 entries
Branch Processing
• Look for instruction in BTB
• If found, start fetching at destination
• Branch condition resolved early in WB
  – If prediction correct, no branch penalty
  – If prediction incorrect, lose ~3 cycles
    » Which corresponds to > 3 instructions
• Update BTB


Superscalar Terminology

Basic
  Superscalar        Able to issue > 1 instruction / cycle
  Superpipelined     Deep, but not superscalar pipeline.
                     E.g., MIPS R5000 has 8 stages
  Branch prediction  Logic to guess whether or not branch will be taken,
                     and possibly branch target
Advanced
  Out-of-order       Able to issue instructions out of program order
  Speculation        Execute instructions beyond branch points,
                     possibly nullifying later
  Register renaming  Able to dynamically assign physical
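
The SpecX92/MHz metric from the Architectural Performance slide is simple enough to recompute. The sketch below divides each Spec92 score in the Sampling table by the clock rate and rounds to one decimal place, which appears to reproduce the IntAP and FltAP columns. The names `SAMPLES`, `SLIDE_AP`, `arch_perf`, and `recompute` are made up for this illustration; only the numbers come from the slide.

```python
# Rows copied from the Sampling table: name -> (MHz, SpecInt92, SpecFP92).
SAMPLES = {
    "i386/387":     (33,  6,    3),
    "i486DX":       (50,  28,   13),
    "Pentium":      (150, 181,  125),
    "PentiumPro":   (200, 320,  283),
    "MIPS R3000A":  (25,  16.1, 21.7),
    "MIPS R10000":  (200, 300,  600),
    "Alpha 21164a": (417, 500,  750),
}

# (IntAP, FltAP) as printed on the slide, used here only for comparison.
SLIDE_AP = {
    "i386/387":     (0.2, 0.1),
    "i486DX":       (0.6, 0.3),
    "Pentium":      (1.2, 0.8),
    "PentiumPro":   (1.6, 1.4),
    "MIPS R3000A":  (0.6, 0.9),
    "MIPS R10000":  (1.5, 3.0),
    "Alpha 21164a": (1.2, 1.8),
}


def arch_perf(spec_score, mhz):
    """Spec92 score per MHz, rounded to one decimal (rounding is an assumption)."""
    return round(spec_score / mhz, 1)


def recompute(samples):
    """Return name -> (IntAP, FltAP) computed from the raw table rows."""
    return {
        name: (arch_perf(spec_int, mhz), arch_perf(spec_fp, mhz))
        for name, (mhz, spec_int, spec_fp) in samples.items()
    }


if __name__ == "__main__":
    for name, (int_ap, flt_ap) in recompute(SAMPLES).items():
        print(f"{name:13s} IntAP={int_ap:.1f} FltAP={flt_ap:.1f}")
```

Read this way, the Alpha 21164a and the Pentium do the same integer work per clock (1.2); the Alpha's higher SpecInt92 score comes from its faster clock, which is the caveat the slide raises about this metric.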

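
The pairing conditions on the Superscalar Execution slide can be written as a small predicate. This is a simplified sketch, not a model of the real Pentium issue logic: the `Instr` record, its fields, and the convention of naming the condition codes `"CC"` are all assumptions made for illustration, and the "some operations require U-pipe resources" restriction is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record; field names are illustrative only."""
    name: str
    dests: tuple = ()           # registers (or "CC") written
    srcs: tuple = ()            # registers (or "CC") read
    simple: bool = True         # no microcode sequencing needed
    is_jump: bool = False
    is_cond_jump: bool = False  # conditional jump that reads "CC"


def can_pair(i1, i2):
    """True if I1 (U-pipe) and I2 (V-pipe) may issue together per the slide."""
    if not (i1.simple and i2.simple):
        return False
    if i1.is_jump:
        return False
    for d in i1.dests:
        if d in i2.srcs:
            # Slide's exception: I1 sets CC and I2 is a conditional jump.
            if not (d == "CC" and i2.is_cond_jump):
                return False
        if d in i2.dests:
            return False
    return True


def issue_cycles(program):
    """Greedy in-order issue: pair when allowed, otherwise issue I1 alone."""
    cycles = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            cycles.append((program[i].name, program[i + 1].name))
            i += 2
        else:
            # I1 goes alone; I2 is retried next cycle with its successor.
            cycles.append((program[i].name,))
            i += 1
    return cycles


mov = Instr("mov eax,1", dests=("eax",))
add = Instr("add ebx,eax", dests=("ebx",), srcs=("ebx", "eax"))
cmp_ = Instr("cmp ecx,0", dests=("CC",), srcs=("ecx",))
jne = Instr("jne L", srcs=("CC",), is_jump=True, is_cond_jump=True)
```

With these four hypothetical instructions, `mov` and `add` cannot pair because `add` reads `eax`, so `add` slides to the next cycle and pairs with `cmp_` instead, matching the "possibly paired with following instruction" bullet.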

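
The Branch Processing steps on the Branch Prediction slide (look up, fetch at the stored destination, resolve, update) can be sketched as a toy branch target buffer. The slide states only that the BTB has 256 entries, is indexed by instruction address, and stores the destination plus whether the branch was taken. The direct-mapped organization, the full-address tag, the last-outcome prediction, and every name below are assumptions made for this sketch.

```python
class BranchTargetBuffer:
    """Toy BTB: direct-mapped, last-outcome prediction (both assumed)."""

    def __init__(self, entries=256):
        self.entries = entries
        self.table = [None] * entries  # slot: (branch_pc, target, taken)

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        """Return the predicted target, or None to keep fetching sequentially."""
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc and slot[2]:
            return slot[1]
        return None

    def update(self, pc, target, taken):
        """Record the resolved outcome of the branch at pc."""
        self.table[self._index(pc)] = (pc, target, taken)


def cycles_lost(trace, btb, mispredict_penalty=3):
    """Sum the penalty over a trace of (pc, target, taken) branch outcomes.

    The default penalty of 3 follows the slide's "lose ~3 cycles".
    """
    lost = 0
    for pc, target, taken in trace:
        predicted = btb.predict(pc)
        actual = target if taken else None
        if predicted != actual:
            lost += mispredict_penalty
        btb.update(pc, target, taken)
    return lost


# A loop-closing branch at 0x40 that is taken four times, then falls through.
loop_trace = [(0x40, 0x10, True)] * 4 + [(0x40, 0x10, False)]
```

Under these assumptions the loop trace mispredicts twice, once on the first iteration when the branch is not yet in the table and once on loop exit, for 6 lost cycles in total.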