Berkeley COMPSCI 152 - Lecture 21 – Advanced Processors II

UC Regents Fall 2004 © UCB    CS 152 L21: Advanced Processors II

2004-11-16
Dave Patterson (www.cs.berkeley.edu/~patterson)
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
CS 152 Computer Architecture and Engineering
Lecture 21 – Advanced Processors II
www-inst.eecs.berkeley.edu/~cs152/
Thanks to Krste Asanovic ...


Last Time: Superpipelining & Superscalar

Q. Could adding pipeline stages hurt CPI for an application?
A. Yes, due to these problems:

    CPI Problem            Possible Solution
    Extra branch delays    Branch prediction
    Extra load delays      Optimize code
    Structural hazards     Optimize code, add hardware

[Figure: ARM XScale, 8 stages]


Today: Dynamic Scheduling Overview

Goal: Enable out-of-order by breaking pipeline in two: fetch and execution.

Example: IBM Power 5
I-fetch and decode: like static pipelines
Today’s focus: execution unit


Dynamic Scheduling: A mix of 3 ideas

Top-down idea: Registers that may be written only once (but may be read many times) eliminate WAW and WAR hazards.

Mid-level idea: An instruction waiting for an operand to execute may trigger on the (single) write to the associated register.

Bottom-up idea: To support “snooping” on register writes, attach all machine elements to a common bus.

Robert Tomasulo, IBM, 1967. FP unit for IBM 360/91


A common bus == long wires == slow?

Pipelines in theory: Wires are short, so clock periods can be short. “wiring by abutment”

Pipelines in practice: Long wires are the price we paid to avoid stalls.

Conjecture: If processor speed is limited by long wires, let's do a design that fully uses the semantics of long wires by using a bus.


A bus-based multi-cycle computer

Units attached to the Common Data Bus: Store Unit (to memory), Load Unit (from memory), Register File, ALU #1, ALU #2, ...

(1) Only one unit writes at a time (one source).
(2) All units may read the written values (many destinations).

If we add too many functional units, one bus is too long, too slow. Solutions: more buses, faster electrical signalling.


Administrivia: Final project begins

Thursday 11/18: Preliminary design document due, by 9 PM.
Friday 11/19: Review design document with TAs in lab section.
Sunday 11/21: Revised design document due in email, by 11:59 PM.
Friday 12/3: Demo deep pipelining to TAs in lab section.


Administrivia: Mid-term and Field Trip

Mid-Term II Review Session: Sunday, 11/21, 7-9 PM, 306 Soda.
Mid-Term II: Tuesday, 11/23, 5:30 to 8:30 PM, 101 Morgan. LaVal’s @ 9 PM!
Thanksgiving Holidays!
Xilinx field trip: Tuesday 11/30, bus leaves at 8:30 AM, from 4th floor Soda. Send Doug RSVP (options: on bus, driving, not going).
Thursday 12/2: Advice on Presentations. Prepare you for your final project talk.


Register Renaming


Consider this simple loop ...

[Loop listing not recovered; surviving fragments: ADDI R1,R0,64 ... F4,0(R1)]

Every pass through the loop introduces the potential for WAW and/or WAR hazards for F0, F4, and R1.


Given an endless supply of registers ...

Rename “architected registers” (Ri, Fi) to new “physical registers” (PRi, PFi) on each write.

    R1 → PR01
    F0 → PF00

           ADDI PR01,PR00,64
           LD PF00 0(PR01)
           ADDD PF04, PF00, PF02
           SD PF04, 0(PR01)
           SUBI PR11, PR01, 8
           BEQZ PR11 ENDLOOP
    ITER2: LD PF10 0(PR11)
           ADDD PF14, PF10, PF02
           SD PF14, 0(PR11)
           SUBI PR21, PR11, 8
           BEQZ PR21 ENDLOOP
    ITER3: LD PF20 0(PR21)
           [...]

An instruction may execute once all of its source registers have been written.


Data-Driven Execution (Associative Control)

Caveat: In comparison to static pipelines, there is great diversity in dynamic scheduling implementations. Presentation that follows is a composite, and does not reflect any specific machine.


Recall: IBM Power 5 block diagram ...

Interface between instruction fetch and execution.
MP = “Mapping” from architected registers to physical registers (renaming).
ISS = Instruction Issue


Instructions placed in “Reorder Buffer”

Reorder Buffer columns:

    Inst #   [...]   src1 #   src1 val   src2 #   src2 val   dest #   dest val
    6
    7
    [...]

Units: Store Unit (to memory), Load Unit (from memory), ALU #1, ALU #2

Each line holds physical <src1, src2, dest> registers for an instruction, and controls when it executes.

Execution engine works on the physical registers, not the architecture registers.

Common Data
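
The renaming rule from the register renaming slides (a new physical register on each write, and an instruction may execute once all of its source registers have been written) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's or any machine's implementation: the names `rename` and `ready` are hypothetical, physical registers are numbered P0, P1, ... instead of the slides' PRxy/PFxy scheme, and the loop body is inferred from the renamed listing above because the original loop listing did not survive.

```python
from itertools import count


def rename(program, initial_map=None):
    """Give every register write a fresh physical register.

    program: list of (opcode, dest, sources) tuples. dest is None for
    instructions that write no register (stores, branches).
    Hypothetical helper, not taken from the lecture.
    """
    fresh = count()                      # stand-in for an "endless supply" of registers
    mapping = dict(initial_map or {})    # architected name -> current physical name
    renamed = []
    for opcode, dest, sources in program:
        # Sources read the physical register holding the latest value.
        # A register never written so far gets a name on its first read.
        phys_sources = []
        for src in sources:
            if src not in mapping:
                mapping[src] = f"P{next(fresh)}"
            phys_sources.append(mapping[src])
        # The destination is mapped only after the sources, so an
        # instruction like SUBI R1,R1,8 reads the old R1 and writes a new one.
        phys_dest = None
        if dest is not None:
            phys_dest = f"P{next(fresh)}"
            mapping[dest] = phys_dest
        renamed.append((opcode, phys_dest, tuple(phys_sources)))
    return renamed


def ready(instr, written):
    """True once every source register of instr has been written."""
    _, _, sources = instr
    return all(src in written for src in sources)


# Assumed loop body, inferred from the renamed listing on the slide.
loop_body = [
    ("LD", "F0", ("R1",)),
    ("ADDD", "F4", ("F0", "F2")),
    ("SD", None, ("F4", "R1")),
    ("SUBI", "R1", ("R1",)),
    ("BEQZ", None, ("R1",)),
]

if __name__ == "__main__":
    for instr in rename(loop_body * 2):   # two passes through the loop
        print(instr)
```

Because each physical register is written exactly once, the second pass through the loop writes different registers than the first, so the WAW and WAR hazards on F0, F4, and R1 do not arise; only true (read-after-write) dependences remain, and `ready` checks exactly those.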

