Berkeley COMPSCI 252 - Lecture 4 Control flow and interrupts Software Scheduling around hazards

This preview shows page 1-2-3-4-26-27-28-54-55-56-57 out of 57 pages.


CS252 Graduate Computer Architecture
Lecture 4: Control flow and interrupts (cont’d), Software Scheduling around hazards
September 13, 2000
Prof. John Kubiatowicz
CS252/Kubiatowicz, Lec 4, 9/13/00

Slide titles:
• Review: Control Flow and Exceptions
• Review: A “zero-cycle” jump
• Why not do this for branches? (original CRISP idea, applied to DLX)
• Way of looking at timing:
• However, one could use the first technique to reflect PREDICTIONS and remove delay slots
• Book talks about R4000 (taken from page 204)
• Exceptions and Interrupts
• Example: Device Interrupt (Say, arrival of network message)
• Alternative: Polling (again, for arrival of network message)
• Polling is faster/slower than Interrupts.
• Exception/Interrupt classifications
• A related classification: Synchronous vs. Asynchronous
• Interrupt controller hardware and mask levels
• Recap: Device Interrupt (Say, arrival of network message)
• SPARC (and RISC I) had register windows
• Supervisor State
• Entry into Supervisor Mode
• Administrative
• Review: Device Interrupt (Say, arrival of network message)
• Precise Interrupts/Exceptions
• Precise interrupt point requires multiple PCs to describe in presence of delayed branches
• Why are precise interrupts desirable?
• Approximations to precise interrupts
• Precise Exceptions in simple 5-stage pipeline:
• Another look at the exception problem
• How to achieve precise interrupts when instructions executing in arbitrary order?
• Impact of Hazards on Performance
• Case Study: MIPS R4000 (200 MHz)
• Case Study: MIPS R4000
• MIPS R4000 Floating Point
• MIPS FP Pipe Stages
• R4000 Performance
• Advanced Pipelining and Instruction Level Parallelism (ILP)
• FP Loop: Where are the Hazards?
• FP Loop Showing Stalls
• Revised FP Loop Minimizing Stalls
• Unroll Loop Four Times (straightforward way)
• Unrolled Loop That Minimizes Stalls
• Another possibility: Software Pipelining
• Software Pipelining Example
• Compiler Perspectives on Code Movement
• Where are the data dependencies?
• Where are the name dependencies?
• Where are the control dependencies?
• When Safe to Unroll Loop?
• Does a loop-carried dependence mean there is no parallelism???
• Can HW get CPI closer to 1?
• Next time: Advanced pipelining
• Summary #1
• Summary #2: Software Scheduling

Review: Control Flow and Exceptions
• RISC vs CISC was about the integrated system’s view, not about removing instructions:
– These names were a bit unfortunate in retrospect, since they caused some “religious” arguments
– RISC: intelligent hardware-software tradeoffs driven by quantitative measurement with real benchmarks
– End-to-end viewpoint
• Control flow is the biggest problem for computer architects. This is getting worse:
– Modern computer languages such as C++ and Java use many smaller procedure calls (method invocations)
– Networked devices need to respond quickly to many external events.

Review: A “zero-cycle” jump
• What really has to be done at runtime?
– Once an instruction has been detected as a jump or JAL, we might recode it in the internal cache.
– Very limited form of dynamic compilation?
• Use of “Pre-decoded” instruction cache
– Called “branch folding” in the Bell-Labs CRISP processor.
– Original CRISP cache had two addresses and could thus fold a complete branch into the previous instruction
– Notice that JAL introduces a structural hazard on write

Code:
A:      and  r3,r1,r5
        addi r2,r3,#4
        sub  r4,r2,r1
        jal  doit
A+16:   subi r1,r1,#1

Internal Cache state:
Address   Instruction      Type   Next
A         and r3,r1,r5     N      A+4
A+4       addi r2,r3,#4    N      A+8
A+8       sub r4,r2,r1     L      doit
A+12      ---              --     ---
A+16      subi r1,r1,#1    N      A+20

[Figure: five-stage pipeline datapath (Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back) in which the decoded cache supplies the next fetch address and a Return PC (Addr + 4)]
• Increases clock cycle by no more than one MUX delay
• Introduces structural hazard on write for JAL, however

Why not do this for branches? (original CRISP idea, applied to DLX)

Code:
A:      and  r3,r1,r5
        addi r2,r3,#4
        sub  r4,r2,r1
        bne  r4,loop
A+16:   subi r1,r1,#1

Internal Cache state:
Address   Instruction      Type   Next   Branch
A         and r3,r1,r5     N      A+4    N/A
A+4       addi r2,r3,#4    N      A+8    N/A
A+8       sub r4,r2,r1     BnR4   A+16   loop
A+12      ---              --     ---    ---
A+16      subi r1,r1,#1    N      A+20   N/A

• Delay slot eliminated (good)
• Branch has been “folded” into sub instruction (good).
• Increases size of instruction cache (not so good)
• Requires another read port in register file (BAD)
• Potentially doubles clock period (Really BAD)

[Figure: the same five-stage datapath, with the decoded cache supplying both a Next PC and a Branch PC, and a register file read of <BrRn> in the fetch stage]
• Might double clock period -- must access cache and reg
• Could be better if had architecture with condition codes

Way of looking at timing:
[Figure: timing diagram of one clock period, from the beginning of IFetch to ready to latch new PC: Instruction Cache Access, then Branch Register Lookup, then Mux]
• Register file access time might be close to original clock period

However, one could use the first technique to reflect PREDICTIONS and remove delay slots
• This causes the next instruction to be immediately fetched from branch destination (predict taken)
• If branch ends up being not taken, then squash destination instruction and restart pipeline at address A+16

Code:
A:      and  r3,r1,r5
        addi r2,r3,#4
        sub  r4,r2,r1
        bne  r4,loop
A+16:   subi r1,r1,#1

Internal Cache state:
Address   Instruction      Type   Next
A         and r3,r1,r5     N      A+4
A+4       addi r2,r3,#4    N      A+8
A+8       sub r4,r2,r1     N      A+12
A+12      bne loop         N      loop
A+16      subi r1,r1,#1    N      A+20

Book talks about R4000 (taken from page 204)
• On a taken branch, there is a one cycle delay slot, followed by two lost cycles (nullified insts).
– Recall from prereq quiz: delay slot is an instruction-set feature!
• On a non-taken branch, there is simply a delay slot (following two cycles not lost).
• This is bad for loops. We could:
– Predict taken and keep delay slot.
– Use our pre-decoded cache technique to completely remove DS

                   Clock Number
Instruction        1     2     3     4     5     6     7     8     9
Branch inst        IF    IS    RF    EX    DF    DS    TC    WB
Delay Slot               IF    IS    RF    EX    DF    DS    TC    WB
Branch Inst+8                  IF    IS    null  null  null  null  null
Branch Inst+12                       IF    null  null  null  null  null
Branch Targ                                IF    IS    RF    EX    DF

Exceptions and Interrupts (Hardware)

Example:
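
Referring back to the R4000 branch timing above (and separate from the interrupt example, which the preview cuts off): the remark that this timing "is bad for loops" can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions. It counts the two nullified cycles after each taken branch as lost, treats the delay slot as always filled with useful work, and assumes a loop-closing branch is taken on every iteration except the last. The function names are hypothetical and do not come from the lecture.

```python
# Minimal sketch of the R4000 branch cost described in the slide text.
# Assumption: the delay slot is filled with useful work, so only the
# nullified instructions after a taken branch count as lost cycles.

TAKEN_LOST_CYCLES = 2      # two nullified instructions follow the delay slot
NOT_TAKEN_LOST_CYCLES = 0  # a non-taken branch only has its delay slot


def lost_cycles(taken: bool) -> int:
    """Cycles lost to nullified instructions for a single branch."""
    return TAKEN_LOST_CYCLES if taken else NOT_TAKEN_LOST_CYCLES


def loop_branch_overhead(iterations: int) -> int:
    """Total lost cycles for a loop-closing branch.

    Assumes the branch is taken on every iteration except the last one.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    return sum(lost_cycles(i < iterations - 1) for i in range(iterations))


def average_lost_cycles(taken_fraction: float) -> float:
    """Expected lost cycles per branch for a given fraction of taken branches."""
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must be between 0 and 1")
    return (taken_fraction * TAKEN_LOST_CYCLES
            + (1.0 - taken_fraction) * NOT_TAKEN_LOST_CYCLES)


if __name__ == "__main__":
    # A 100-iteration loop takes its closing branch 99 times.
    print(loop_branch_overhead(100))  # 198
```

Under these assumptions nearly every iteration of a loop pays the two-cycle penalty, which is the motivation the slide gives for predicting taken or folding the branch into the pre-decoded cache.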

