DOC PREVIEW
Berkeley COMPSCI 252 - Midterm I

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1University of California, BerkeleyCollege of EngineeringComputer Science Division  EECSFall 1999 John KubiatowiczMidterm IOctober 13, 1999CS252 Graduate Computer ArchitectureYour Name:SID Number:Discussion Section:Problem Possible Score120220335425Total 1002[ This page left for π ]3.1415926535897932384626433832795028841971693993751058209749443Question #1: Short Answer1a) Give a simple definition of precise interrupts/exceptions:1b) Explain how the presence of delayed branches complicates the description of a preciseexception point (think about information that the operating system needs to continueexecution after an exception)?1c) The Alpha 21064 (first version of the Alpha processor) supported precise exceptions forvirtual memory but not for floating point operations. What sort of argument might be used tojustify this complicated combination of behaviors?1d) Explain the relationship between support for precise exceptions and support for branchprediction. What hardware structure supports both of these mechanisms in a modern out-of-order pipeline?4[ This page intentionally left blank!]51e) When is it better to handle events via interrupts rather than polling? How about the reverse?Be specific.1f) Name 3 different things that people try to predict in modern processors:1g) Why is branch prediction desirable?1h) Draw a simple diagram for each of the following branch predictors: GAg, PAg, PAp1i) Explain the difference between implicit and explicit register renaming as defined in class:6Problem #2: SuperpipeliningSuppose that we have single-issue, in-order pipeline with one fetch stage, one decode stage,multiple execution stages (which include memory access) and a singe write-back stage. Assumethat it has the following execution latencies (i.e. the number of stages that it takes to compute avalue): multf (5 cycles), addf (3 cycles), divf (2 cycles), integer ops (1 cycle). Assume fullbypassing and two cycles to perform memory accesses, i.e. loads and stores take a total of 3cycles to execute (with address computation). Finally, branch conditions are computed by theinteger execution unit..2a) Assume that this pipeline consists of a single linear sequence of stages in which later stagesserve as no-ops for shorter operations. Draw the pipeline by naming each of the stages.Describe what is computed in each stage and show all of the bypass paths (as arrows). Yourgoal is to design a pipeline which never stalls unless a value is not ready. Label each ofthese arrows with the types of instructions that will forward their results along these paths(i.e. use “M” for multf, “D” for divf, “A” for addf, “I” for integer operations). [Hint: becareful about store instructions!]2b) How many extra instructions are required between each of these instruction combinations toavoid stalls (i.e. assume that the second instruction uses a value from the first). Be careful!Between a divf and an store: Between a multf and an addf:Between a load and a multf: Between an addf and a divf:Between two integer instructions: Between an integer op and a store:72c) How many branch delay slots does this machine have? Explain.2d) Would it make sense to add register renaming hardware to this pipeline? Why or why not?2e) Could branch prediction increase performance of this pipeline? Why or why not?8Problem #3: Fixing the loopsFor this problem, assume that we have a superpipelined architecture like that in problem (2) withthe following use latencies (these are not the right answers for problem #2b!):Between a multf and an addf: 3 insts Between a load and a multf: 2 instsBetween an addf and a divf: 1 insts Between a divf and a store: 7 instsBetween an int op and a store: 0 insts Number of branch delay slots: 1 instsConsider the following loop which performs a restricted rotation and projection operation. Inthis code, F0 and F1 contain sin(θ) and cos(θ) for rotation. The array based at register r1contains pairs of single-precision (32-bit) values which represent x,y coordinates. The arraybased at register r2 receives a projected coordinate along the observer’s horizontal direction:project: ldf F3,0(r1)multf F10,F3,F0ldf F4,4(r1)multf F11,F4,F1addf F12,F10,F11divf F13,F12,F2stf 0(r2),F13addi r1,r1,#8addi r2,r2,#4subi r3,r3,#1bneq project nop3a) How many cycles does this loop take per iteration? Indicate stalls in the above code bylabeling each of them with a number of cycles of stall:3b) Reschedule this code to run with as few cycles per iteration as possible. Do not unroll it orsoftware pipeline it. How many cycles do you get per iteration of the loop now?93c) Unroll the loop once and schedule it to run with as few cycles as possible per iteration of theoriginal loop. How many cycles do you get per iteration now?3d) Your loop in (3c) will not run without stalls. Without going to the trouble to unroll further,what is the minimum number of times that you would have to unroll this loop to avoid stalls?How many cycles would you get per iteration then?3e) Software pipeline this loop to avoid stalls. Overlap 5 different iterations. What is theaverage number of cycles per iteration? Your code should have no more than one copy of theoriginal instructions. Ignore startup and exit code.10[ This page intentionally left blank!]113f) Assume that you have a Tomasulo architecture with functional units of the same executionlatency (number of cycles) as our deeply pipelined processor (be careful to adjust uselatencies to get number of execution cycles!). Assume that it issues one instruction per cycleand has an unpipelined divider with a small number of reservation stations. Suppose theother functional units are duplicated with many reservation stations and that there are manyCDBs. . What is the minimum number of divide reservation stations to achieve oneinstruction per cycle with the optimized code of (3b)? Show your work. [hint: assume thatthe maximum issue rate is sustained and look at the scheduling of a single iteration]12Question #4: Explicit Register Renaming and Precise InterruptsThis problem describes a machine that uses a scoreboard for scheduling in combination with areorder buffer and explicit register renaming. The constraints of this machine are as follows:• Only one instruction can issue per cycle• The Reorder buffer eight slots• There are 32 architectural registers and an explicit rename unit with 64 physical registers• There are 2 floating-point multiplier


View Full Document

Berkeley COMPSCI 252 - Midterm I

Documents in this Course
Quiz

Quiz

9 pages

Caches I

Caches I

46 pages

Lecture 6

Lecture 6

36 pages

Lecture 9

Lecture 9

52 pages

Figures

Figures

26 pages

Midterm

Midterm

15 pages

Midterm

Midterm

14 pages

ECHO

ECHO

25 pages

Quiz  1

Quiz 1

12 pages

Load more
Download Midterm I
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Midterm I and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Midterm I 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?