DOC PREVIEW
Berkeley COMPSCI 252 - Midterm

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1University of California, Berkeley College of Engineering Computer Science Division  EECS Fall 2000 John KubiatowiczMidterm I October 18, 2000 CS252 Graduate Computer Architecture Your Name: SID Number: Problem Possible Score 1 40 2 35 3 30 Total 1002 [ This page left for π ] 3.1415926535897932384626433832795028841971693993751058209749443Question #1: Short Answer 1a) What are precise exceptions and why are they desirable? Give 3 reasons. 1b) What hardware structure can be used to support branch prediction, data prediction, and precise exceptions in an out-of-order processor? Explain what this structure is (including what information it holds) and how it is used with implicit register renaming to recover from a bad prediction or exception. 1c) Do you need the structure of (1b) in order to achieve precise exceptions in an in-order pipeline? Why or why not? Explain.41d) Given that enough transistors were available, would it make sense to design a 64-way superscalar processor? Give 3 reasons why or why not. 1e) Name two components of a modern superscalar architecture whose delay scales quadratically with the square of the issue-width. Could you pinpoint a single root cause for these quadratic trends? 1f) You read a paper on Simultaneous Multithreading. Can you give a short description on the basic idea and why it is desirable? What metric is a Simultaneous Multithreading processor attempting to optimize over a basic out-of-order processor?51g) What is memory disambiguation? Describe the minimal hardware support required to perform conservative memory disambiguation in an out-or-order processor (which means that you never send a load to the memory system unless you know that you should) and list pseudo-code that is followed when dispatching loads and/or stores. 1h) Why might we want to do a bit of guessing about load dependencies instead of adhering to the strict algorithm that you gave in (1g)? 1i) Why does prediction work? 1j) What is the problem with aliasing in branch predictors? Draw and name one of the simplest branch predictors that helps to avoid aliasing and describe why it is less sensitive to aliasing than other predictors.6Problem #2: Two-way superscalar processors Consider a dual-issue, in-order pipeline with one fetch stage, one decode stage, multiple execution stages (which include memory access) and a single write-back stage. Assume that the execution stages are organized into two parallel execution pipelines (call them even and odd) that support all possible simultaneous combinations of two instructions. Instructions wait in the decode stage until all of their dependencies have been satisfied. Further, since this is an in-order pipeline, new instructions will be forced to wait behind stalled instructions. On each cycle, the decode stage takes zero, one, or two ready instructions from the fetch stage, gathers operands from the register file or the forwarding network, then dispatch them to execution stages. If less than 2 instructions are dispatched on a particular cycle, then “NOPs” are sent to the execution stages. When two instructions are dispatched, the even pipeline receives the earlier instruction. When only one instruction is dispatched, it is placed in the even pipeline. Assume that each of the execution pipelines consist of a single linear sequence of stages in which later stages serve as no-ops for shorter operations (or: every instruction takes the same number of stages to “execute”, but results of shorter operations are available for forwarding sooner). All operations are fully pipelined and results are forwarded as soon as they are complete. Assume that the execution pipelines have the following execution latencies: addf (2 cycles), multf (3 cycles), divf (4 cycles), integer ops (1 cycle). Assume that memory instructions take 3 cycles of execution: one for address calculation – done by the integer execution stage, and two unbreakable cycles for the actual memory access. Finally, assume that branch-conditions are computed by integer execution units. 2a) Explain why we would be unable to pick a single optimum number of branch delay slots for the above processor. 2b) Can we tell the programmer that the number of branch delay slots varies by circumstances? If so, explain the programmer specification for branches. If not, explain why not and (1) indicate how we would “fix” the hardware to have only a specific number of branch delay slots and (2) indicate what that number would be.72c) Does this processor have WAW hazards? Explain. If “yes”, give an efficient way to fix the problem. 2d) Does this processor have WAR hazards? Explain. If “yes”, give an efficient way to fix the problem. 2e) Assume that the fetch unit presents instructions to the decode stage for execution. The decode stage is free to dispatch zero, one, or two instructions every cycle. Once instructions have passed decode, they execute to completion (no further blocking). Assume that enough bypassing hardware has been included to handle every arrow given in (2g). Suppose that we have the following instruction sequence: ld r1, 0(r2) add r4, r1, r2 How many cycles must be inserted between these two instructions by the decode stage to ensure correct execution? How does this translate to user-visible load-delay slots? Explain. 2f) Suppose that we have the following instruction sequence: multf f1, f2, f3 st 0(r1), f1 How many cycles will be inserted between these two instructions by the decode stage? How many lost instructions does this represent?82g) Draw a simple diagram for the pipelines of this processor. Draw pipeline stages as boxes with letters inside: Use “F” for the fetch stage, “D” for the decode stage, EX1 through EX4 for the execution stages of each of the pipelines (including memory accesses), and “W” for writeback. Draw simple forward arrows to indicate the flow of information from stage to stage. Clearly label which pipeline is the even pipeline. Finally, describe what is computed in each stage and show all of the bypass paths (as arrows). Your goal is to design a pipeline that never stalls unless a value is not ready. Label each of the bypass arrows with the types of instructions that will forward their results along that path (i.e. use “M” for


View Full Document

Berkeley COMPSCI 252 - Midterm

Documents in this Course
Quiz

Quiz

9 pages

Caches I

Caches I

46 pages

Lecture 6

Lecture 6

36 pages

Lecture 9

Lecture 9

52 pages

Figures

Figures

26 pages

Midterm

Midterm

15 pages

Midterm I

Midterm I

15 pages

ECHO

ECHO

25 pages

Quiz  1

Quiz 1

12 pages

Load more
Download Midterm
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Midterm and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Midterm 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?