NYU CSCI-GA 2243 - CSCI-GA 2243 assignment 3 - D2843929

School name New York University

Course CSCI-GA 2243 - High Performance Computer Architecture

G22.2243 Spring 2006

G22.2243: High Performance Computer Architecture
SimpleScalar Assignment #3 (Due: March 29, 2006)

In Assignments 3 and 4, you will build a dynamically-scheduled multiple-issue processor. This assignment starts you off towards this goal by building the basic components in the context of a single-issue, in-order processor with multiple functional units. These components will be extended with dynamic scheduling and multiple issue in Assignment 4.

Background

For this exercise, we will use a different processor core architecture than the one used in Assignments 1 and 2. The architecture is similar to that introduced in class for implementing Tomasulo-like algorithms for dynamic scheduling. The processor core consists of instruction scheduling logic interacting with multiple functional units (FUs) and the register set using a register update unit (RUU, which acts as a combined reservation station store and reorder buffer) and a load/store queue (LSQ). The instruction scheduling logic consists of the following stages:

• Instruction Fetch: This stage fetches instructions and puts them into a fetch/dispatch queue. For this assignment, we shall assume that we can fetch one instruction per cycle.

• Dispatch: This stage retrieves an instruction from the fetch/dispatch queue, allocates an entry in the RUU (and the LSQ if required), checks for data hazards and, if none exist, inserts the instruction into a ready queue. In this assignment, we require that all operands of an instruction be ready before it is placed in the ready queue. This implies both that output registers are available for writing, and that input registers are available for reading. The processor keeps track of registers that are destinations for instructions under execution using a create vector structure, which identifies, for each such register, the RUU entry that will write into it. Special handling is required for load/store instructions as discussed in additional detail below.
• Issue: This stage examines the instruction at the head of the ready queue and, if functional unit resources are available, issues the instruction to the appropriate FU. Each FU is modeled in terms of two parameters: an issue latency and an operation latency. The former determines how frequently instructions can be sent to the FU, while the latter indicates when operation results will become available. Special handling is required for store instructions as discussed below.

• Writeback: This stage waits for an instruction to finish execution and updates the register set with the result. Instructions in the dispatch stage that are waiting for this result can be made ready at the end of the same cycle.

• Commit: This stage is responsible for “retiring” instructions at the head of the RUU by committing the results of the instruction to architected processor state. The RUU (and optionally the LSQ) entry is freed up. Special handling is required for store instructions as discussed below.

In this assignment, you will simulate the behavior of the above processor core using the SimpleScalar toolkit. The dispatch stage of the core is responsible for functional simulation of instructions; the rest of the stages just simulate the instruction flow through different parts of the core.

Assignment 4 works with the same basic structure as above, but allows the fetching, dispatching, issue, writeback, and committing of multiple instructions every cycle. Furthermore, the instruction dispatch logic is altered to implement dynamic scheduling.

For simplicity, we shall assume the existence of a
• Perfect instruction cache (no I-cache misses)
• Perfect data cache (no D-cache misses)
• Perfect branch predictor, implemented as discussed below.

The Assignment

The assignment requires you to provide simulator code for implementing the five instruction scheduling stages described above, so as to realize a single-issue, in-order processor with multiple functional units.
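
The two FU parameters described above (issue latency and operation latency) can be illustrated with a minimal sketch. The struct `fu_model` and the function `fu_try_issue` are hypothetical names invented for this illustration; they are not part of the SimpleScalar resource module or of the supplied skeleton, and the latencies used in the note that follows are made up.

```c
#include <assert.h>

/* Hypothetical model of a single functional unit (illustration only;
 * this is NOT the SimpleScalar resource module API). */
struct fu_model {
    int  issue_lat;   /* cycles before the FU can accept another instruction */
    int  oper_lat;    /* cycles from issue until the result is available */
    long busy_until;  /* first cycle at which the FU can issue again */
};

/* Try to issue an instruction to `fu` at cycle `now`.
 * Returns the cycle at which the result becomes available,
 * or -1 if the FU cannot accept an instruction yet (issue must stall). */
long fu_try_issue(struct fu_model *fu, long now)
{
    assert(fu->issue_lat >= 1 && fu->oper_lat >= 1);
    if (now < fu->busy_until)
        return -1;                      /* FU still busy: structural hazard */
    fu->busy_until = now + fu->issue_lat;
    return now + fu->oper_lat;          /* cycle at which writeback can begin */
}
```

With assumed latencies, a multiplier configured as `{1, 3, 0}` that issues at cycle 0 produces its result at cycle 3, while an ALU configured as `{1, 1, 0}` that issues one cycle later produces its result at cycle 2. Results can therefore appear out of order even though issue was in order.
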
Since the functional units have different issue and operation latencies, instructions may end up producing results out-of-order (even when issued to the FUs in order); however, the commit stage ensures that instructions finish execution in order.

To help you get started, I have provided a sketch of the simulator, sim-multfu.c, which defines the various structures, their sizes, and functional unit configurations. You will need to update the makefile rules as in the previous assignments to make this file part of the simplesim-3.0 sources (follow the pattern of rules for the sim-outoforder simulator).

Implementation notes:

1. Implementing perfect branch prediction: As in Assignments 1 and 2, the instruction fetch stage works with a variable, fetch_pc, which represents its knowledge of the next address from which instructions need to be fetched. Perfect branch prediction can be implemented simply by updating this variable with the correct PC value obtained by functionally simulating the instruction. Since the execution of the dispatch stage precedes that of fetch, this update has the effect of ensuring that no slots are lost because of mispredictions.

2. Interacting with functional units: To help emulate functional units and the fact that they can be configured with different issue and operation latencies, we rely on a SimpleScalar module, resource, which models such units. The API for this module provides functions to create a pool of FUs, to allocate an FU as required, and to keep track of when this FU will next become available for issue. The supplied skeleton code in sim-multfu.c includes code fragments showing how to create, initialize, and interact with the resource module. Additional details can be found in the sim-outoforder sources.

To model execution latency in an FU, we rely on an event queue structure. This structure allows the enqueuing of “result” events, which become ready a specified number of cycles in the future.
The simulator uses this functionality to indicate when an instruction should be deemed to enter its writeback stage.

3. Dealing with load/store instructions in the processor core: Memory instructions are handled somewhat differently as described below. A load or a store instruction requires two operations in the processor core: an effective address computation (executed by the Integer ALU) and the memory operation (which requires use of Read or Write ports). To support dynamic memory
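
The event queue described in implementation note 2 can be sketched as a small time-ordered array. This is only an illustration under stated assumptions: the names `eventq`, `result_event`, `eventq_put`, `eventq_next_ready` and the capacity `EVENTQ_CAP` are hypothetical and are not taken from the skeleton or from the sim-outoforder sources.

```c
#include <assert.h>
#include <stddef.h>

#define EVENTQ_CAP 64   /* hypothetical capacity, chosen for this sketch */

/* A pending "result" event (hypothetical layout). */
struct result_event {
    long when;     /* cycle at which the result becomes ready */
    int  ruu_idx;  /* assumed non-negative index of the producing RUU entry */
};

struct eventq {
    struct result_event ev[EVENTQ_CAP];  /* sorted by ascending `when` */
    size_t n;                            /* number of pending events */
};

/* Enqueue a result event that becomes ready `lat` cycles after `now`.
 * Returns 0 on success, -1 if the queue is full. */
int eventq_put(struct eventq *q, long now, int lat, int ruu_idx)
{
    assert(lat >= 0);
    if (q->n == EVENTQ_CAP)
        return -1;
    long when = now + lat;
    size_t i = q->n;
    /* Shift later events right; events with equal times stay in FIFO order. */
    while (i > 0 && q->ev[i - 1].when > when) {
        q->ev[i] = q->ev[i - 1];
        i--;
    }
    q->ev[i].when = when;
    q->ev[i].ruu_idx = ruu_idx;
    q->n++;
    return 0;
}

/* Remove the earliest event if it is ready at cycle `now`.
 * Returns its RUU index, or -1 if no event is ready yet. */
int eventq_next_ready(struct eventq *q, long now)
{
    if (q->n == 0 || q->ev[0].when > now)
        return -1;
    int idx = q->ev[0].ruu_idx;
    for (size_t i = 1; i < q->n; i++)
        q->ev[i - 1] = q->ev[i];
    q->n--;
    return idx;
}
```

In a simulator loop built on this sketch, the issue stage would call `eventq_put` with the FU's operation latency, and the writeback stage would call `eventq_next_ready` repeatedly each cycle until it returns -1.
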
