O-K-State ECEN 6253 - Lecture Notes

Reservation Stations
Data Flow Limit
Tomasulo Algorithm
1. Reservation Stations. The single entry buffer at the head of each execution unit has been replaced by a multiple entry buffer...
2. Common Data Bus. To approach the data flow limit, the instructions waiting in the reservation stations must get their input o...
3. Register Tags. The CDB will have data from several different instructions at different times. The reservation stations must h...

Dynamic Execution Core
Reservation Station Implementation
1. Allocate: the dispatch stage must write the instruction and its operands into an available reservation station entry.
   a. Find an entry in the reservation station with Busy = 0. To avoid repeatedly reading the Busy values, the Busy bits are hardwired to a priority encoder which selects the first unused entries, as was done for the rename register file.
   b. Set Busy = 1. If operand values are available, copy them from the dispatch busses into the Operand 1 and Operand 2 fields and...
2. Wait: the reservation stations must monitor the tag busses from the execution units to find matches for missing operands for each instruction in the reservation station.
   a. Tags (operands with Valid = 0) are compared with the tag busses.
   b. If a tag match occurs, the operand is copied from the corresponding forwarding bus and the corresponding Valid is set to 1.
   c. If both operands have Valid = 1, then set Ready = 1 (instruction wake up), indicating that the instruction is ready to be issued into the execution pipeline.
3. Issue: the reservation station must choose (instruction select) which of the ready instructions to send (issue) to the execution pipeline.
   a. The ready instructions must be prioritized (for example, in program order or length of time in the reservation station). The ...
   b. Reset Busy = 0 to deallocate the reservation station entry so that it may be used by other instructions.

ECEN 6253 Advanced Digital Computer Design    Reservation Stations    March 1, 2005

Reservation Stations

Data Flow Limit. True data dependences (RAW hazards) are a fundamental limitation to the available ILP. There is no way to avoid stalling a “consumer” instruction waiting for an input operand until the “producer” instruction produces its output operand value. However, it is not necessary to stall the entire pipeline as is done in scalar pipelines. If there are any instructions that already have input operand values available, they can proceed to execute (possibly out of order). If there are enough machine resources (functional units and busses for input and output operands), execution should begin as fast as possible, limited only by true data dependence. Such machines are called “data flow” machines and have long been a goal of parallel processing designers.

Superscalar processors are not true data flow machines because only a limited number, s, of instructions are processed in parallel. The performance of a superscalar processor can approach (but not exceed) the performance of a data flow machine as long as s > ILP. This upper bound on superscalar performance is called the “data flow limit.”

For example, consider the FFT code in fig. 5-18, p. 244. The data flow graph (DFG) of the machine code is shown in fig. 5-19, p. 245. The arcs correspond to true data dependences only (name dependences have been removed). The execution latencies are marked along the arcs. The data flow limit would then be the sum of the execution latencies along the longest path in the DFG.

Note: an s = 6 superscalar could execute this DFG at the data flow limit if the execution of the i1 - i2 - i3 - i4 subgraph is delayed one level.

Tomasulo Algorithm. A design for the floating point unit (FPU) of the IBM 360/91, done in the mid 1960’s, has come to be known as the Tomasulo algorithm.
The way the Tomasulo algorithm handles data dependences has many features that are now used in superscalar designs.

The original design for the IBM 360, fig. 5-20, p. 246, was not pipelined. It had two floating point execution units in parallel, one for add/subtract and the other for multiply/divide. Each execution unit had a buffer for the two source operands (the sink operand also served as the destination). There was also a floating point register file (FLR), a floating point instruction buffer (FLOS), and floating point load and store buffers (FLB and SDB).

It soon became apparent that the performance of the floating point execution units could be significantly improved by pipelining. However, it was difficult to provide instructions and operands fast enough to keep the parallel pipelines busy. This is the classic problem of superscalar design.

The Tomasulo algorithm modifies the original design to allow it to operate closer to the data flow limit. As shown in fig. 5-21, p. 248, the FPU of the IBM 360/91 has three major innovations that are now used in modern superscalar designs.

1. Reservation Stations. The single entry buffer at the head of each execution unit has been replaced by a multiple entry buffer (reservation stations). The extra entries allow the FLOS (which corresponds to Dispatch in our modern machine) to avoid stalling if an instruction at the top of the execution unit must stall due to data dependence. As long as there is a reservation station available for the appropriate functional unit, the FLOS can keep dispatching instructions to the reservation stations. If the input operand values are available, the instructions can be issued from the reservation station to the execution pipelines. Otherwise, the instruction waits in the reservation station until the input operand values are available.

2. Common Data Bus. To approach the data flow limit, the instructions waiting in the reservation stations must get their input operand values from the functional units as soon as they are available. The Common Data Bus (CDB) connects the functional unit output to the reservation stations as well as the floating point registers (FLR) and the store unit. In that way, the reservation stations can get data as soon as it is available without waiting for the data to go through the FLR. This is similar to the operand forwarding done in scalar pipelines. Conceptually, operand forwarding is simpler in superscalar than in scalar processors since operand values are only forwarded from the end of the execution pipelines to the

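The data flow limit described in the notes (the sum of execution latencies along the longest path in the DFG) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `data_flow_limit`, the dictionary layout, and the four-instruction graph are all hypothetical, and the graph is not the FFT example of fig. 5-19. Latencies are attached to arcs, as in the notes, and the graph is assumed to be acyclic.

```python
from functools import lru_cache


def data_flow_limit(dfg):
    """Return the largest sum of arc latencies along any path in an acyclic DFG.

    dfg maps each instruction name to a list of (successor, latency) arcs.
    """

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Longest latency sum over all paths that start at `node`.
        return max(
            (latency + longest_from(succ) for succ, latency in dfg.get(node, [])),
            default=0,
        )

    return max((longest_from(node) for node in dfg), default=0)


# Hypothetical graph for illustration only (not the FFT code from the text).
example_dfg = {
    "i1": [("i2", 2), ("i3", 1)],
    "i2": [("i4", 3)],
    "i3": [("i4", 1)],
    "i4": [],
}

print(data_flow_limit(example_dfg))  # longest path is i1 -> i2 -> i4
```

No amount of parallel hardware shortens this number, because every arc on the longest path is a true data dependence.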

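The Allocate, Wait, and Issue steps listed under Reservation Station Implementation can be modeled in software roughly as follows. This is a minimal behavioral sketch, not the hardware design from the notes: the class and field names, the `("val", x)` / `("tag", t)` source encoding, the tag name `T3`, and the choice of oldest-first selection are assumptions made for illustration. A linear scan stands in for the priority encoder, and each call to `wait` models one tag/forwarding bus broadcast.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    busy: bool = False
    ready: bool = False
    op: str = ""
    age: int = 0  # dispatch order, used here as the issue priority
    operands: list = field(default_factory=lambda: [None, None])
    tags: list = field(default_factory=lambda: [None, None])
    valid: list = field(default_factory=lambda: [False, False])


class ReservationStation:
    def __init__(self, size):
        self.entries = [Entry() for _ in range(size)]
        self.clock = 0

    def allocate(self, op, sources):
        """Allocate: sources holds two items, ("val", value) or ("tag", tag)."""
        for index, entry in enumerate(self.entries):
            if not entry.busy:  # first entry with Busy = 0
                break
        else:
            return None  # no free entry, so dispatch must stall
        self.clock += 1
        entry.busy, entry.op, entry.age = True, op, self.clock
        for k, (kind, payload) in enumerate(sources):
            if kind == "val":
                entry.operands[k], entry.tags[k], entry.valid[k] = payload, None, True
            else:
                entry.operands[k], entry.tags[k], entry.valid[k] = None, payload, False
        entry.ready = all(entry.valid)
        return index

    def wait(self, tag, value):
        """Wait: compare a broadcast tag with every operand that has Valid = 0."""
        for entry in self.entries:
            if not entry.busy:
                continue
            for k in range(2):
                if not entry.valid[k] and entry.tags[k] == tag:
                    entry.operands[k], entry.valid[k] = value, True
            entry.ready = all(entry.valid)  # instruction wake up

    def issue(self):
        """Issue: select the oldest ready instruction and free its entry."""
        ready = [e for e in self.entries if e.busy and e.ready]
        if not ready:
            return None
        oldest = min(ready, key=lambda e: e.age)
        oldest.busy = oldest.ready = False  # Reset Busy = 0 to deallocate
        return oldest.op, tuple(oldest.operands)


rs = ReservationStation(size=2)
rs.allocate("add", [("val", 1.0), ("tag", "T3")])  # second operand not yet produced
rs.allocate("sub", [("val", 5.0), ("val", 2.0)])   # both operands available
first = rs.issue()   # the younger "sub" issues first because "add" is still waiting
rs.wait("T3", 4.0)   # the producer broadcasts its tag and value
second = rs.issue()  # "add" has been woken up and can now issue
```

In the usage lines, the instruction dispatched second issues first, which is the out-of-order behavior the reservation stations are meant to allow: a stalled consumer does not block an independent instruction behind it.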