
G22.2243-001
High Performance Computer Architecture
Lecture 13
Case Studies
April 19, 2006
4/24/2006

Outline
• Announcements
– Lab Assignment 4 due today
– Lab Assignments 2 and 3 graded
– Final exam in 2 weeks: same time, same place
• Clustering
• Putting It All Together
– PowerPC 750 and 970
– Intel P6 and Itanium
[PowerPC 750 and PowerPC 970 User’s Manuals, P6 Family and Itanium Hardware Developer’s Manuals, and 64-Bit CPUs: What You Need to Know, extremetech.com]

Grading
• Assignments: 70% (520 points)
– Homework Assignments: 4 x 30 points each (120 points)
– Lab Assignments: 4 x 100 points each (400 points)
– During the last class, you will be given your homework and lab grades. Make sure they are recorded correctly.
• Final exam: 30%
– All lecture notes
– Relevant chapters from the text
– Homework assignments
– Lab assignments

PowerPC 750
[PowerPC 750 User’s Manual]

General
• PowerPC 750 is an implementation of the PowerPC microprocessor family of reduced instruction set computer (RISC) microprocessors
• 750 implements the 32-bit portion of the PowerPC architecture
• Provides 32-bit effective addresses
– Integer data types of 8, 16, and 32 bits
– Floating-point data types of 32 and 64 bits
• High-performance, superscalar microprocessor
– As many as four instructions can be fetched from the instruction cache per cycle
– As many as two instructions can be dispatched per clock
– As many as six instructions can execute per clock
– Six independent execution units and two register files

PowerPC Instructions
• All PowerPC instructions are encoded as single-word (32-bit) instructions
• Instruction formats are consistent among all instruction types, permitting efficient decoding to occur in parallel with operand accesses
• This fixed instruction length and consistent format greatly simplify instruction pipelining
• Integer instructions
– Integer arithmetic, compare, logical, rotate and shift instructions
• Floating-point instructions
– Floating-point arithmetic, multiply/add, rounding and conversion, compare, status and control instructions
• Load/store instructions
– Integer load and store instructions and floating-point load and store instructions
– Primitives used to construct atomic memory operations (lwarx and stwcx. instructions)
• Flow control instructions
– Branching, condition register logical, trap, and other instructions that affect the instruction flow
• Processor control instructions
– These instructions are used for synchronizing memory accesses and managing caches, TLBs, and the segment registers
• Memory control instructions
– These instructions provide control of caches, TLBs, and SRs

PowerPC 750 Microprocessor Block Diagram
[Figure: block diagram. Labels: Instruction Cache (L1), Data Cache (L1), IF, DISPATCH, EXE, COM, Reservation Stations, Registers & Rename Buffer, L2 Cache Interface, Branch Processing]

Superscalar Pipeline
[Figure]

Instruction Flow
[Figure]

Fetch
• Clock cycles necessary to request instructions from the memory system
• Where exactly:
1. the branch target instruction cache
2. the on-chip instruction L1 cache
3. the L2 cache

Decode/Dispatch
• The time it takes to fully decode the instruction and dispatch it from the instruction queue to the appropriate execution unit
• Instruction dispatch requires the following:
– Instructions can be dispatched only from the two lowest instruction queue entries, IQ0 and IQ1
– A maximum of two instructions can be dispatched per clock cycle (although an additional branch instruction can be handled by the BPU)
– Only one instruction can be dispatched to each execution unit per clock cycle
– There must be a vacancy in the specified execution unit
– A rename register must be available for each destination operand specified by the instruction
– For an instruction to dispatch, the appropriate execution unit must be available
– There must be an open position in the completion queue. If no entry is available, the instruction remains in the IQ.

Execution Units
• Two integer units (IUs) that share thirty-two GPRs for integer operands
– IU1 can execute any integer instruction
– IU2 can execute all integer instructions except multiply and divide
– Single-entry reservation station for each
• One three-stage floating-point unit (FPU)
– Both single- and double-precision operations
– Hardware support for denormalized numbers
– Single-entry reservation station
– Thirty-two 64-bit FPRs for single- or double-precision operands

Execution Units (Cont’d)
• Two-stage LSU
– Two-entry reservation station
– Single-cycle, pipelined cache access
– Dedicated adder performs EA calculations
– Performs alignment and precision conversion for floating-point data
– Performs alignment and sign extension for integer data
– Three-entry store queue
– Supports both big- and little-endian modes
• SRU handles miscellaneous instructions
– Executes CR logical and Move to/from SPR instructions (mtspr and mfspr)
– Single-entry reservation station

Completion
• Completion unit retires an instruction from the six-entry reorder buffer (completion queue) when
1. All instructions ahead of it have been completed, and
2. The instruction has finished execution, and
3. No exceptions are pending
• Guarantees sequential programming model (precise exception model)
• Monitors all dispatched instructions and retires them in order
• Tracks unresolved branches and flushes instructions from the mispredicted branch
• Retires as many as two instructions per clock

Pipeline Stages
[Figure]

Rename Buffers
• 750 provides rename registers for holding instruction results before the completion unit commits them to the architected register
• There are six GPR rename registers, six FPR rename registers, and one each for the CR, LR, and CTR
• When an instruction is dispatched to its execution unit, a rename register for the results of that instruction is assigned
• Dispatcher also provides a tag to the execution unit identifying the rename register that forwards the required data for an instruction
• When the source data reaches the rename register, execution can begin
• Results are transferred from the rename registers to the architected registers by the completion unit when an instruction is retired from the completion queue
• Results of squashed instructions are flushed from the rename
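
The in-order retirement rules from the Completion slide can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the 750's actual logic: the class names CompletionQueue and Entry, the method names, and the instruction labels are hypothetical, and a pending exception simply blocks retirement here (exception handling itself is not modeled). The queue depth of six and the limit of two retirements per clock are taken from the slides.

```python
from collections import deque

COMPLETION_QUEUE_SIZE = 6  # six-entry completion queue (from the slides)
MAX_RETIRE_PER_CLOCK = 2   # as many as two instructions retire per clock


class Entry:
    """One completion-queue slot (hypothetical structure for illustration)."""

    def __init__(self, name):
        self.name = name
        self.finished = False   # set once the instruction has finished execution
        self.exception = False  # set if an exception is pending on it


class CompletionQueue:
    """Toy model of in-order completion; not the real hardware design."""

    def __init__(self):
        self.entries = deque()  # oldest instruction sits at the left

    def dispatch(self, name):
        """Allocate a slot in program order; return None if the queue is full."""
        if len(self.entries) >= COMPLETION_QUEUE_SIZE:
            return None  # the instruction would stay in the instruction queue
        entry = Entry(name)
        self.entries.append(entry)
        return entry

    def retire(self):
        """Model one clock: retire up to two instructions from the head."""
        retired = []
        while self.entries and len(retired) < MAX_RETIRE_PER_CLOCK:
            head = self.entries[0]
            # Only the oldest entry is examined, so everything ahead of a
            # retiring instruction has already completed.
            if not head.finished or head.exception:
                break
            retired.append(self.entries.popleft().name)
        return retired


cq = CompletionQueue()
a = cq.dispatch("add")
b = cq.dispatch("lwz")
c = cq.dispatch("mullw")

c.finished = True   # the youngest instruction finishes first (out of order)
print(cq.retire())  # [] because the oldest instruction has not finished

a.finished = True
b.finished = True
print(cq.retire())  # ['add', 'lwz'] (two-per-clock limit)
print(cq.retire())  # ['mullw']
```

Checking only the head of the queue is what enforces the sequential programming model: an instruction that finishes early waits until every older instruction has retired.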



NYU CSCI-GA 2243 - Case Studies
