CSU CS 553 - Predication and Speculation

This preview shows pages 1-4 of 13.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS553 Lecture: Predication and Speculation

Predication and Speculation

Last time
– Instruction scheduling
– Profile-guided optimizations
– How can we increase our scheduling window?
– How can we move excepting instructions (loads) above splits?

Today
– Brief history of computer architecture
– Predication and speculation
– Compiling for IA-64

(Diagram: moving code above a split in a control-flow graph)

A Brief History of Computer Architecture

The early years: CISC
– Programmed by humans
– Feature bloat:
– Provide many instructions
– Provide many addressing modes
– Variable-length instructions
– Complex instructions
– VAX: REMQHI, EDITPC, POLYF

Problem
– Difficult to implement efficiently
– Difficult to pipeline
– Difficult to generate good code for

A Brief History of Computer Architecture (cont)

The early 1980s: RISC
– Simplify the ISA to facilitate pipelining
– A uniform instruction format simplifies decoding
– Uniform instructions are easier to pipeline
– Pipelining improves clock speeds

A uniform ISA simplifies compilation
– Stanford: produce an architecture that leverages their strong compiler group
– Berkeley: produce an architecture that does not require heroic compilation

Problems
– Uncertain latency
– No binary compatibility

A Brief History of Computer Architecture (cont)

The 1990s: dynamic superscalar
– Simplified pipelining and more transistors enable hardware scheduling
– Re-order instructions
– Hardware speculation (branch prediction)
– Increased issue width

Note
– These are implementation trends, not changes in the architecture

Problems
– The bureaucracy problem: more and more resources are devoted to control and management, and fewer and fewer to actual work
– ILP is limited (typically between 1 and 2)

A Brief History of Computer Architecture (cont)

The 1990s: CISC implemented on a RISC core
– Provide binary compatibility
– Dynamically translate CISC instructions to RISC instructions
– Best of both worlds?

Note
– This again is a microarchitectural change, not an architectural change

Problems
– Hardware complexity
– Hardware still needs to discover parallelism
– Still have the n^2 scheduling problem
– Still difficult to compile for

Implicitly Sequential Instruction Stream

Problems
– Compilers can expose parallelism, but they must eventually emit linear code
– Hardware must then re-analyze the code to perform out-of-order (OoO) execution
– The hardware loses information that was available to the compiler
– The compiler and the hardware can only communicate through the sequential stream of instructions, so the hardware does redundant work

How can we solve this problem?

(Diagram: source code → compiler → machine code → hardware and FPUs)

Explicitly Parallel Instruction Stream

A solution
– Hardware does not need to re-analyze code to detect dependences
– Hardware does not perform OoO execution

VLIW: Very Long Instruction Word
– Each instruction controls multiple functional units
– Each instruction is explicitly parallel

(Diagram: source code → compiler → parallel machine code → hardware and FPUs)

VLIW

Basic idea
– Each instruction controls multiple functional units
– Rely on compilers to perform scheduling and to identify parallelism
– Simplified hardware implementations

Benefits
– The compiler can look at a larger window of instructions than the hardware can
– The scheduler can be improved even after a chip has been fabricated

Problems
– Slow compilation times
– No binary compatibility
– Difficult for compilers to deal with aliasing and long latencies
– Code is implementation-specific

VLIW and IA-64

VLIW
– Big in the embedded market, where binary compatibility is less of an issue
– An old idea: horizontal microcode, Multiflow (1980s), the Intel i860 (early 1990s)

Terminology
– EPIC: Explicitly Parallel Instruction Computing
– A new twist on VLIW: don't make code implementation-specific
– IA-64 is Intel's EPIC instruction set
– Itanium is the first IA-64 implementation

Explicitly Parallel Instruction Sets: IA-64

IA-64 design philosophy
– Break the model of implicitly sequential execution
– Use template bits to specify which instructions can execute in parallel; these independent instructions can be issued to the functional units in any order (templates cause some increase in code size)
– The hardware can then grab large chunks of instructions and simply feed them to the functional units; it does not spend time figuring out the order of execution, so hardware control is simplified (statically scheduled code)
– The hardware can also provide a larger number of registers: 128 (about 4 times more than current microprocessors)
– The number of registers is fixed by the architecture, but the number of functional units is not

IA-64

A return to hardware "simplicity"
– Revisit the ideas of VLIW
– Simplify the hardware to make it faster: spend a larger percentage of cycles doing actual work, and a larger percentage of hardware on registers, caches, and FPUs
– Use the larger number of registers to support more parallelism

Engineering goal
– Produce an "inherently scalable architecture": design an ISA for which there can be many implementations
– This flexibility allows the implementation to change for "years to come"

(Diagram: parallel machine code → hardware)

Two Key Performance Bottlenecks

Branches
– Modern microprocessors perform good branch prediction
– But when they mispredict, the penalty is high and getting higher; penalties increase as pipeline depths increase
– Estimates: 20-30% of performance goes to branch mispredictions [Intel98]
– Branches also lead to small basic blocks, which restrict latency-hiding opportunities

Memory latency
– CPU speed doubles every 18 months (a 60% annual increase)
– Memory speed increases by about 5% per year

– Control dependences inhibit parallelism
– We don't know whether to execute instr3 or instr5 until the cmp is completed

    instr1
    instr2
    . . .
    P1,P2 ← cmp(r2,0)
    (P2) jump else

