CSU CS 553 - Predication and Speculation (13 pages)


Predication and Speculation





Pages: 13
School: Colorado State University - Fort Collins
Course: CS 553 - Programming Language Design and Implementation

Last time
  - Instruction scheduling
  - Profile-guided optimizations
  - How can we increase our scheduling window?
  - How can we move excepting instructions (loads) above splits?

Today
  - Brief history of computer architecture
  - Predication and speculation
  - Compiling for IA-64

[Figure: blocks A, B, C and instruction s1; move code above a split]

CS553 Lecture: Predication and Speculation


A Brief History of Computer Architecture

The Early Years: CISC
  - Programmed by humans
  - Feature bloat:
      - Provide many instructions
      - Provide many addressing modes
      - Variable-length instructions
      - Complex instructions (VAX: REMQHI, EDITPC, POLYF)

Problem
  - Difficult to implement efficiently
  - Difficult to pipeline
  - Difficult to generate good code for


A Brief History of Computer Architecture (cont)

The Early 1980s: RISC
  - Simplify the ISA to facilitate pipelining
  - Uniform instruction format simplifies decoding
  - Uniform instructions are easier to pipeline
  - Pipelining improves clock speeds

Uniform ISA simplifies compilation
  - Stanford: produce an architecture that leverages their strong compiler group
  - Berkeley: produce an architecture that does not require heroic compilation

Problems
  - Uncertain latency
  - No binary compatibility


A Brief History of Computer Architecture (cont)

The 1990s: Dynamic Superscalar
  - Simplified pipelining and more transistors enable hardware scheduling
  - Re-order instructions
  - Hardware speculation (branch prediction)
  - Increased issue width

Note: We're talking about implementation trends here, not changes in the architecture.

Problems
  - The bureaucracy problem:
      - More and more resources being devoted to control and management
      - Fewer and fewer resources being devoted to actual work
  - ILP limited (typically between 1 and 2)


A Brief History of Computer Architecture (cont)

The 1990s: CISC implemented on RISC core
  - Provide binary compatibility
  - Dynamically translate CISC instructions to RISC instructions
  - Best of both worlds

Note: This again is a microarchitectural change, not an architectural change.

Problems
  - Hardware complexity
  - Hardware still needs to discover parallelism
  - Still have the n^2 scheduling problem
  - Still difficult to compile for


Implicitly Sequential Instruction Stream

[Figure labels: source code, compiler, machine code, parallelized code, hardware, program, FPUs]

Problems
  - Compilers can expose parallelism
  - Compilers must eventually emit linear code
  - Hardware must then re-analyze code to perform OoO execution
  - Hardware loses information available to the compiler
  - Compiler and hardware can only communicate through the sequential stream of instructions, so hardware does redundant work

How can we solve this problem?


Explicitly Parallel Instruction Stream

[Figure labels: source code, compiler, parallel machine code, hardware, program, parallelized code, FPUs]

A solution
  - Hardware does not need to re-analyze code to detect dependences
  - Hardware does not perform OoO execution

VLIW: Very Long Instruction Word
  - Each instruction controls multiple functional units
  - Each instruction is explicitly parallel


VLIW

Basic idea
  - Each instruction controls multiple functional units
  - Rely on compilers to perform scheduling and to identify parallelism
  - Simplified hardware implementations

Benefits
  - Compiler can look at a larger window of instructions than hardware
  - Can improve the scheduler even after a chip has been fabricated

Problems
  - Slow compilation times
  - No binary compatibility
  - Difficult for compilers to deal with aliasing and long latencies
  - Code is implementation-specific


VLIW and IA-64

VLIW
  - Big in the embedded market
      - Binary compatibility is less of an issue
  - An old idea
      - Horizontal microcode
      - Multiflow (1980s)
      - Intel i860 (early 1990s)

Terminology
  - EPIC: Explicitly Parallel Instruction Computer
      - New twist on VLIW
      - Don't make code implementation-specific
  - IA-64 is Intel's EPIC instruction set
  - Itanium is the first IA-64 implementation


Explicitly Parallel Instruction Sets: IA-64

IA-64 Design Philosophy
  - Break the model of implicitly sequential execution
  - Use template bits to specify instructions that can execute in parallel
  - Issue these independent instructions to the FPUs in any order
  - (Templates will cause some increase in code size)
  - The hardware can then grab large chunks of instructions and simply feed them to the functional units
  - Hardware does not spend a lot of time figuring out order of execution; hence, simplified hardware control
  - Statically scheduled code
  - Hardware can then provide a larger number of registers (128, about 4 times more than current microprocessors)
  - Number of registers is fixed by the architecture, but the number of functional units is not


IA-64

A return to hardware simplicity
  - Revisit the ideas of VLIW
  - Simplify the hardware to make it faster
  - Spend larger percentage of cycles doing actual work
  - Spend larger percentage of hardware on registers, caches, and FPUs
  - Use larger number of registers to support more parallelism

Engineering goal
  - Produce an inherently scalable architecture
  - Design an architecture (an ISA) for which there can be many implementations
  - This flexibility allows the implementation to change for years to come

[Figure labels: parallel machine code, hardware, program]


Two Key Performance Bottlenecks

Branches
  - Modern microprocessors perform good branch prediction
  - But when they mispredict, the penalty is high and getting higher
  - Penalties increase as we increase pipeline depths
  - Estimates: 20-30% of performance goes to branch mispredictions [Intel98]
  - Branches also lead to small basic blocks, which restrict latency-hiding opportunities

Memory latency
  - CPU speed doubles every 18 months (60% annual increase)
  - Memory speed increases about 5% per year


Branches Limit Performance

          instr1                  ; if
          instr2                  ; if
          P1, P2 = cmp r2, 0      ; if
          (P2) jump else          ; if
          instr3                  ; then
          instr4                  ; then
          jump Exit               ; then
  else:   instr5                  ; else
          instr6                  ; else
  Exit:   instr7

Control dependences inhibit parallelism
  - We don't know whether to execute instr3 or instr5 until the cmp is completed


Predicated Execution

          instr1                  ; if
          instr2                  ; if
          P1, P2 = cmp r2, 0      ; if
          (P2) jump else          ; (branch removed by if-conversion)
          (P1) instr3             ; then
          (P1) instr4             ; then
          jump Exit               ; (branch removed by if-conversion)
          (P2) instr5             ; else
          (P2) instr6             ; else
          instr7

This is called if-conversion.

Idea
  - Add a predicate flag to each instruction
  - If the predicate is true, the instruction is ...
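
The if-conversion idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the lecture's own code: the names `run_predicated`, `branching_reference`, and `IF_CONVERTED` are hypothetical, the compare is assumed to test `r2 == 0` with P2 set to the complement of P1, and an instruction whose predicate is false is assumed to have no effect. The straight-line predicated program should then execute the same instructions as the original branching code, with no jumps.

```python
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]
Preds = Dict[str, bool]
Effect = Callable[[Regs, Preds], None]
# (guarding predicate or None, instruction name, effect on machine state)
Instr = Tuple[Optional[str], str, Effect]


def nop(regs: Regs, preds: Preds) -> None:
    """Placeholder effect for instr1..instr7, whose semantics are unspecified."""


def cmp_r2_zero(regs: Regs, preds: Preds) -> None:
    # Assumption: the compare tests r2 == 0 and sets complementary predicates.
    preds["P1"] = regs["r2"] == 0
    preds["P2"] = not preds["P1"]


# If-converted program: straight-line code, no jumps.
IF_CONVERTED: List[Instr] = [
    (None, "instr1", nop),
    (None, "instr2", nop),
    (None, "cmp", cmp_r2_zero),
    ("P1", "instr3", nop),  # former "then" side
    ("P1", "instr4", nop),
    ("P2", "instr5", nop),  # former "else" side
    ("P2", "instr6", nop),
    (None, "instr7", nop),
]


def run_predicated(program: List[Instr], regs: Regs) -> List[str]:
    """Visit every instruction in order; skip those whose guard is false."""
    preds: Preds = {}
    trace: List[str] = []
    for guard, name, effect in program:
        if guard is not None and not preds[guard]:
            continue  # assumed semantics: a false predicate nullifies the instruction
        effect(regs, preds)
        trace.append(name)
    return trace


def branching_reference(r2: int) -> List[str]:
    """The original control flow, written with an ordinary branch."""
    trace = ["instr1", "instr2", "cmp"]
    if r2 == 0:
        trace += ["instr3", "instr4"]
    else:
        trace += ["instr5", "instr6"]
    trace.append("instr7")
    return trace


if __name__ == "__main__":
    for r2 in (0, 5):
        print(r2, run_predicated(IF_CONVERTED, {"r2": r2}))
```

In this sketch the control dependence on the branch becomes a data dependence on P1 and P2: both sides of the former branch sit in one basic block, which is what gives a scheduler a larger window to work with.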

