Greg Steffan Multithreaded Machines Are Everywhere A Scalable Approach to Threads ThreadThread Level Speculation J Gregory Steffan Steffan Christopher B Colohan Colohan Antonia Zhai Zhai and Todd C Mowry P P P C C C C Computer Science Department C P P C C C C Shared Memory SUN MAJC ALPHA 21464 Dual Pentium IBM Power4 Carnegie Mellon University How can we use them Appeared in ISCA 2000 Carnegie Mellon A Scalable Approach to Thread Level Speculation 1 A Scalable Approach to Thread Level Speculation C Shared Memory SGI Origin Parallelism 2 Time Steffan complex data structures while pointers pointers pointers hash 19 hash 21 x hash index1 hash index2 y runrun time inputs How can we make the compiler compiler s job feasible hash 33 hash 30 Thread Level Speculation TLS Processor hash 3 hash 10 complex control flow hash 10 hash 25 Carnegie Mellon Architectural Support for TLDS on a SMT Processor C Example Proving independence of threads is hard 3 P C Carnegie Mellon Steffan Automatic Parallelization A Scalable Approach to Thread Level Speculation P C Carnegie Mellon Steffan A Scalable Approach to Thread Level Speculation 4 Steffan 1 Greg Steffan Example of ThreadThread Level Speculation Processor Processor Processor Example of ThreadThread Level Speculation Processor Processor Epoch 1 hash 3 hash 10 Time Epoch 1 hash 3 hash 10 Epoch 2 hash 19 hash 21 Epoch 2 hash 19 Violation hash 21 Time Epoch 3 hash 33 hash 30 Processor Processor Epoch 3 hash 33 hash 30 5 Epoch 1 Time hash 3 hash 10 commit 9 Processor Carnegie Mellon Steffan Epoch 2 hash 19 Violation hash 21 commit 9 Processor Epoch 3 hash 33 hash 30 commit 9 hash 10 hash 25 hash 10 hash 25 A Scalable Approach to Thread Level Speculation Example of ThreadThread Level Speculation Processor Epoch 4 Epoch 4 Carnegie Mellon A Scalable Approach to Thread Level Speculation Processor 6 Steffan Example of ThreadThread Level Speculation Processor Processor Epoch 1 Epoch 4 hash 10 hash 25 commit 8 Time hash 3 hash 10 commit 9 Processor Epoch 2 hash 19 Violation hash 21 commit 9 Processor Epoch 3 hash 33 hash 30 commit 9 Processor Epoch 4 hash 10 hash 25 commit 8 Epoch 4 Retry hash 10 hash 25 commit 9 Carnegie Mellon A Scalable Approach to Thread Level Speculation 7 Architectural Support for TLDS on a SMT Processor Carnegie Mellon Steffan A Scalable Approach to Thread Level Speculation 8 Steffan 2 Greg Steffan Goals of Our Approach Overview of Our Approach 1 Handle arbitrary memory accesses System requirements i e not just array references 1 Detect data dependence violations 2 Preserve performance of nonnon speculative workloads keep hardware support minimal and simple extend invalidationinvalidation based cache coherence 2 Buffer speculative modifications use the caches as speculative buffers 3 Apply to any scale of multithreaded architecture CMPs CMPs SMT processors more traditional MPs coherence already works at a variety of scales effective simple and scalable TLS hence our scheme is also scalable Carnegie Mellon A Scalable Approach to Thread Level Speculation 9 Carnegie Mellon Steffan 10 A Scalable Approach to Thread Level Speculation Related Schemes Outline Details of our Approach Wisconsin Multiscalar Multiscalar Trace Processor Stanford Hydra life cycle of an epoch U P Catalunya Speculative Multithreading speculative coherence what happens at commit time Intel U Portland Dynamic Multithreading forwarding data between epochs Illinois at U C I I ACOMA Performance our approach seamlessly scales both up and down Conclusions Carnegie Mellon A Scalable Approach to Thread Level Speculation Steffan 11 Architectural Support for TLDS on a SMT Processor Carnegie Mellon Steffan A Scalable Approach to Thread Level Speculation 12 Steffan 3 Greg Steffan Life Cycle of an Epoch Time Spawned Epoch Numbers Slow Commit Thread Identifier TID Sequence Number Init 9 Becomes Speculative 9 Speculative Work Represent a partial ordering 9 Commit Wait to be Homefree 9 Fast Commit Complete Pass Homefree signedsigned compare sequence numbers if TIDs match allows for wrapwrap around otherwise the epochs are unordered 9 from independent programs 9 9 9 from independent chains of speculation within one program Carnegie Mellon A Scalable Approach to Thread Level Speculation 13 Carnegie Mellon Steffan Speculative Thread Model A Scalable Approach to Thread Level Speculation 14 Preserving Correctness RoundRound robin schedule of epochs to processors Speculation must fail whenever speculative state is lost not a requirement of our scheme just for convenience eg eg replacement of a speculative line ORB overflow Each epoch spawns the next Any exceptions are suppressed until epoch is homefree through a lightweight fork instruction 10 cycles eg eg divide by zero segfault Violations detected through polling Polling violation detection must avoid infinite looping each epoch runs to completion before detecting failed speculation and restarting requires a poll inside each loop No system calls while speculative for now Violation chaining if an epoch suffers a violation we squash all logicallylogically later epochs many possibilities to be evaluated in future work ensures original sequential semantics are preserved Carnegie Mellon A Scalable Approach to Thread Level Speculation Steffan 15 Architectural Support for TLDS on a SMT Processor Carnegie Mellon Steffan A Scalable Approach to Thread Level Speculation 16 Steffan 4 Greg Steffan Life Cycle of an Epoch MESI Coherence Example Spawned Time Thread A Becomes Speculative Thread B Processor Cache State Speculative Coherence Invalid Processor Cache State Tag Data Tag Data Invalid Commit Mechanisms to Squash or Commit Shared Memory X 2 Complete Pass Homefree Carnegie Mellon 17 A Scalable Approach to Thread Level Speculation Carnegie Mellon Steffan Thread B Processor Cache State Invalid Thread A Processor Tag Data Thread B Processor Cache State Tag Data Invalid Invalid Processor Tag Data Load X Cache State Tag Data Excl Read X 2 Read Fill Shared Memory X 2 Shared Memory X 2 Carnegie Mellon A Scalable Approach to Thread Level Speculation Steffan MESI Coherence Example Load X Cache State 18 A Scalable Approach to Thread Level Speculation MESI Coherence Example Thread A 19 Architectural Support for TLDS on a SMT Processor Carnegie Mellon Steffan A Scalable Approach to Thread Level Speculation 20 Steffan 5 Greg Steffan MESI Coherence Example Thread A Store X 3 Thread B Processor Cache State Load X Thread A Processor Cache State Tag Data
View Full Document
Unlocking...