Greg SteffanArchitectural Support for TLDS on a SMT Processor 11A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonA Scalable Approach to A Scalable Approach to ThreadThread--Level SpeculationLevel SpeculationJ. Gregory J. Gregory SteffanSteffan, Christopher B. , Christopher B. ColohanColohan, , Antonia Antonia ZhaiZhai, and Todd C. , and Todd C. MowryMowryComputer Science DepartmentComputer Science DepartmentCarnegie Mellon UniversityCarnegie Mellon University(Appeared in ISCA 2000.)(Appeared in ISCA 2000.)2A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonMultithreaded Machines Are EverywhereMultithreaded Machines Are Everywhere)How can we use them? Parallelism!CPCCPCCPCShared MemorySUN MAJC,IBM Power4ALPHA 21464 Dual Pentium SGI OriginThreadsCPCCPCShared MemoryCCPCP3A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonAutomatic ParallelizationAutomatic ParallelizationProving independence of threads is hard:Proving independence of threads is hard:––complex control flowcomplex control flow––complex data structurescomplex data structures––pointers, pointers, pointerspointers, pointers, pointers––runrun--time inputstime inputsHow can we make the compilerHow can we make the compiler’’s job feasible?s job feasible?)Thread-Level Speculation (TLS)4A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonExampleExamplewhile (...){x = hash[index1];…hash[index2] = y;...}Time= hash[3]…hash[10] =…Processor= hash[19]…hash[21] =…= hash[33]…hash[30] =…= hash[10]…hash[25] =…Greg SteffanArchitectural Support for TLDS on a SMT Processor 25A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonExample of ThreadExample of Thread--Level SpeculationLevel SpeculationTime= hash[3]…hash[10] =…Epoch 1= hash[19]…hash[21] =…Epoch 2= hash[33]…hash[30] =…Epoch 3= hash[10]…hash[25] =…Epoch 4Processor ProcessorProcessorProcessor6A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonExample of ThreadExample of Thread--Level SpeculationLevel SpeculationTime= hash[3]…hash[10] =…Epoch 1= hash[19]…hash[21] =…Epoch 2= hash[33]…hash[30] =…Epoch 3= hash[10]…hash[25] =…Epoch 4Processor ProcessorProcessorProcessorViolation!7A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonExample of ThreadExample of Thread--Level SpeculationLevel SpeculationTime= hash[3]…hash[10] =…commit?Epoch 1= hash[19]…hash[21] =…commit?Epoch 2= hash[33]…hash[30] =…commit?Epoch 3= hash[10]…hash[25] =…commit?Epoch 4Processor ProcessorProcessorProcessorViolation!99988A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonExample of ThreadExample of Thread--Level SpeculationLevel SpeculationTime= hash[3]…hash[10] =…commit?Epoch 1= hash[19]…hash[21] =…commit?Epoch 2= hash[33]…hash[30] =…commit?Epoch 3= hash[10]…hash[25] =…commit?Epoch 4Processor ProcessorProcessorProcessorViolation!9998= hash[10]…hash[25] =…commit?Epoch 4Retry9Greg SteffanArchitectural Support for TLDS on a SMT Processor 39A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonGoals of Our ApproachGoals of Our Approach1) Handle arbitrary memory accesses1) Handle arbitrary memory accesses––i.e. not just array referencesi.e. not just array references2) Preserve performance of non2) Preserve performance of non--speculative workloadsspeculative workloads––keep hardware support minimal and simplekeep hardware support minimal and simple3) Apply to any scale of multithreaded architecture3) Apply to any scale of multithreaded architecture––CMPsCMPs, SMT processors, more traditional MPs, SMT processors, more traditional MPs)effective, simple, and scalable TLS10A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonOverview of Our ApproachOverview of Our ApproachSystem requirements:System requirements:1) Detect data dependence violations1) Detect data dependence violations••extend invalidationextend invalidation--based cache coherencebased cache coherence2) Buffer speculative modifications2) Buffer speculative modifications••use the caches as speculative buffersuse the caches as speculative buffers)coherence already works at a variety of scales)hence our scheme is also scalable11A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonRelated SchemesRelated Schemes••Wisconsin (Wisconsin (MultiscalarMultiscalar, Trace Processor), Trace Processor)••Stanford (Hydra)Stanford (Hydra)••U.P. U.P. CatalunyaCatalunya(Speculative Multithreading)(Speculative Multithreading)••Intel/U. Portland (Dynamic Multithreading)Intel/U. Portland (Dynamic Multithreading)••Illinois at U.C. (IIllinois at U.C. (I--ACOMA)ACOMA))our approach seamlessly scales both up and down12A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonOutlineOutline)Details of our ApproachDetails of our Approach––life cycle of an epochlife cycle of an epoch––speculative coherence speculative coherence ––what happens at commit timewhat happens at commit time––forwarding data between epochsforwarding data between epochs••PerformancePerformance••ConclusionsConclusionsGreg SteffanArchitectural Support for TLDS on a SMT Processor 413A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonLife Cycle of an EpochLife Cycle of an EpochSpawnedBecomesSpeculativeCommit?InitSpeculativeWorkWait to beHomefree?Slow Commit:Fast Commit:Complete,Pass HomefreeTime9999999914A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonEpoch NumbersEpoch NumbersRepresent a partial orderingRepresent a partial ordering––signedsigned--compare sequence numbers if compare sequence numbers if TIDsTIDsmatchmatch••allows for wrapallows for wrap--aroundaround––otherwise the epochs are unorderedotherwise the epochs are unordered••from independent programs from independent programs ••from independent chains of speculation within one from independent chains of speculation within one programprogramThread Identifier (TID) Sequence Number15A Scalable Approach to Thread-Level Speculation SteffanCarnegie MellonSpeculative Thread ModelSpeculative Thread ModelRoundRound--robin schedule of epochs to processorsrobin schedule of epochs to processors––not a requirement of our scheme, just for conveniencenot a requirement of our scheme, just for convenienceEach epoch spawns the next
View Full Document