Thread-Level Speculation
James Bruce, Craig Soules

Motivation
• Traditional processors
  – Aggressively out of order with respect to ALU ops
    • (Relatively) easy to find dependencies
  – Much less aggressive with respect to memory ops
    • RAW: a “true” dependency
    • Hard to know in advance whether a load and an earlier store overlap
    • Most dependencies are dynamic: they depend on addresses known only at run time
    • The C language (up to C99, at least) doesn’t help

Thread-Level Speculation
• It can pay off to be optimistic
  – Works because loads and stores usually don’t overlap
  – Can be aggressively parallel if:
    • We can detect when a RAW dependency is actually violated
    • We have a way of fixing up the problem to regain sequential behavior
• Thread-Level Speculation (TLS)
  – Aggressively parallelize potentially dependent code
  – Multiple threads created out of a sequential program
  – Granularity between out-of-order execution (OOP) and traditional threads

[Figure: TLS example]

Hydra Design
• Hydra
  – From “Data Speculation Support for a Chip Multiprocessor”
  – A 4-way chip multiprocessor (CMP)
  – Each core is a simple pipelined MIPS processor
  – Some extra logic and storage to support TLS
  – Complicated parts done in software through vectored exceptions

[Figure: Hydra design]

Hydra: Hardware Design
• Each core processor is essentially unchanged
  – But there are 4 on one die
  – Can run separate processes, or cooperate on TLS
  – Inter-processor messages via special memory locations
• L1 cache is write-through, with extra state bits
• Read and write buses with snooping
• Each thread/processor gets a write buffer near the L2
• L2 is only updated by the non-speculative thread (speculative writes wait in the write buffers)

Hydra: L1 Cache Tags
• Modified
  – Indicates this is a local copy (so no dependencies)
• Pre-Invalidate
  – Line was written by a more speculative thread (cleared once the current thread finishes executing)
• Read bits (per byte)
  – This processor’s thread read the byte, and thus depends on this cache line
• Write bits (per byte)
  – This thread wrote the byte, so the value is a local copy (avoids false dependencies)

Hydra: System Design
• Subroutine speculation
  – Speculate on the code after a subroutine call
  – Guess the return value (96% accurate)
  – Track history on whether each function’s return value is predictable
  – Added via code translation to medium-size subroutines
• Loop speculation
  – Each processor gets an iteration
• One thread is always running non-speculatively

Hydra: System Design (cont.)
• Hardware does not support passing registers between threads
• Register state is stored in register pass buffers (RPBs)
• On a hazard, update the value, clear the write buffer, and restart from the RPB
• When a speculative thread executes cleanly, it waits to become the head processor before committing its values
• When the non-speculative thread finishes, it turns control over to the next-least speculative thread

[Figure: Hydra subroutine speculation]

[Figure: Hydra loop speculation]

Hydra: Results
• Not as impressive as one might hope

[Figure: Hydra results]

Scalable TLS
• Want to scale up to big multiprocessor (MP) machines
• Want to scale down to single chips
• Can’t rely on hardware design too much
• Use a speculative cache coherence scheme

Cache Coherence
• New states:
  – Speculative-exclusive (SpE)
  – Speculative-shared (SpS)
• New messages:
  – Read-Exclusive-Speculative
  – Upgrade-Speculative
  – Invalidate-Speculative
• Speculation doesn’t affect other caches

[Figure: Processor events]

[Figure: External events]

When Speculation Fails…
• Speculatively modified lines are invalidated
• Speculatively loaded lines are transitioned:
  – Speculative-Exclusive → Exclusive
  – Speculative-Shared → Shared

Providing Multiprogramming
• A chip has multiple “speculative contexts”, each with:
  – Epoch number
  – Violation flag
  – Speculatively loaded/modified cache line flags

Multiple Writers
• Could support multiple writers
  – Different epochs each writing to the same line
• Fine-grained “speculatively modified” bits
  – One bit for each word, or byte
  – Merge using the modified word of the latest epoch

[Figure: Latency scaling]

[Figure: Processor scaling]

Processor
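The per-byte read and write bits on the Hydra L1 tag slide let a speculative thread detect a RAW violation when it snoops a store from a less-speculative thread. The following is a minimal Python sketch of that check. The class `SpecLine`, its method names, and the 32-byte line size are hypothetical illustrations, not Hydra's actual hardware interface.

```python
LINE_SIZE = 32  # hypothetical line size in bytes; not given on the slides


class SpecLine:
    """Hypothetical model of one speculative L1 line's per-byte tag bits."""

    def __init__(self, size=LINE_SIZE):
        self.read_bits = [False] * size
        self.write_bits = [False] * size

    def local_load(self, offset):
        # A load creates a dependency only if this thread has not already
        # written the byte; otherwise it is reading its own local copy.
        if not self.write_bits[offset]:
            self.read_bits[offset] = True

    def local_store(self, offset):
        # Marks the byte as a local copy, so later snooped writes to it
        # from earlier threads do not count as violations.
        self.write_bits[offset] = True

    def snooped_store_violates(self, offset, size=1):
        """True if a store by a less-speculative thread to bytes
        [offset, offset + size) hits a byte this thread already read."""
        return any(self.read_bits[offset:offset + size])


line = SpecLine()
line.local_load(0)
print(line.snooped_store_violates(0))  # True: RAW violation, must restart
print(line.snooped_store_violates(1))  # False: same line, different byte
```

Because the bits are tracked per byte rather than per line, the snooped store to byte 1 is not flagged even though it shares a line with the byte that was read. This is the false-dependency avoidance the slide mentions.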
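The subroutine-speculation bullets say that Hydra guesses return values and tracks whether each function is predictable, but the slides do not give the mechanism. The sketch below substitutes a generic last-value predictor with a saturating confidence counter per function. The class name, the threshold, and the counter width are all assumptions, and this is not presented as Hydra's method.

```python
class ReturnValuePredictor:
    """Hypothetical last-value return predictor with per-function confidence.

    Not Hydra's documented mechanism: a generic stand-in that speculates
    past a call only once a function has returned the same value repeatedly.
    """

    def __init__(self, threshold=2, max_count=3):
        self.last = {}   # function name -> last observed return value
        self.conf = {}   # function name -> saturating confidence counter
        self.threshold = threshold
        self.max_count = max_count

    def should_speculate(self, fn):
        return self.conf.get(fn, 0) >= self.threshold

    def predict(self, fn):
        return self.last.get(fn)

    def update(self, fn, actual):
        count = self.conf.get(fn, 0)
        if fn in self.last and self.last[fn] == actual:
            count = min(count + 1, self.max_count)  # guess would have been right
        else:
            count = max(count - 1, 0)  # wrong guess or first call: lose confidence
        self.conf[fn] = count
        self.last[fn] = actual


p = ReturnValuePredictor()
for _ in range(3):
    p.update("lookup", 0)
print(p.should_speculate("lookup"), p.predict("lookup"))  # True 0
p.update("lookup", 7)
print(p.should_speculate("lookup"))  # False
```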
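The second Hydra system-design slide describes in-order commit: a thread that finishes cleanly waits until it becomes the head (non-speculative) thread, and a finishing head hands off to the next-least speculative thread. The function below sketches only that ordering rule. It assumes that threads are numbered 0, 1, 2, and so on in sequential program order, that thread 0 starts as head, and that no violations occur. `commit_schedule` is a hypothetical name.

```python
def commit_schedule(finish_order):
    """For each thread-finish event, list the threads that commit right then.

    Assumes threads are numbered in sequential program order, thread 0 is
    the initial head (non-speculative) thread, and no violations occur.
    """
    finished = set()
    head = 0
    schedule = []
    for thread in finish_order:
        finished.add(thread)
        batch = []
        # Only the head may commit; after it does, the next-least
        # speculative thread becomes head and may commit if already done.
        while head in finished:
            batch.append(head)
            head += 1
        schedule.append(batch)
    return schedule


print(commit_schedule([2, 0, 3, 1]))  # [[], [0], [], [1, 2, 3]]
```

Threads 2 and 3 finish early but sit idle until thread 1 finishes. At that point all three commit in order, matching the “waits to become the head processor” rule.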
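The “When Speculation Fails…” slide gives the cache-line transitions on a violation. The sketch below encodes just those rules over a toy cache that maps each address to its state and a speculatively-modified flag. The `State` enum and the `on_speculation_failure` function are hypothetical. As an added assumption, states other than SpE and SpS are left unchanged.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"
    SP_SHARED = "SpS"
    SP_EXCLUSIVE = "SpE"


# Speculatively loaded (but not modified) lines fall back to their
# non-speculative counterparts.
DEMOTE = {State.SP_EXCLUSIVE: State.EXCLUSIVE, State.SP_SHARED: State.SHARED}


def on_speculation_failure(cache):
    """cache maps address -> (State, speculatively_modified: bool).
    Returns the new address -> State mapping after the epoch is squashed."""
    after = {}
    for addr, (state, spec_modified) in cache.items():
        if spec_modified:
            after[addr] = State.INVALID  # discard speculative data
        else:
            after[addr] = DEMOTE.get(state, state)
    return after


cache = {
    0x100: (State.SP_EXCLUSIVE, True),   # speculatively modified
    0x140: (State.SP_EXCLUSIVE, False),  # speculatively loaded
    0x180: (State.SP_SHARED, False),     # speculatively loaded
    0x1C0: (State.SHARED, False),        # never touched speculatively
}
for addr, state in on_speculation_failure(cache).items():
    print(hex(addr), state.value)
```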
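The “Multiple Writers” slide merges a line written by several epochs word by word, taking each modified word from the latest epoch. The sketch below makes several assumptions. A line is a list of words. Each epoch supplies its epoch number, a per-word speculatively-modified bit vector, and its copy of the words. A larger epoch number means later in program order. `merge_line` is a hypothetical name.

```python
def merge_line(base, versions):
    """Merge per-epoch copies of one cache line, word by word.

    base: committed word values for the line.
    versions: iterable of (epoch, modified_bits, words) tuples, in any order.
    Each word comes from the latest epoch that modified it, else from base.
    """
    merged = list(base)
    newest = [None] * len(base)  # epoch that supplied each word so far
    for epoch, modified, words in versions:
        for i, (was_modified, word) in enumerate(zip(modified, words)):
            if was_modified and (newest[i] is None or epoch > newest[i]):
                merged[i] = word
                newest[i] = epoch
    return merged


versions = [
    (5, [True, False, True, False], [10, 11, 12, 13]),
    (3, [True, True, False, False], [20, 21, 22, 23]),
]
print(merge_line([0, 0, 0, 0], versions))  # [10, 21, 12, 0]
```

Word 0 was written by both epochs, so the later epoch 5 wins. Words 1 and 2 each have a single writer, and word 3 keeps its committed value.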