EE392C: Advanced Topics in Computer Architecture, Lecture #4
Polymorphic Processors
Stanford University, Thursday, April 17 2003

Speculative Multithreading
Lecture #4: Thursday, April 10 2003
Lecturer: Amin Firoozshahian and Brad Schumitsch
Scribe: Suzanne Rivoire and Ernesto Staroswiecki

The first paper for this lecture [1] categorizes various options for supporting speculative multithreading in hardware, while the second [2] describes a way to implement it in software without specific architectural support. The class discussion covered the requirements and tradeoffs of software, hardware, and hybrid support for speculative multithreading.

1 Paper 1: Speculative Multithreading Taxonomy

1.1 Summary

Thread-level speculation (TLS) is a way to extract parallelism from applications where it is not easy to guarantee that threads are independent. To implement TLS, an architecture must buffer state, since it needs a way to back up when a violation occurs. This paper introduces a taxonomy used to classify the approaches to buffering speculative memory state. The authors classify architectures based upon where they store their speculative memory as well as the number of speculative tasks allowed per processor.

In Architectural Main Memory (AMM), committed memory is stored in the main memory system. The paper describes two flavors of AMM, lazy and eager. In eager merging, the speculative state is merged into main memory as soon as a thread commits. The next speculative thread is not allowed to become non-speculative (the head) until this data is merged. In some cases, this merging is on the critical path of the program. Lazy merging is free of this problem, since a thread is allowed to pass the head token before its speculative data has been transferred to main memory.

In Future Main Memory (FMM), main memory contains the most recent version of each piece of data. If there are speculative versions of a section of memory, the committed state is kept in a buffer.
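
The difference between the two buffering styles can be illustrated with a minimal Python sketch. It is not taken from the paper: the class names `AMMTask` and `FMMTask`, the dictionary standing in for main memory, and the restriction to a single speculative task are all assumptions made for illustration, and the timing of eager versus lazy merging is not modeled.

```python
_MISSING = object()  # marks "address had no committed value" (illustrative)


class AMMTask:
    """AMM-style task (hypothetical): main memory holds only committed state."""

    def __init__(self, memory):
        self.memory = memory   # committed state, shared dict
        self.buffer = {}       # speculative versions private to this task

    def write(self, addr, value):
        self.buffer[addr] = value          # main memory is left untouched

    def read(self, addr):
        # Prefer the task's own speculative version, else the committed one.
        return self.buffer.get(addr, self.memory.get(addr))

    def commit(self):
        self.memory.update(self.buffer)    # merge speculative state
        self.buffer.clear()

    def squash(self):
        self.buffer.clear()                # nothing to restore in memory


class FMMTask:
    """FMM-style task (hypothetical): main memory holds the newest version."""

    def __init__(self, memory):
        self.memory = memory
        self.undo = {}         # addr -> committed value before first write

    def write(self, addr, value):
        if addr not in self.undo:          # save committed state once
            self.undo[addr] = self.memory.get(addr, _MISSING)
        self.memory[addr] = value          # speculative value goes to memory

    def read(self, addr):
        return self.memory.get(addr)

    def commit(self):
        self.undo.clear()                  # data is already in memory

    def squash(self):
        for addr, old in self.undo.items():   # restore committed state
            if old is _MISSING:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.undo.clear()
```

In this sketch the AMM task does its work at commit time (the merge) and the FMM task does its work at squash time (the restore).
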
In general, FMM performs commits faster than AMM since the formerly speculative data is already in the memory hierarchy. However, violations are more costly in FMM since it must restore the state of the memory hierarchy.

The authors also classify speculative architectures based on how many non-committed speculative tasks can be assigned to each processor. In a Single Speculative Task (SingleT) architecture, a processor cannot be assigned another speculative task until its current task is committed. In SingleT architectures, load balancing is important since a short speculative thread will cause a processor to stall until a long head thread has completed.

In contrast, Multiple Speculative Tasks (MultiT) architectures can assign another task to their processors as soon as these processors complete their current task. There are two flavors of MultiT. In Single Version MultiT, the processor’s local memory only allows one speculative version of each piece of memory. Thus, if a speculative task needs to create its own version of a piece of data that a previous not-yet-committed speculative task already made speculative, the processor must stall. Multiple Version MultiT has no such restriction.

1.2 Conclusions

Lazy merging outperforms eager merging by 30% in a multi-chip multiprocessor and by 9% in a CMP. Multiple Version MultiT outperforms SingleT by 32% in a multi-chip system and by 23% in a CMP. These improvements are fairly orthogonal, so there is significant benefit to employing both. Finally, Lazy AMM is competitive with FMM even though it has lower hardware complexity.

1.3 Critique

This paper does put forth a taxonomy for classifying speculative architectures. However, it does not put forth any new ideas; it merely evaluates old ones.

1.4 Future Work

Speculation is a promising way to extract more parallelism from applications.
Determining which type of speculation support is optimal for which applications is an interesting open question.

2 Paper 2: Raw

The literature contains many proposals for supporting speculative multithreading in hardware for multiprocessor systems. These hardware mechanisms are mainly used for detecting dependency violations and doing rollbacks. This paper talks about supporting speculation in a CMP environment without specific hardware support. The paper describes a scheme called Software Un-Do System (SUDS) and explains how dependency checks and rollbacks can be handled by software.

In the proposed scheme, the compiler attempts to make speculative threads out of loop iterations. It first categorizes each memory access in an iteration as one of three different types:

• Private references: Variables specific to that iteration
• Loop-carried references: Values that should be propagated to later iterations
• Ambiguous references: Locations which cannot be analyzed at compile time

The compiler tries to privatize all the variables that are specific to an iteration of the loop. On the other hand, for loop-carried values, the compiler inserts direct tile-to-tile communication instructions to propagate values from one loop iteration to the next. Ambiguous references, in turn, are replaced by instructions that communicate with memory nodes.

Dependency analysis is done by a set of tiles assigned to work as “memory nodes” in the RAW CMP. These nodes are also responsible for checking addresses to see if the current location being written has been previously read by a later iteration. They also do rollbacks by keeping a log of the previous versions of variables.

The compiler analysis explained in the paper is useful not only in this scheme, but also in architectures that have specific hardware to support speculation.
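
The memory-node behavior described above (checking whether a written location was already read by a later iteration, and logging previous values for rollback) might look roughly like the following Python sketch. This is an illustration under stated assumptions, not the SUDS implementation: the class `MemoryNode`, its method names, and the use of integer iteration numbers are all hypothetical.

```python
_MISSING = object()  # marks "address did not exist before the write"


class MemoryNode:
    """Hypothetical software memory node for ambiguous references."""

    def __init__(self, memory=None):
        self.memory = dict(memory or {})
        self.last_reader = {}  # addr -> latest iteration that has read it
        self.log = []          # (addr, previous value), oldest entry first

    def read(self, iteration, addr):
        # Remember the latest iteration that has observed this address.
        prev = self.last_reader.get(addr, -1)
        self.last_reader[addr] = max(prev, iteration)
        return self.memory.get(addr)

    def write(self, iteration, addr, value):
        """Return False on a violation, True if the write was applied."""
        if self.last_reader.get(addr, -1) > iteration:
            return False       # a later iteration already read a stale value
        self.log.append((addr, self.memory.get(addr, _MISSING)))
        self.memory[addr] = value
        return True

    def rollback(self):
        # Undo logged writes, newest first, to recover the old versions.
        while self.log:
            addr, old = self.log.pop()
            if old is _MISSING:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.last_reader.clear()

    def commit(self):
        self.log.clear()
        self.last_reader.clear()
```

A caller in this sketch would invoke `rollback()` whenever `write()` returns `False` and then re-execute the affected iterations. For simplicity the sketch rolls back every logged write rather than only those of the violating iterations.
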
The software optimizations on top of hardware support will reduce the number of speculative accesses to memory and force values to be propagated for known loop-carried dependencies, which will decrease the number of potential squashes.

On the other hand, although the scheme seems not to have any specific hardware support for speculation, it relies on RAW’s low-latency communication network, which allows the compiler to insert communication instructions at the register level between two tiles. Another weakness is the evaluation of SUDS; only one application is considered as a case study, and no real performance numbers are mentioned in the paper.

3 Class Discussion

3.1 Dividing responsibilities for speculation

The first question posed in the discussion was where to draw the line between hardware and software support for speculation;

