Introduction to Multithreading, Superthreading and Hyperthreading
by Jon "Hannibal" Stokes

Contents:
- Superthreading with a multithreaded processor
- Implementing hyper-threading
- Replicated resources
- Partitioned resources
- Shared resources
- Caching and SMT
- Cache conflicts
- Conclusions
- Bibliography
- Revision History

Back in the dual-Celeron days, when symmetric multiprocessing (SMP) first became cheap enough to come within reach of the average PC user, many hardware enthusiasts eager to get in on the SMP craze were asking what exactly (besides winning them the admiration and envy of their peers) a dual-processing rig could do for them. It was in this context that the PC crowd started seriously talking about the advantages of multithreading. Years later, when Apple brought dual-processing to its PowerMac line, SMP was officially mainstream, and with it multithreading became a concern for the mainstream user, as the ensuing round of benchmarks brought out the fact that you really needed multithreaded applications to get the full benefits of two processors.

Even though the PC enthusiast SMP craze has long since died down and, in an odd twist of fate, Mac users are now many times more likely to be sporting an SMP rig than their x86-using peers, multithreading is once again about to increase in importance for PC users. Intel's next major IA-32 processor release, codenamed Prescott, will include a feature called simultaneous multithreading (SMT), also known as hyper-threading. To take full advantage of SMT, applications will need to be multithreaded; and just like with SMP, the higher the degree of multithreading, the more performance an application can wring out of Prescott's hardware.

Intel actually already uses SMT in a shipping design: the Pentium 4 Xeon. Near the end of this article we'll take a look at the way the Xeon implements hyper-threading; this analysis should give us a pretty good idea of what's in store for Prescott. Also, it's rumored that the current crop of Pentium 4s actually has SMT hardware built in; it's just disabled. (If you add this to the rumor about x86-64 support being present but disabled as well, you can get some idea of just how cautious Intel is when it comes to introducing new features. I'd kill to get my hands on a 2.8 GHz P4 with both SMT and x86-64 support turned on.)

SMT, in a nutshell, allows the CPU to do what most users think it's doing anyway: run more than one program at the same time. This might sound odd, so in order to understand how it works, this article will first look at how the current crop of CPUs handles multitasking. Then we'll discuss a technique called superthreading before finally moving on to explain hyper-threading in the last section. So if you're looking to understand more about multithreading, symmetric multiprocessing systems, and hyper-threading, then this article is for you.

As always, if you've read some of my previous tech articles, you'll be well equipped to understand the discussion that follows. From here on out, I'll assume you know the basics of pipelined execution and are familiar with the general architectural division between a processor's front end and its execution core. If these terms are mysterious to you, then you might want to reach way back and check out my "Into the K7" article, as well as some of my other work on the P4 and G4e.

Conventional multithreading

Quite a bit of what a CPU does is illusion. For instance, modern out-of-order processor architectures don't actually execute code sequentially in the order in which it was written. I've covered the topic of out-of-order execution (OOE) in previous articles, so I won't rehash all that here. I'll just note that an OOE architecture takes code that was written and compiled to be executed in a specific order, reschedules the sequence of instructions (if possible) so that they make maximum use of the processor's resources, executes them, and then arranges them back in their original order so that the results can be written out to memory. To the programmer and the user, it looks as if an ordered, sequential stream of instructions went into the CPU and an identically ordered, sequential stream of computational results emerged. Only the CPU knows in what order the program's instructions were actually executed, and in that respect the processor is like a black box to both the programmer and the user.

The same kind of sleight-of-hand happens when you run multiple programs at once, except this time the operating system is also involved in the scam. To the end user, it appears as if the processor is "running" more than one program at the same time, and indeed, there actually are multiple programs loaded into memory. But the CPU can execute only one of these programs at a time. The OS maintains the illusion of concurrency by rapidly switching between running programs at a fixed interval, called a time slice. The time slice has to be small enough that the user doesn't notice any degradation in the usability and performance of the running programs, and it has to be large enough that each program has a sufficient amount of CPU time in which to get useful work done. Most modern operating systems include a way to change the size of an individual program's time slice. So a program with a larger time slice gets more actual execution time on the CPU relative to its lower-priority peers, and hence it runs faster. (On a related note, this brings to mind one of my favorite .sig file quotes: "A message from the system administrator: 'I've upped my priority. Now up yours.'")
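To put a concrete face on that, here's a minimal sketch (an illustration for this writeup, not code from the original article) of how a process on a POSIX system can change its own scheduling priority with setpriority(). Strictly speaking this adjusts the process's priority rather than the length of its time slice, but the practical effect is the one described above: a higher-priority process gets more CPU time relative to its peers.

    /* priority_demo.c: a minimal sketch of adjusting a process's
     * scheduling priority on a POSIX system. Illustration only; the
     * article doesn't prescribe any particular API. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        /* Read this process's current "nice" value: -20 (highest
         * priority) through 19 (lowest). */
        int before = getpriority(PRIO_PROCESS, 0);

        /* Lower our own priority; the scheduler will now tend to
         * give CPU time to more important processes first. */
        if (setpriority(PRIO_PROCESS, 0, 10) != 0) {
            perror("setpriority");
            return 1;
        }

        int after = getpriority(PRIO_PROCESS, 0);
        printf("nice value: %d -> %d\n", before, after);
        return 0;
    }

Note that an unprivileged process can lower its own priority (raise its nice value) this way, but on most Unix systems it takes superuser rights to raise priority, which is what makes the .sig quote above a joke about system administrators.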
Clarification of terms: "running" vs. "executing," and "front end" vs. "execution core"

For our purposes in this article, "running" does not equal "executing." I want to set up this terminological distinction near the outset of the article for clarity's sake. For the remainder of this article, we'll say that a program has been launched and is "running" when its code (or some portion of its code) is loaded into main memory, but it isn't actually executing until that code has been loaded into the processor. Another way to think of this would be to say that the OS runs programs, and the processor executes them.

The other thing that I should clarify before proceeding is that the way that I divide up the processor in this and other articles differs from the way that Intel's literature divides it. Intel will describe its processors as having an "in-order front
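One of the article's recurring points is that an application has to be multithreaded before SMP (and, by extension, SMT) can do anything for it. As a minimal sketch of what that means in practice (again an illustration for this writeup, assuming POSIX threads, not code from the original article), here's a program that splits a summation across two threads so the OS has two independent instruction streams to schedule:

    /* threads_demo.c: a minimal POSIX threads sketch.
     * Compile with: cc threads_demo.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000L

    struct range { long start, end; long long sum; };

    /* Each thread sums its own half of the range independently. */
    static void *partial_sum(void *arg) {
        struct range *r = arg;
        r->sum = 0;
        for (long i = r->start; i < r->end; i++)
            r->sum += i;
        return NULL;
    }

    int main(void) {
        struct range lo = { 0, N / 2, 0 };
        struct range hi = { N / 2, N, 0 };
        pthread_t t1, t2;

        /* Two runnable threads give an SMP (or SMT) machine two
         * independent instruction streams to execute at once. */
        pthread_create(&t1, NULL, partial_sum, &lo);
        pthread_create(&t2, NULL, partial_sum, &hi);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("sum = %lld\n", lo.sum + hi.sum);
        return 0;
    }

With a single thread, a second processor (or second logical processor) would simply sit idle; splitting the work is what lets the OS schedule useful work on both.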

