CMU CS 15740 - Lecture

Carnegie Mellon, 15740 Fall 03

Slide 1: Simultaneous Multithreading
Pratyusa Manadhata (pratyus@cs)
Vyas Sekar (vyass@cs)

Slide 2: References
- Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm, and Dean Tullsen. "Simultaneous Multithreading: A Platform for Next-Generation Processors," IEEE Micro, September/October 1997, pages 12-18.
- Jack Lo, Susan Eggers, Joel Emer, Henry Levy, Rebecca Stamm, and Dean Tullsen. "Converting Thread-Level Parallelism into Instruction-Level Parallelism via Simultaneous Multithreading," ACM Transactions on Computer Systems, August 1997, pages 322-354.
- Dean Tullsen, Susan Eggers, Joel Emer, Henry Levy, Jack Lo, and Rebecca Stamm. "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996, pages 191-202.

Slide 3: Motivation
- Improving the memory subsystem or increasing system integration is not sufficient for significant performance improvement.
- So increase parallelism in all its available forms:
  - Instruction-Level Parallelism (ILP)
  - Thread-Level Parallelism (TLP)

Slide 4: Architectural Alternatives
- Superscalar
- Multithreaded superscalar
- Multiprocessors
- Neither superscalars nor SMPs capture ILP and TLP in their entirety: both are incapable of adapting to dynamically varying levels of ILP and TLP.

Slide 5: Simultaneous Multithreading
- TLP from either multithreaded parallel programs or a multiprogramming workload
- ILP from each thread
- Characteristics of SMT processors:
  - from superscalar: issue multiple instructions per cycle
  - from multithreaded: hardware state for multiple threads

Slide 6: Issue-Slot Utilization
[Figure: issue slots per processor cycle for superscalar, multithreaded, and SMT processors; each slot is either unutilized or filled by one of threads 1-5.]

Slide 7: Comparison
- Superscalar: issues multiple instructions from the same process; suffers both horizontal and vertical waste.
- Multithreaded: minimizes vertical waste by tolerating long-latency operations.
- SMT: selects instructions from any "ready" thread.

Slide 8: SMT Model
- Minimal extension of a superscalar processor
- Changes in the instruction-fetch stage and register files only
- No static partitioning of resources
- Most of the hardware is still available to a single thread

Slide 9: SMT Model (continued)
- Per-thread state for each hardware context (PC, registers)
- Per-thread instruction retirement, trapping, and subroutine return
- Per-thread IDs in the BTB and TLB
- Instruction-cache port
- Large register file:
  - number of physical registers = 8 contexts x 32 registers + registers for renaming
  - longer access time

Slide 10: Pipeline
[Figure: pipeline stages of the superscalar baseline versus the SMT pipeline.]

Slide 11: Fetch Mechanism (2.8 scheme)
- Select 2 threads not incurring an I-cache miss and read 8 instructions from each.
- Choose as many as possible from the first thread and the rest from the second, up to 8.
- Alternatives: 1.8, 2.4, 4.2

Slide 12: ICOUNT
- Which thread to fetch from? The threads with the fewest instructions in the decode, rename, and queue pipeline stages.
- Gives an even distribution and prevents starvation.

Slide 13: Results/Observations
- Superscalars achieve an IPC of only about 1-2.
- SMT: significantly higher than the values reported for superscalars.
- Longer latency for a single thread? Not a significant performance effect.

Slide 14: Results/Observations (continued)
- SMT absorbs additional conflicts: greater ability to hide latency by issuing from multiple threads.
- The MP2 and MP4 multiprocessor configurations are hindered by static resource partitioning.
- SMT dynamically partitions resources among threads.

Slide 15: Results/Observations (continued)
- Multithreading can increase cache misses and conflicts.
- More memory required; more stress on branch-prediction hardware.
- The impact on program performance is not significant: SMT plus hardware and compiler optimizations can hide the latency.

Slide 16: Future Directions
- Each processor in an SMP can itself use SMT.
- Next-generation architectures: SMP on chip instead of wider superscalars.
- Is the performance gain adequate for the additional resource cost? Processor design time and cycle time: cost vs. performance.
- Writing optimizing compilers to take advantage of SMT.
- OS support for thread scheduling, thread priority, etc.

Slide 17: Q & A

Slide 18: Thank you
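The 2.8 fetch scheme described above can be sketched in a few lines — a minimal Python model, where the thread records and their `icache_miss` / `instructions` fields are illustrative assumptions, not structures from the papers:

```python
def fetch_2_8(threads, width=8):
    """Sketch of the 2.8 fetch scheme: pick 2 threads that are not
    stalled on an I-cache miss, read up to 8 instructions from each,
    then take as many as possible from the first thread and the rest
    from the second, up to the fetch width of 8."""
    # Candidate threads: those not currently incurring an I-cache miss.
    ready = [t for t in threads if not t["icache_miss"]][:2]
    fetched = []
    for t in ready:
        # Read up to `width` instructions from this thread's stream...
        take = t["instructions"][:width]
        # ...but keep only as many as still fit in the fetch bundle.
        room = width - len(fetched)
        fetched.extend(take[:room])
        if len(fetched) == width:
            break
    return fetched

# Example: thread A has 5 instructions ready and thread B has 6;
# the bundle takes all 5 from A and fills the remaining 3 from B.
a = {"icache_miss": False, "instructions": ["a%d" % i for i in range(5)]}
b = {"icache_miss": False, "instructions": ["b%d" % i for i in range(6)]}
c = {"icache_miss": True,  "instructions": ["c0", "c1"]}
bundle = fetch_2_8([a, b, c])
```

The 1.8, 2.4, and 4.2 alternatives vary only the two parameters: how many threads are selected and how many instructions are read from each.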

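The ICOUNT heuristic can be sketched the same way — fetch from the thread with the fewest instructions currently in the decode, rename, and queue stages. The per-thread counts are assumed inputs here:

```python
def icount_pick(inflight_counts):
    """ICOUNT fetch policy sketch: given, per thread, the number of
    instructions sitting in the decode, rename, and queue stages,
    fetch next from the thread with the fewest in flight.  Threads
    making fast progress drain the front end and get refilled first,
    while clogged threads are throttled -- which evens out the
    distribution and prevents starvation."""
    return min(inflight_counts, key=inflight_counts.get)

# Thread "t2" has the least front-end pressure, so it is fetched next.
counts = {"t0": 12, "t1": 7, "t2": 3, "t3": 9}
next_thread = icount_pick(counts)
```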

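The horizontal/vertical waste distinction from the comparison slide can also be made concrete with a small accounting sketch over a grid of issue slots. The grid is an invented example (rows are cycles, `None` marks an empty slot):

```python
def classify_waste(grid):
    """Count wasted issue slots in a cycles-by-slots grid.
    Vertical waste: an entire cycle issues nothing (e.g. a
    long-latency stall).  Horizontal waste: a cycle issues some
    instructions but leaves slots empty (not enough ILP in the
    chosen thread)."""
    vertical = horizontal = 0
    for cycle in grid:
        empty = sum(1 for slot in cycle if slot is None)
        if empty == len(cycle):
            vertical += empty      # whole cycle wasted
        else:
            horizontal += empty    # partially filled cycle
    return vertical, horizontal

# A 4-wide machine over 3 cycles, running a single thread t1:
grid = [
    ["t1", "t1", None, None],   # 2 slots of horizontal waste
    [None, None, None, None],   # 4 slots of vertical waste
    ["t1", "t1", "t1", "t1"],   # fully utilized
]
v, h = classify_waste(grid)
```

In these terms, multithreading attacks only the vertical waste (another thread issues during the stall cycle), while SMT attacks both, since a cycle's leftover slots can be filled from any other ready thread.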