CMU CS 15740 - Threaded Abstract Machine


TAM
David Koes and David McWherter

Threaded Abstract Machine
• Fine-grained parallel execution model
  – Multithreading at insn level
  – Exploits parallelism within tasks
    - Independent loop bodies, function calls
• Compiler explicitly bundles instructions
  – Based on dependencies and latencies
• Maps to existing uni/multi-proc systems

TAM Motivation
• Instruction delays/latencies suck
  – Multithreading at insn level can help!
    • Yay! Multithreading!
  – Asynchronous transfer of control is costly!
    • Boo! Multithreading!
• TAM lets the compiler control the schedule

Threads, Blocks, and Frames, Oh My!
• Thread
  – ~basic block
  – Statically ordered
  – Non-blocking (except last insn)
  – Synchronizing/unsynchronizing
• Code-block
  – ~function or loop body
  – Collection of threads
  – A program is a bunch of code-blocks

Threads, Blocks, and Frames, Oh My!
• Activation frame
  – Multithreaded equivalent of a stack
  – Allocated when invoking a code-block
  – Threads in a frame share registers
• Exploits memory hierarchy
  – Gobs and gobs of allocated frames!
• Run one ready frame per processor until no threads are ready to go
[Figure: frames created by call/return (loop, funcall). On each processor, all frames holding enabled threads are linked into a ready queue.]

Threads, Blocks, and Frames, Oh My!
• Frames contain
  – Thread synchronization counters
  – Argument/local variable slots
  – Continuation vector
    • List of threads ready to run

TAM Architecturally
• Compare TAM (normalized hardware)
• CM-5’ (standard multiproc)
• J’-Machine (TAM-specialized hardware)
  – Slots: tagged data words, trap on access (data-not-ready)
  – Continuation vector, message buffers: on-chip
  – Async message send/receive
    • Need to interact with fork and synch. insns
• Overall
  – Communication and control flow dominate

Inlets and Active Messages
• Inlet
  – Compiler-generated message handler (thread)
  – Purpose
    • Copy messages from network into local memory
    • Enable dependent computation
  – Run at higher priority than normal threads
  – Provides support for Active Messages

Active Messages Motivation
• Message passing
  – Often fails to overlap:
    • Communication
    • Computation
  – High communication overheads
    • Latency
    • Buffering
• Goal:
  – Make it easy to not suck!

Active Messages
• Interleave communication/compute
  – Sending:
    • Put address of message handler into message
  – Receiving:
    • Interrupt current computation
    • Execute message handler
      – Integrate data into current computation
    • Resume current computation

Active Messages: Pains
• Interrupting computation hurts
  – May require CPU interrupt handler
    • Kernel-to-user-back-to-user computation
  – Message handler must not block too long (or deadlock!)
  – Solutions may include
    • User-level interrupts on processor
    • PC-injection (2 independent PCs on CPU)
    • Polling (e.g., compiler-driven)
    • Dual/co-processors (but need synchronization)
• Needs shared program image across processors
  – J-Machine hardware dispatches to address in messages
  – Another layer of indirection could relax this

Active Message Results
• Order of magnitude improvement
  – Or so, I am told (no graphs)
  – "Startup cost" using Active Messages
    • nCUBE/2: 160 µs reduced to 30 µs
    • CM-5: 86 µs reduced to 23 µs
  – (blocking send/receives)
• Still must start requests early
  – AM just helps interleave with compute

So What?
• Uniproc TAM is faster than LISP!
  – (slower than C)
• Fine-grained parallelism
  – Where is my TAM?
  – Where is my
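
The frame contents on the slides (synchronization counters, argument/local slots, continuation vector) can be modeled with a toy Python class. This is a sketch under stated assumptions. The class name Frame, the method fork, and the dict-based layout are hypothetical and do not represent TAM's actual frame format. A synchronizing thread is modeled as one whose entry count must reach zero before it joins the continuation vector, while an unsynchronizing thread is enabled at once.

```python
from collections import deque
from typing import Deque, Dict, Optional


class Frame:
    """Toy activation frame (hypothetical layout, not TAM's actual encoding).

    Holds argument/local-variable slots, one synchronization counter per
    synchronizing thread, and a continuation vector of enabled threads.
    """

    def __init__(self, entry_counts: Optional[Dict[str, int]] = None) -> None:
        self.slots: Dict[str, object] = {}                    # argument/local-variable slots
        self.sync: Dict[str, int] = dict(entry_counts or {})  # threads absent here are unsynchronizing
        self.cv: Deque[str] = deque()                         # continuation vector: threads ready to run

    def fork(self, thread: str) -> None:
        """Signal `thread`; a synchronizing thread is enabled only when its counter reaches zero."""
        if thread in self.sync:
            self.sync[thread] -= 1
            if self.sync[thread] > 0:
                return                                        # still waiting on other predecessors
        self.cv.append(thread)


# A thread "add" that needs two operands, each produced by a different predecessor.
f = Frame({"add": 2})
f.slots["x"] = 3
f.fork("add")          # first operand arrived: counter 2 -> 1, not enabled yet
f.slots["y"] = 4
f.fork("add")          # second operand arrived: counter 1 -> 0, "add" is enabled
f.fork("log")          # unsynchronizing thread: enabled immediately
print(list(f.cv))      # ['add', 'log']
```

The sketch leaves a counter at zero once its thread is enabled. Reusing the frame for another round of synchronization would require reinitializing that counter first.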

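The scheduling rule on the slides can be sketched in Python. Frames that hold enabled threads are linked into a per-processor ready queue, and a processor keeps running one frame until that frame has no ready threads. Processor, enable, run, and the thread functions t1 to t3 are hypothetical names. Threads are modeled as plain Python callables rather than compiled instruction sequences.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

Thread = Callable[["Processor", "Frame"], None]


class Frame:
    """Minimal frame: a name plus a continuation vector (hypothetical layout)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cv: Deque[Thread] = deque()   # enabled threads of this frame
        self.in_ready = False              # already linked into the ready queue?


class Processor:
    """One processor's scheduler: frames holding enabled threads form a ready queue."""

    def __init__(self) -> None:
        self.ready: Deque[Frame] = deque()
        self.current: Optional[Frame] = None
        self.trace: List[str] = []         # executed threads, for illustration

    def enable(self, frame: Frame, thread: Thread) -> None:
        frame.cv.append(thread)
        # Link the frame into the ready queue unless it is already queued or running.
        if not frame.in_ready and frame is not self.current:
            frame.in_ready = True
            self.ready.append(frame)

    def run(self) -> None:
        while self.ready:
            frame = self.ready.popleft()
            frame.in_ready = False
            self.current = frame
            # Drain this frame's continuation vector before switching frames,
            # so consecutive threads keep working on the same frame.
            while frame.cv:
                frame.cv.popleft()(self, frame)
            self.current = None


def t1(p: Processor, f: Frame) -> None:
    p.trace.append(f"{f.name}.t1")
    p.enable(b, t3)    # thread in another frame: frame b joins the ready queue
    p.enable(f, t2)    # thread in the running frame: runs in this same quantum


def t2(p: Processor, f: Frame) -> None:
    p.trace.append(f"{f.name}.t2")


def t3(p: Processor, f: Frame) -> None:
    p.trace.append(f"{f.name}.t3")


proc = Processor()
a, b = Frame("a"), Frame("b")
proc.enable(a, t1)
proc.run()
print(proc.trace)      # ['a.t1', 'a.t2', 'b.t3']
```

Thread a.t2 runs before b.t3 even though b was queued first. The processor finishes everything enabled in the current frame before it moves on, which is the "one ready frame until no threads ready" policy.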

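A tagged slot that traps on a data-not-ready access can be emulated in software to show what the J’-Machine hardware support replaces. TaggedSlot, read, and write are hypothetical names. The trap is assumed to defer the reader's continuation until the value arrives, in the style of a presence-bit (I-structure-like) cell. The hardware mechanism itself is not specified on the slides.

```python
from typing import Callable, List, Optional

Reader = Callable[[object], None]


class TaggedSlot:
    """Software stand-in for a tagged frame slot (hypothetical; hardware would trap instead).

    Reading an empty slot does not return garbage: the reader's continuation is
    deferred and resumed once the value is written.
    """

    def __init__(self) -> None:
        self.full = False                  # the tag: has the data arrived?
        self.value: Optional[object] = None
        self.deferred: List[Reader] = []   # readers that hit "data-not-ready"

    def read(self, k: Reader) -> None:
        if self.full:
            k(self.value)
        else:
            self.deferred.append(k)        # the "trap": defer instead of blocking the processor

    def write(self, v: object) -> None:
        self.full, self.value = True, v
        pending, self.deferred = self.deferred, []
        for k in pending:                  # resume every reader that arrived too early
            k(v)


got: List[object] = []
s = TaggedSlot()
s.read(got.append)     # value not there yet: reader is deferred
s.write(42)            # arrival resumes the deferred reader
s.read(got.append)     # slot is full now: reader runs immediately
print(got)             # [42, 42]
```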

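Inlets and Active Messages fit together roughly as follows. The sender puts the handler's address in the message. The receiver, here by polling (one of the options listed above), jumps to that handler. The handler copies the payload into a frame slot and enables the dependent thread. Node, send, poll, and store_inlet are hypothetical names. A Python function reference stands in for a code address, which assumes a shared program image across nodes.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple

# An active message names its own handler. A Python callable stands in for the
# handler's code address, which assumes every node runs the same program image.
Handler = Callable[..., None]
Message = Tuple[Handler, Tuple[Any, ...]]


class Node:
    def __init__(self) -> None:
        self.inbox: Deque[Message] = deque()   # hypothetical network-interface queue
        self.slots: Dict[str, Any] = {}        # frame slots filled in by inlets
        self.enabled: Deque[str] = deque()     # threads enabled by inlets

    def send(self, dest: "Node", handler: Handler, *args: Any) -> None:
        dest.inbox.append((handler, args))     # no matching receive, no staging buffer

    def poll(self) -> int:
        """Dispatch every pending message to its handler; returns the count handled."""
        handled = 0
        while self.inbox:
            handler, args = self.inbox.popleft()
            handler(self, *args)               # jump straight to the handler named in the message
            handled += 1
        return handled


def store_inlet(node: Node, slot: str, value: Any, thread: str) -> None:
    """Inlet: copy the payload into a frame slot, then enable the dependent thread."""
    node.slots[slot] = value
    node.enabled.append(thread)


a, b = Node(), Node()
a.send(b, store_inlet, "x", 7, "use_x")
print(b.poll(), b.slots, list(b.enabled))   # 1 {'x': 7} ['use_x']
```

A TAM-style scheduler could call poll() between threads, so that inlets run ahead of ordinary threads. This mirrors their higher priority.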

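The quoted startup costs can be expressed as ratios. They give speedups of roughly 5x on the nCUBE/2 and just under 4x on the CM-5. These gains are substantial but fall short of a full order of magnitude, which matches the slide's own hedge.

```latex
\text{nCUBE/2: } \frac{160\,\mu\mathrm{s}}{30\,\mu\mathrm{s}} \approx 5.3\times,
\qquad
\text{CM-5: } \frac{86\,\mu\mathrm{s}}{23\,\mu\mathrm{s}} \approx 3.7\times
```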