TAM
David² (Koes, McWherter)

Threaded Abstract Machine
  - Fine-grained parallel execution model
      - Multithreading at insn level
      - Exploits parallelism within tasks
          - Independent loop bodies, function calls
  - Compiler explicitly bundles instructions
      - Based on dependencies and latencies
  - Maps to existing uni/multi-proc systems

TAM Motivation
  - Instruction delays/latencies suck
      - Multithreading at insn level can help
      - Yay multithreading!
  - Asynchronous transfer of control is costly
      - Boo multithreading!
  - TAM lets compiler control schedule

Threads, Blocks, and Frames, Oh My!
  - Thread: basic block
      - statically ordered
      - non-blocking, except last insn
      - synchronizing / unsynchronizing
  - Code block: function or loop body
      - collection of threads
  - A program is a bunch of code blocks

Threads, Blocks, and Frames, Oh My!
  - Activation frame
      - Multithreaded equivalent of a stack
      - Allocated when invoking a code block
      - Threads in frame share registers
          - Exploits memory hierarchy
  - Gobs and gobs of allocated frames
      - Run one ready frame per processor until no threads are ready to go
      - Call/return (loop, funcall)
      - On each processor, all frames holding enabled threads are linked into a ready queue

Threads, Blocks, and Frames, Oh My!
  - Frames contain:
      - Thread synchronization counters
      - Argument / local variable slots
      - Continuation vector
          - List of threads ready to run

TAM, Architecturally
  - Compare TAM on normalized hardware
      - CM-5: standard multiproc
      - J-Machine: TAM-specialized hardware
          - Slots: tagged data words (trap on access when data not ready)
          - Continuation vector: message buffers, on chip
          - Async msg send/receive
              - Need to interact with fork and synch insns
  - Overall: communication and control flow dominate

Inlets and Active Messages
  - Inlet: compiler-generated message handler thread
      - Purpose:
          - Copy messages from network into local memory
          - Enable dependent computation
      - Run at higher priority than normal threads
      - Provides support for Active Messages

Active Messages Motivation
  - Message passing
      - Often fails to overlap communication and computation
      - High communication overheads (latency, buffering)
  - Goal: make it easy to not suck

Active Messages
  - Interleave communication and compute
  - Sending:
      - Put address of message handler into message
  - Receiving:
      - Interrupt current computation
      - Execute message handler
      - Integrate data into current computation
      - Resume current computation

Active Messages Pains
  - Interrupting computation hurts
      - May require CPU interrupt handler
      - Kernel to user, back to user computation
      - Message handler must not block too long or deadlock
  - Solutions may include:
      - User-level interrupts on processor
      - PC injection (2 independent PCs on CPU)
      - Polling (e.g. compiler-driven)
      - Dual co-processors (but need synchronization)
  - Needs shared program image across processors
      - J-Machine hardware dispatches to address in messages
      - Another layer of indirection could relax this

Active Message Results
  - Order of magnitude improvement
      - Or so I am told (no graphs)
  - Startup cost using Active Messages
      - nCUBE/2: 160 usecs reduced to 30 usecs
      - CM-5: 86 usecs reduced to 23 usecs
      - (blocking send/receives)
  - Still must start requests early
      - AM just helps interleave with compute

So What?
  - Uniproc TAM is faster than LISP, slower than C
  - Fine-grained parallelism
  - Where is my TAM?
  - Where is my J-Machine?
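
Illustrative sketch (not from the original slides): the frame, synchronization counter, continuation vector, and inlet ideas above can be modeled in a few lines of Python. Every name here (Frame, post, inlet, deliver, run, the "add" and "report" threads) is hypothetical and chosen only for illustration. A Python function object stands in for the "address of message handler" that an Active Message carries, and delivery is a plain function call rather than an interrupt or poll. This is a toy model under those assumptions, not how a real TAM compiler or the J-Machine implements it.

```python
from collections import deque


class Frame:
    """Toy activation frame: slots, sync counters, continuation vector."""

    def __init__(self, sync_counts):
        self.slots = {}                 # argument / local variable slots
        self.sync = dict(sync_counts)   # thread name -> inputs still missing
        self.ready = deque()            # continuation vector: enabled threads

    def post(self, thread):
        """Decrement a thread's sync counter; enable the thread at zero."""
        self.sync[thread] -= 1
        if self.sync[thread] == 0:
            self.ready.append(thread)


def inlet(frame, slot, value, thread):
    """Message handler: copy payload into the frame, enable dependent work."""
    frame.slots[slot] = value
    frame.post(thread)


def deliver(message):
    """Active-message receive: the message itself names its handler."""
    handler, args = message
    handler(*args)


def run(frame, code_block):
    """Run the frame's enabled threads until none are ready."""
    trace = []
    while frame.ready:
        name = frame.ready.popleft()
        code_block[name](frame)         # thread body runs without blocking
        trace.append(name)
    return trace


def add_thread(frame):
    frame.slots["sum"] = frame.slots["a"] + frame.slots["b"]
    frame.post("report")                # last insn: synchronize on successor


def report_thread(frame):
    frame.slots["done"] = True


CODE_BLOCK = {"add": add_thread, "report": report_thread}


def demo():
    frame = Frame({"add": 2, "report": 1})
    deliver((inlet, (frame, "a", 3, "add")))
    first = run(frame, CODE_BLOCK)      # "add" still waits on its second input
    deliver((inlet, (frame, "b", 4, "add")))
    second = run(frame, CODE_BLOCK)
    return first, second, frame.slots
```

In this sketch the "add" thread needs two inputs, so the first message alone enables nothing; once the second message arrives, its counter reaches zero and the thread lands in the continuation vector. That mirrors the idea of an inlet enabling dependent computation.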


CMU CS 15740 - Threaded Abstract Machine
