Unformatted text preview:

Advanced Topics in Computer Systems CS262A Prof Eric Brewer SEDA Capriccio I Background Threads o o o o o The standard model similar to Mesa Concurrent threads with locks mutex etc for synchronization Blocking calls May hold locks for long time Problems with more than 100 or so threads due to OS overhead why see SEDA graph of throughput vs number of threads o Strong support from OS libraries debuggers Events o Lots of small handlers that run to completion o Basic model is event arrives and runs a handler state is global or part of handler not much in between o An event loops runs at the core waiting for arrivals then calls handler o No context switch just procedure call o Threads exist but just run event loop handler event loop stack trace is not useful for debugging typically one thread per CPU any more doesn t add anything since threads don t block Sometimes have extra threads for things that may block e g OSs that only support synchronous disk reads o Natural fit with finite state machines FSMs arrows are handlers that change states blocking calls are split into two states before and after the call o Allows very high concurrency multiplex 10 000 FSMs over a small number of threads II Cooperative Task Scheduling Tasks 1 Week 4 o preemptive tasks may be interrupted at any time must use locks mutex to get atomicity may get pre empted while holding a lock others must wait until you are rescheduled might want to differentiate short and long atomic sections short should finish up work o serial tasks run to completion basic event handlers which are atomic not allowed to block what if they run too long not much to do about that could kill them implies might be better for friendly systems hard to support multiprocessors o cooperative tasks are not pre empted but do yield the processor can use stacks and make calls but still interleaved yield points are not atomic limits what you can do in an atomic section better with compiler help is a call a yield point or not hard to support multiprocessors o Note pre emption is OK if it can t affect the current atomic section Easy way to achieve this is data partitioning problem Only threads that access the shared state are a Can pre empt for system routines Can pre empt to switch to a different process with its own set of threads but assumes processes don t share state Split phase actions how do you implement a split phase action o Threads not too bad just block until the action completes synchronous Assumes other threads run in the meantime Ties up considerable memory full stack Easy memory management stack allocation deallocation matches natural lifetime o Events hard Must store live state in a continuation on the heap usually Handler lifetime is too short so need to explicitly allocate and deallocate later Scoping is bad too need a multi handler scope which usually implies global scope Rips the function into two functions before and after Debugging is hard Evolution is hard adding a yielding call implies more ripping to do converting a non yielding call into a yielding call is worse every call site needs to be ripped and those sites may become yielding which cascades the problem Atomic split phase actions really hard 2 Week 4 o Threads pessimistic acquire lock and then block o Threads optimistic read state block try write retry if fail and re block o Events pessimistic acquire lock store state in continuation later reply completes and releases lock seems hard to debug what if event never comes or comes more than once o o o o Events optimistic read state store in continuation apply write retry if fail Basic problem exclusive access can last a long time hard to make progress General question when can we move the lock to one side or the other One strategy structure as a sequence of actions that may or may not block like cache reads acquire lock walk through sequence if block then release and start over if get all the way through then action was short and atomic This seems hard to automate Compiler would need to know that some actions mostly won t block or won t block the second time and then also know that something can be retried without multiple side effects Main conclusion for me compilers are the key to concurrency in the future III SEDA Problem to solve o o o o 1 need a better way to achieve concurrency than just threads 2 need to provide graceful degradation 3 need to enable feedback loops that adapt to changing conditions Internet makes the concurrency higher requires high availability and ensures that load will exceed the target range Graceful degradation o o o o o want throughput to increase linearly and then stay flat as you exceed capacity want response time to be low until saturation and then linearly increase want fairness in the presence of overload almost no systems provide this key is to drop work early or to queue it for later threads have implicit queues on locks sockets etc o virtualization makes it harder to know where you stand Problems with threads claimed 3 Week 4 o threads limits are too small in practice about 100 some of this is due to linear searches in internal data structures or limits on kernel memory allocation o claims about locks overhead TLB and cache misses are harder to understand don t seem to be fundamental over events do events use less memory probably some but not 50 less do events have fewer misses latencies are the same so only if working set is smaller is it bad to waste VM for stacks only with tons of threads is there a framentation issue with stacks lots of partially full pages probably to some degree each stack needs at least one page If so we can move to a model with non contiguous stack frames leads to a different kind of fragmentation If so we could allocate subpages like FFS did for fragments and thread a stack through subpages but this needs compiler support and must recompile all libraries o queues are implicit which makes it hard to control or even identify the bottlenecks o key insight in SEDA no user allocated threads programmer defines what can be concurrent and SEDA manages the threads Otherwise no way to control the overall number or distribution of threads Event based approach has problems too o debugging o legacy code o stack ripping Solution o o o o o use threads within a stage events between stages stages have explicit queues and explicit concurrency threads in a stage can block just not too often SEDA will add and remove threads from a stage as needed simplifies modularity queues


View Full Document

Berkeley COMPSCI 262A - SEDA Capriccio

Loading Unlocking...
Login

Join to view SEDA Capriccio and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view SEDA Capriccio and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?