Berkeley COMPSCI 262A - SEDA Capriccio - D1047018

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 262A> SEDA Capriccio

DOC PREVIEW

Berkeley COMPSCI 262A - SEDA Capriccio

School name University of California, Berkeley

Course Compsci 262a- Advanced Topics in Computer Systems

Pages 6

This preview shows page 1-2 out of 6 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

SEDA CapriccioI. Backgroundo The standard model (similar to Mesa)o Concurrent threads with locks/mutex/etc. for synchronizationo Blocking callso May hold locks for long timeo Problems with more than 100 or so threads due to OS overhead (why?)o Strong support from OS, libraries, debuggers, ....o Lots of small handlers that run to completiono Basic model is event arrives and runs a handlero An event loops runs at the core waiting for arrivals, then calls handlero No context switch, just procedure callo Threads exist, but just run event loop -> handler -> event loopo Natural fit with finite-state machines (FSMs)o Allows very high concurrencyII. Cooperative Task Schedulingo preemptive: tasks may be interrupted at any timeo serial: tasks run to completiono cooperative: tasks are not pre-empted, but do yield the processoro Note: pre-emption is OK if it can’t affect the current atomic section. Easy way to achieve this is data partitioning! Only threads that access the shared state are a problem!o Threads: not too bad -- just block until the action completes (synchronous)o Events: hardo Threads -- pessimistic: acquire lock and then blocko Threads -- optimistic: read state, block, try write, retry if fail (and re-block!)o Events -- pessimistic: acquire lock, store state in continuation ; later reply completes and releases lock. (seems hard to debug, what if event never comes? or comes more than once?)o Events -- optimistic: read state, store in continuation ; apply write, retry if failo Basic problem: exclusive access can last a long time -- hard to make progresso General question: when can we move the lock to one side or the other?o One strategy:III. SEDAo 1) need a better way to achieve concurrency than just threadso 2) need to provide graceful degradationo 3) need to enable feedback loops that adapt to changing conditionso Internet makes the concurrency higher, requires high availability and ensures that load will exceed the target range.o want throughput to increase linearly and then stay flat as you exceed capacityo want response time to be low until saturation and then linearly increaseo want fairness in the presence of overloado almost no systems provide thiso key is to drop work early or to queue it for later (threads have implicit queues on locks, sockets, etc.)o virtualization makes it harder to know where you stand!o threads limits are too small in practice (about 100); some of this is due to linear searches in internal data structures, or limits on kernel memory allocationo claims about locks, overhead, TLB and cache misses are harder to understand -- don’t seem to be fundamental over eventso queues are implicit, which makes it hard to control or even identify the bottleneckso key insight in SEDA: no user allocated threads: programmer defines what can be concurrent and SEDA manages the threads. Otherwise no way to control the overall number or distribution of threadso debuggingo legacy codeo stack rippingo use threads within a stage, events between stageso stages have explicit queues and explicit concurrencyo threads (in a stage) can block, just not too ofteno SEDA will add and remove threads from a stage as neededo simplifies modularity: queues decouple stages in terms of performance at some cost to latencyo threads never cross stages, but events can be pass by value or pass by reference.o stage scheduling affects locality -- better to run one stage for a while than to follow an event through multiple stages. This should make up for the extra latency of crossing stages.o works for any measurable property that has smooth behavior (usually continuous as well). Property typically needs to be monotonic in the control area (else get lost in local minima/maxima)o within a stage: batching controller decides how many events to process at one timeo thread pool controller: find the minimum number of threads that keeps queue length lowo global thread allocation based on priorities or queue lengths...o but huge dropped requests to maintain response time goalo however, can’t really do any better than this...o how does SEDA do split-phase actions?o Intra-stage:o Inter-stage:IV. Capriccioo leverage async I/Oo scale to 100,000 threads (qualitative difference: one per connection!)o enable compiler support and invariantso easy, low-cost synchronization (but not for I/O which is slow anyway)o *control* over thread semantics, invariantso enable application-specific behavoir (e.g. scheduling) and optimizationso enable compiler assistance (e.g. safe stacks)o still have two schedulerso async I/O mitigates this by eliminating largest cause of blocking (in kernel)o still can block unexpectedly -- page faults or close() exampleo current version actually stops running in such cases (so must be rare)o can’t schedule multple processes at user-level, only threads within a process (not a problem for dedicated machines like servers)o POSIX interface for legacy apps, but now at user level with a runtime libraryo make all thread ops O(1)o deal with stack space for 100,000 threads (can’t just give each 2MB)o allows lots of user threads to map to small number of kernel threadso allows better disk throughputo differnt mechanisms for network (epoll) and disk (AIO), but this is hidden from userso not well developed yeto goal: transparent but application specific (!)o note: lots of earlier work on OS extensibility that was hard to use and not well justifiedo here we are not requiring work on the part of the programmer, and we are focused on servers, which actually do need different support1Advanced Topics in Computer Systems, CS262AProf. Eric BrewerSEDACapriccioI. BackgroundThreads:o The standard model (similar to Mesa)o Concurrent threads with locks/mutex/etc. for synchronizationo Blocking callso May hold locks for long timeo Problems with more than 100 or so threads due to OS overhead (why?)• see SEDA graph of throughput vs. number of threadso Strong support from OS, libraries, debuggers, ....Events:o Lots of small handlers that run to completiono Basic model is event arrives and runs a handler• state is global or part of handler (not much in between)o An event loops runs at the core waiting for arrivals, then calls handlero No context switch, just procedure callo Threads exist, but just run event loop -> handler -> event loop• stack trace is not useful for debugging!• typically one thread per CPU (any more doesn’t add anything since threads don’t block)• Sometimes have extra threads

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 6 pages.

Berkeley COMPSCI 262A - SEDA Capriccio

Sign up for free to view:

Please select your school