CORNELL CS 614 - Fast Communication and User Level Parallelism


Fast Communication and User Level Parallelism
Howard Marron

Introduction
We have studied systems that build transparent layers below the application to provide properties such as replication and group communication. Here we look at some areas where the user has been given more control over parallelism.

Threads
- Give programs finer granularity for better parallelism and performance
- Have lower overhead than processes
- The same program runs on a uniprocessor or a multiprocessor with little or no modification
- Threads in the same process can communicate easily, since they share the same address space

Implementation
Do we want threads, and if so, where should we implement them?

  Operation     FastThreads (ULT)   Topaz threads (KLT)   Ultrix processes
  Null fork            34                  948                 11300
  Signal-wait          37                  441                  1840

  (Latency in μsec on a Firefly system)

Advantages and problems of ULT
Advantages:
- Thread switching does not involve the kernel
- Scheduling can be application specific: choose the best algorithm
- ULTs can run on any OS; only a thread library is needed
Disadvantages:
- Most system calls are blocking, and the kernel blocks at the process level, so all threads within the process are blocked
- The kernel can only assign processes to processors.
- Two threads within the same process therefore cannot run simultaneously on two processors

Advantages and disadvantages of KLT
Advantages:
- The kernel knows the processing environment and can assign threads accordingly
- Blocking is done at the thread level
- Kernel routines can themselves be multithreaded
Disadvantages:
- Thread switching within the same process involves the kernel: two mode switches per thread switch
- This results in a significant slowdown of thread switches within the same process

ULT with Scheduler Activations
- Implement user-level threads with the help of the kernel
- Gain the flexibility and performance of ULT
- Get the functionality of KLT without the overhead

ULT over KLT
- The kernel operates without knowledge of the user program
- User threads are never notified of what the kernel schedules, since it is transparent to the user
- The kernel schedules threads without regard to user-level thread priorities or memory locations

The Model
The kernel runs an instance of the user-level scheduler on each processor (P1, P2, ...); the schedulers draw threads from a shared user-level thread pool. (Diagram in the original slides.)

Kernel Support of ULT
- The kernel controls processor allocation
- The ULT package controls which threads run on the allocated processors
- The kernel notifies the ULT scheduler of any changes to the environment
- The ULT scheduler can notify the kernel of its current processor needs

Scheduler Activations
The kernel communicates with the user-level scheduler through upcalls:
- Add processor: run a thread here
- Processor preempted: returns the state of the preempted processor; another thread can be run
- Activation has blocked: its processor is free, so a thread can run here
- Activation has unblocked: return its thread to the ready list

How the kernel and scheduler work together
(Diagram in the original slides.)

Hints to Kernel
- Add more processors
- This processor is idle

Critical Sections
Idea 1: on a critical-section conflict, give control back to the thread holding the lock; that thread gives control back once it is done with the critical section.
- It proved too slow to find out whether a thread was in a critical section
- It is hard to make the thread give up control after the critical section is done

Critical Sections (cont.)
Idea 2: make copies of the critical sections available to the scheduler.
- Compare the thread's PC against the critical-section copy to check whether it is holding a lock
- The copy of the critical section can be run, and it will return sooner than before since the release of the lock is known to the scheduler

Results

  Operation     FastThreads   FastThreads w/ scheduler activations   Topaz threads   Ultrix processes
  Null fork          34                       37                          948             11300
  Signal-wait        37                       42                          441              1840

Results 2
(Graph in the original slides.)

Threads Summary
- The best solution to the threads problem lies somewhere between ULT and KLT
- Both levels must cooperate for best performance
- Most of the control over thread management should stay at user level, since the kernel is far away from the threads

Remote Procedure Calls
- A technique for constructing distributed systems
- Lets the user have no knowledge of the transport system
- The called procedure can be located anywhere
- A strong client/server model of computing

Problems with RPC
- Adds a huge amount of overhead
- More protection work on every call
- All calls trap to the OS
- The caller must wait for a response from the other system
- All calls are treated the same: as the worst case

Ways to improve
- More than 95% of all RPCs are to the local domain
- Optimize the most-taken path
- Reduce the number of system boundaries an RPC crosses

Anatomy of a remote RPC
The client calls callRPC() and traps into its kernel, which performs protection checks and message transfer. The server kernel interprets and dispatches the call, a server thread is scheduled and woken up to run the service, and the reply makes the same trip back (protection checks, message transfer) to the rescheduled client.

Lightweight RPC (LRPC)
- Create new routines for cross-domain calls; keep RPC-like calls for cross-system calls
- Blur the client/server line in the new calls
- Reduce the number of argument copies between messages and stacks by maintaining stacks dedicated to individual calls
- Eliminate the need to schedule threads on RPC receipt at the server: the processor can be instructed to simply switch between the calling and called threads

Anatomy of a local LRPC
The client calls callRPC(); the kernel performs protection checks, the arguments are copied once to a dedicated stack, the service runs, and the reply is copied back as the client resumes. There is no need to schedule threads here; the scheduler can be told to just switch the two threads.

Multiprocessors
- Whole processor contexts can be cached on idle processors
- Instead of context switching the local processor for a cross-domain call, run the procedure on the cached processor
- Saves TLB misses and other exchanges like virtual memory state

Results
(Graph in the original slides.)

LRPC Conclusions
- RPCs can be improved for the general case
- The common case should be emphasized, not the most general case
- Many unnecessary tasks can be eliminated when optimizing for the cross-domain case

