Fast Communication
Firefly RPC
Lightweight RPC
CS 614
Tuesday March 13, 2001
Jeff Hoy

Why Remote Procedure Call?
- Simplify building distributed systems and applications
  - Looks like local procedure call
  - Transparent to user
- Balance between semantics and efficiency
- Universal programming tool
- Secure inter-process communication

RPC Model
[Diagram: Client Application, Client Stub, and Client Runtime on one side; Server Application, Server Stub, and Server Runtime on the other; Call and Return travel across the Network]

RPC In Modern Computing
- CORBA and Internet Inter-ORB Protocol (IIOP)
  - Each CORBA server object exposes a set of methods
- DCOM and Object RPC
  - Built on top of RPC
- Java and Java Remote Method Protocol (JRMP)
  - Interface exposes a set of methods
- XML-RPC, SOAP
  - RPC over HTTP and XML

Goals
- Firefly RPC
  - Inter-machine Communication
  - Maintain Security and Functionality
  - Speed
- Lightweight RPC
  - Intra-machine Communication
  - Maintain Security and Functionality
  - Speed

Firefly RPC
- Hardware
  - DEC Firefly multiprocessor
  - 1 to 5 MicroVAX CPUs per node
  - Concurrency considerations
  - 10 megabit Ethernet
- Takes advantage of 5 CPUs

Fast Path in an RPC
- Transport Mechanisms
  - IP / UDP
  - DECNet byte stream
  - Shared Memory (intra-machine only)
- Determined at bind time
- Inside the transport procedures “Starter”, “Transporter”, and “Ender”, plus “Receiver” for the server

Caller Stub
- Gets control from calling program
- Calls “Starter” for packet buffer
- Copies arguments into the buffer
- Calls “Transporter” and waits for reply
- Copies result data into the caller’s result variables
- Calls “Ender” and frees result packet

Server Stub
- Receives incoming packet
- Copies data onto the stack or into a new data block, or leaves it in the packet
- Calls server procedure
- Copies result into the call packet and transmits it

Transport Mechanism
- “Transporter” procedure
  - Completes RPC header
  - Calls “Sender” to complete UDP, IP, and Ethernet headers (Ethernet is the chosen means of communication)
  - Invokes Ethernet driver via kernel trap and queues the packet

Transport Mechanism
- “Receiver” procedure
  - Server thread awakens in “Receiver”
  - “Receiver” calls the interface stub identified in the received packet, and the interface stub calls the procedure stub
  - Reply is similar

Threading
- Client Application creates RPC thread
- Server Application creates call thread
- Threads operate in server application’s address space
- No need to spawn entire process
- Threads need to consider locking resources

Performance Enhancements
- Over traditional RPC
  - Stubs marshal arguments directly instead of calling library functions
  - RPC procedures called through procedure variables rather than by lookup table
  - Server retains call packet for results
  - Buffers reside in shared memory
- Sacrifices abstract structure

Performance Analysis
- Null() Procedure
  - No arguments or return value
  - Measures base latency of RPC mechanism
- Multi-threaded caller and server

Time for 10,000 RPCs
- Base latency: 2.66 ms
- MaxResult latency (1500 bytes): 6.35 ms

Send and Receive Latency
- With larger packets, transmission time dominates
- Overhead becomes less of an issue
- Good for Firefly RPC, assuming large transmissions over the network
- Is overhead acceptable for intra-machine communication?

Stub Latency
- Significant overhead for small packets

Fewer Processors
[Table: seconds for 1,000 Null() calls]
- Why the slowdown with one processor?
  - Fast path can be followed only in multiprocessor environment
  - Lock conflicts, scheduling problems
- Why little speedup past two processors?

Future Improvements
- Hardware
  - Faster network will help larger packets
  - Tripling CPU speed would reduce Null() time by 52% and MaxResult by 36%
- Software
  - Omit IP and UDP headers for Ethernet datagrams: 2 to 4% gain
  - Redesign RPC protocol: about 5% gain
  - Busy thread wait: 10 to 15% gain
  - Write more in assembler: 5 to 10% gain

Other Improvements
- Firefly RPC handles intra-machine communication through the same mechanisms as inter-machine communication
- Firefly RPC also has very high overhead for small packets
- Does this matter?

RPC Size Distribution
- Majority of RPC transfers under 200 bytes

Frequency of Remote Activity
- Most calls are to the same machine

Traditional RPC
- Most calls are small messages that take place between domains of the same machine
- Traditional RPC contains unnecessary overhead, like
  - Scheduling
  - Copying
  - Access validation

Lightweight RPC (LRPC)
- Also written for the DEC Firefly system
- Mechanism for communication between different protection domains on the same system
- Significant performance improvements over traditional RPC

Overhead Analysis
- Theoretical minimum to invoke Null() across domains: kernel trap + context change to call and a trap + context change to return
- Theoretical minimum on Firefly RPC: 109 µs
- Actual cost: 464 µs

Sources of Overhead
- 355 µs added
  - Stub overhead
  - Message buffer overhead
    - Not so much in Firefly RPC
  - Message transfer and flow control
  - Scheduling and abstract threads
  - Context Switch

Implementation of LRPC
- Similar to RPC
- Call to server is done through kernel trap
  - Kernel validates the caller
- Servers export interfaces
- Clients bind to server interfaces before making a call

Binding
- Servers export interfaces through a clerk
- The clerk registers the interface
- Clients bind to the interface through a call to the kernel
- Server replies with an entry address and size of its A-stack
- Client gets a Binding Object from the kernel

Calling
- Each procedure is represented by a stub
- Client makes a call through the stub
  - Manages A-stacks
  - Traps to the kernel
- Kernel switches context to the server
- Server returns by its own stub
  - No verification needed

Stub Generation
- Procedure representation
  - Call stub for client
  - Entry stub for server
- LRPC merges protocol layers
- Stub generator creates run-time stubs in assembly language
  - Portability sacrificed for Performance
  - Falls back on
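
The Caller Stub and Server Stub steps listed on the earlier slides can be illustrated with a toy, single-process sketch. Everything in it is hypothetical: the functions `starter`, `transporter`, and `ender` only borrow the procedure names from the slides, the wire format (two packed integers) and the `server_add` procedure are invented for illustration, and the "network" is a direct function call. It is not the Firefly implementation.

```python
import struct

PACKET_SIZE = 1500  # assumed buffer size, roughly one Ethernet packet


def starter():
    """Hand the caller stub an empty packet buffer (hypothetical)."""
    return bytearray(PACKET_SIZE)


def server_add(a, b):
    """Example server procedure, invented for this sketch."""
    return a + b


def server_stub(packet):
    """Unmarshal arguments, call the procedure, reuse the call packet for the result."""
    a, b = struct.unpack_from("!ii", packet, 0)
    result = server_add(a, b)
    struct.pack_into("!i", packet, 0, result)  # result goes into the retained call packet
    return packet


def transporter(packet):
    """Stand-in for the transport: a direct call instead of UDP/IP over Ethernet."""
    return server_stub(packet)


def ender(packet):
    """Release the result packet (here it is simply emptied)."""
    packet.clear()


def add_caller_stub(a, b):
    """Caller stub: looks like a local call to the calling program."""
    packet = starter()                              # get a packet buffer
    struct.pack_into("!ii", packet, 0, a, b)        # copy arguments into the buffer
    reply = transporter(packet)                     # send and wait for the reply
    (result,) = struct.unpack_from("!i", reply, 0)  # copy the result out
    ender(reply)                                    # free the result packet
    return result


print(add_caller_stub(2, 3))
```

The stub does its own marshalling with `struct` rather than calling a generic library routine, which loosely mirrors the "stubs marshal arguments" point from the Performance Enhancements slide.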
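
The figures quoted on the Overhead Analysis and Future Improvements slides can be checked with a few lines of arithmetic. The sketch below uses only numbers from the slides; the variable names are illustrative, and it assumes the 52% and 36% reductions apply to the 2.66 ms and 6.35 ms latencies quoted on the Time for 10,000 RPCs slide.

```python
# Overhead Analysis: cross-domain Null() call
null_min_us = 109      # theoretical minimum from the slide
null_actual_us = 464   # measured cost from the slide
overhead_us = null_actual_us - null_min_us
overhead_share = overhead_us / null_actual_us

# Future Improvements: projected effect of tripling CPU speed
base_latency_ms = 2.66   # Null() latency
max_result_ms = 6.35     # MaxResult latency (1500 bytes)
null_after = base_latency_ms * (1 - 0.52)
max_after = max_result_ms * (1 - 0.36)

print(f"overhead: {overhead_us} us ({overhead_share:.1%} of actual cost)")
print(f"Null() with 3x CPU: {null_after:.2f} ms")
print(f"MaxResult with 3x CPU: {max_after:.2f} ms")
```

The subtraction reproduces the 355 µs figure on the Sources of Overhead slide, which means roughly three quarters of the measured Null() cost is overhead beyond the theoretical minimum.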