Remote Procedure Calls (RPC)
Presenter: Benyah Shaparenko
CS 614, 2/24/2004

“Implementing RPC”
- Andrew Birrell and Bruce Nelson
- The theory of RPC had been thought out, but the implementation details were sketchy
- Goal: show that RPC can make distributed computation easy, efficient, powerful, and secure

Motivation
- Procedure calls are well understood, so why not use procedure calls to model distributed behavior?
- Basic goals:
  - Simple semantics: easy to understand
  - Efficiency: procedure calls are relatively efficient
  - Generality: procedures are well known

How RPC Works (Diagram)
[diagram not reproduced]

Binding
- Naming + location
  - Naming: what machine to bind to?
  - Location: where is that machine?
- Uses a Grapevine database
- Exporter: makes an interface available
  - Provides a dispatcher method
  - Interface information is maintained in RPCRuntime

Notes on Binding
- The exporting machine is stateless
- Bindings are broken if the server crashes
- A caller can invoke only procedures the server exports
- Binding types:
  - Decision about the instance made dynamically
  - Specify the type, but dynamically pick the instance
  - Specify both type and instance at compile time

Packet-Level Transport
- A protocol designed specifically for RPC
- Minimizes latency and state information
- Behavior:
  - If the call returns, the procedure executed exactly once
  - If the call doesn’t return, it executed at most once

Simple Case
- Arguments and results fit in a single packet
- The machine retransmits until the packet is received, i.e., until either an Ack or a response packet arrives
- Call identifier (machine identifier, pid):
  - The caller knows a response is for the current call
  - The callee can eliminate duplicates
- Callee’s state: a table of the last call ID received

Simple Case Diagram
[diagram not reproduced]

Simple Case (cont.)
- Idle connections have no state information
  - No pinging to maintain connections
  - No explicit connection termination
- The caller machine must have a unique call identifier even if it restarts
  - Conversation identifier: distinguishes incarnations of the calling machine

Complicated Call
- The caller sends probes until it gets a response; the callee must respond to each probe
- Alternative: generate an Ack automatically
  - Not good because of the extra overhead
- With multiple packets, send the packets one after another (using sequence numbers); only the last one requests an Ack

Exception Handling
- Signals: the exceptions
  - Imitates local procedure exceptions
- The callee machine can use only exceptions supported in the exported interface
- “Call Failed” exception: communication failure or difficulty

Processes
- Process creation is expensive, so idle processes just wait for requests
- Packets carry source and destination pids
  - The source is the caller’s pid
  - The destination is the callee’s pid, but if that process is busy or no longer in the system, the packet can be given to another process on the callee’s machine

Other Optimization
- RPC communication in RPCRuntime bypasses the software layers
  - Justified since the authors consider RPC to be the dominant communication protocol
- Security: Grapevine is used for authentication

Environment
- Cedar programming environment
- Dorado workstations
  - Call/return < 10 microseconds
  - 24-bit virtual address space (16-bit words)
  - 80 MB disk
  - No assembly language
- 3 Mb/s Ethernet (some 10 Mb/s)

Performance Chart
[chart not reproduced]

Performance Explanations
- Elapsed times are accurate to within 10% and averaged over 12,000 calls
- For small packets, RPC overhead dominates
- For large packets, data transmission time dominates
- The time beyond that of a local call is RPC overhead

Performance (cont.)
- Handles frequent, simple calls really well
- With more complicated calls, performance doesn’t scale as well
- RPC is more expensive for sending large amounts of data than other mechanisms, since RPC sends more packets

Performance (cont.)
- Can achieve a transfer rate equal to a byte-stream implementation if multiple parallel processes are interleaved
- Exporting/importing costs were not measured

RPCRuntime Recap
- Goal: implement RPC efficiently
- The hope is to enable applications that previously couldn’t make use of distributed computing
- In general, strong performance numbers

“Performance of Firefly RPC”
- Michael Schroeder and Michael Burrows
- RPC had gained relatively wide acceptance
- Goals: see just how well RPC performs, and analyze where latency creeps into RPC
- Note: Firefly was designed by Andrew Birrell

RPC Implementation on Firefly
- RPC is the primary communication paradigm in Firefly
  - Used for inter-machine communication
  - Also used for communication within a machine (not optimized… come to the next class to see how to do this)
- Stubs are automatically generated
  - Uses Modula2+ code

Firefly System
- 5 MicroVAX II CPUs (1 MIPS each)
- 16 MB of shared memory, coherent cache
- One processor attached to the Qbus
- 10 Mb/s Ethernet
- Nub: the system kernel

Standard Measurements
- Null procedure
  - No arguments and no results
  - Measures the base latency of the RPC mechanism
- MaxResult and MaxArg procedures
  - Measure throughput when sending the maximum size allowable in a packet (1514 bytes)

Latency and Throughput
- The base latency of RPC is 2.66 ms
  - 7 threads can do 741 calls/sec
- Latency for Max is 6.35 ms
  - 4 threads can achieve 4.65 Mb/s
- This is the data transfer rate seen by applications, since data transfers use RPC

Marshaling Time
- As expected, scales linearly with the size and number of arguments/results
  - Except when library code is called…
[chart: marshaling time, 0–700 µs, by argument size]

Analysis of Performance
- Steps in the fast path (95% of RPCs):
  - Caller: obtains a buffer, marshals arguments, transmits the packet, and waits (Transporter)
  - Server: unmarshals arguments, calls the server procedure, marshals results, sends the results
  - Client: unmarshals the results, frees the packet

Transporter
- Fill in the RPC header in the call packet
- The sender fills in the other headers
- Send the packet on the Ethernet (queue it, read it from memory, send it from the CPU)
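The simple-case duplicate elimination described earlier — the callee keeps the last call ID received from each caller, executes only calls with a newer ID, and replays the saved result for a retransmission — can be sketched as follows. This is a minimal illustration with hypothetical names, not the Cedar RPCRuntime code:

```python
# Sketch of simple-case RPC duplicate elimination (hypothetical names;
# illustrates the idea, not the original Cedar/RPCRuntime implementation).

class Callee:
    def __init__(self):
        # Per-caller state: the last call ID seen and the result returned for it.
        self.last_call = {}  # caller id -> (call_id, result)

    def handle(self, caller, call_id, proc, arg):
        seen = self.last_call.get(caller)
        if seen is not None and call_id == seen[0]:
            # Retransmission of the current call: replay the saved result
            # instead of executing the procedure a second time.
            return seen[1]
        if seen is not None and call_id < seen[0]:
            # Stale duplicate from an earlier call: drop it.
            return None
        result = proc(arg)  # new call: execute exactly once
        self.last_call[caller] = (call_id, result)
        return result
```

Because call IDs from one caller are monotonically increasing, a single table entry per caller is enough state to give the exactly-once behavior (when the call returns) described in the transport slides.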
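The single-packet case can be made concrete with a toy marshaling layout. The field layout below (machine ID, pid, call ID header followed by one packed argument) is assumed for illustration only; it is not the actual Cedar packet format:

```python
import struct

# Toy single-packet call layout (illustrative, not the real Cedar format):
# header = (machine id, process id, call id), payload = one 32-bit argument.
HEADER = struct.Struct(">IIQ")  # 32-bit machine, 32-bit pid, 64-bit call id
ARG = struct.Struct(">i")

def marshal_call(machine, pid, call_id, arg):
    """Pack the call identifier and argument into a single packet."""
    return HEADER.pack(machine, pid, call_id) + ARG.pack(arg)

def unmarshal_call(packet):
    """Recover the call identifier and argument on the callee side."""
    machine, pid, call_id = HEADER.unpack_from(packet, 0)
    (arg,) = ARG.unpack_from(packet, HEADER.size)
    return machine, pid, call_id, arg
```

Marshaling like this is just fixed-offset copying, which is why the measured marshaling time scales linearly with the size and number of arguments.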
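The Null/Max methodology from the Firefly measurements generalizes to any RPC system: time many back-to-back invocations of a no-op call to estimate base latency, and divide bytes moved by time per maximum-size call to estimate throughput. A generic harness (not Firefly code) might look like:

```python
import time

def measure_latency(call, n=1000):
    """Average seconds per back-to-back invocation of `call`
    (the Null-procedure methodology: no arguments, no results)."""
    start = time.perf_counter()
    for _ in range(n):
        call()
    return (time.perf_counter() - start) / n

def throughput_mbps(bytes_per_call, seconds_per_call):
    """Megabits per second moved by repeated maximum-size calls."""
    return bytes_per_call * 8 / seconds_per_call / 1e6
```

As a sanity check on the slide's numbers: a 1514-byte Max call at 6.35 ms per call works out to roughly 1.9 Mb/s for a single thread, which is consistent with 4 threads together reaching 4.65 Mb/s.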