CORNELL CS 514 - Lecture Slides

CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Vivek Vishnumurthy: TA

Scalability
- Today we'll focus on how things scale.
- Basically: look at a property that matters.
- Make something "bigger": the network, the number of groups, the number of members, the data rate.
- Then measure the property and see the impact.
- Often we can "hope" that no slowdown would occur. But what really happens?

Stock Exchange Problem: Sometimes, someone is slow…
- Most members are healthy… but one is slow.
- I.e., something is contending with the slow ("red") process, delaying its handling of incoming messages.

With a slow receiver, throughput collapses as the system scales up
[Figure: average throughput on non-perturbed members vs. perturb rate, for the virtually synchronous Ensemble multicast protocols at group sizes 32, 64, and 96.]

Why does this happen?
- Superficially, because data for the slow process piles up in the sender's buffer, causing flow control to kick in (prematurely).
- But why does the problem grow worse as a function of group size, with just one "red" process?
- Small perturbations happen all the time.

Broad picture?
- Virtual synchrony works well under bursty loads.
- And it scales to fairly large systems (SWX uses a hierarchy to reach ~500 users).
- From what we've seen so far, this is about as good as it gets for reliability.
- Recall that stronger reliability models like Paxos are costly and scale far worse.
- Desired: steady throughput under heavy load and stress.

Protocols famous for scalability
- Scalable Reliable Multicast (SRM)
- Reliable Multicast Transport Protocol (RMTP)
- On-Tree Efficient Recovery using Subcasting (OTERS)
- Several others: TMP, MFTP, MFTP/EC...
- But when stability is tested under stress, every one of these protocols collapses just like virtual synchrony!

Example: Scalable Reliable Multicast (SRM)
- Originated in work on wb and the MBone.
- The idea is to do "local repair" if messages are lost; various optimizations keep load low and repair costs localized.
- Wildly popular for Internet "push," seen as the solution for Internet radio and TV.
- But the receiver-driven reliability model lacks "strong" reliability guarantees.

Local Repair Concept
[Sequence of diagrams: a packet is lost; receipt of a subsequent packet triggers a NACK for the missing packet; the NACK prompts a retransmission, but other receivers see useless NACKs and duplicate repairs.]

Limitations?
- SRM runs in the application, not the router, so IP multicast of NACKs and retransmissions tends to reach many or all processes.
- Lacking knowledge of who should receive each message, SRM has no simple way to know when a message can be garbage collected at the application layer.
- Probabilistic rules to suppress duplicates.

In practice?
- As the system grows large, the "probabilistic suppression" fails.
- More and more NACKs are sent in duplicate.
- And more and more duplicate data messages are sent as multiple receivers respond to the same NACK.
- Why does this happen?

Visualizing how SRM collapses
- Think of the sender as the hub of a wheel.
- Messages depart in all directions.
- Loss can occur at many places "out there," and they could be far apart… hence NACK suppression won't work, causing multiple NACKs.
- And the same reasoning explains why any one NACK is likely to trigger multiple retransmissions!
- Experiments have confirmed that SRM overheads soar with deployment size: every message triggers many NACKs and many retransmissions until the network finally melts down.
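
To make the suppression idea concrete, here is a minimal sketch of randomized NACK suppression of the kind SRM relies on. This is not SRM's actual implementation; the class name, timer bound, and event methods are invented for illustration, and it assumes each receiver can overhear NACKs multicast by other receivers.

```python
import random

class SuppressingReceiver:
    """Toy receiver illustrating randomized NACK suppression.

    On detecting a gap, schedule a NACK after a random backoff; if another
    receiver's NACK for the same packet is overheard first, cancel our own.
    """

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.highest_seen = -1       # highest sequence number received so far
        self.pending_nacks = {}      # missing seq -> time our NACK is due

    def on_data(self, seq, now):
        # Receipt of a later packet reveals any gap before it.
        for missing in range(self.highest_seen + 1, seq):
            if missing not in self.pending_nacks:
                # Random backoff; the bound is illustrative, not SRM's timer rule.
                self.pending_nacks[missing] = now + self.rng.uniform(0.0, 0.5)
        self.highest_seen = max(self.highest_seen, seq)

    def on_overheard_nack(self, seq):
        # Someone else already asked for this packet: suppress our own NACK.
        self.pending_nacks.pop(seq, None)

    def due_nacks(self, now):
        # NACKs whose backoff expired without being suppressed get multicast.
        due = sorted(s for s, t in self.pending_nacks.items() if t <= now)
        for s in due:
            del self.pending_nacks[s]
        return due
```

The collapse described above follows directly from this mechanism: when losses occur at widely separated points in the network, receivers rarely overhear each other's NACKs before their own timers fire, so suppression fails and both duplicate NACKs and duplicate retransmissions grow with group size.
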
Dilemma confronting developers
- The application is extremely critical: stock market, air traffic control, medical system.
- Hence we need a strong model and guarantees.
- But these applications often have a soft-realtime subsystem: steady data generation, and a need to deliver over a large scale.

Today: introduce a new design point
- Bimodal multicast (pbcast) is reliable in a sense that can be formalized, at least for some networks.
- Generalization to a larger class of networks should be possible, but maybe not easy.
- The protocol is also very stable under steady load, even if 25% of processes are perturbed.
- Scalable in much the same way as SRM.

Environment
- We will assume that most links have known throughput and loss properties.
- We also assume that most processes are responsive to messages in bounded time.
- But we can tolerate some flaky links and some crashed or slow processes.

Start by using unreliable multicast to rapidly distribute the message. But some messages may not get through, and some processes may be faulty. So the initial state involves partial distribution of the multicast(s).

Periodically (e.g., every 100 ms) each process sends a digest describing its state to some randomly selected group member. The digest identifies messages; it doesn't include them.

The recipient checks the gossip digest against its own history and solicits a copy of any missing message from the process that sent the gossip.

Processes respond to solicitations received during a round of gossip by retransmitting the requested message. The round lasts much longer than a typical RPC time. (A toy sketch of this digest/solicit/retransmit cycle appears after the Optimizations slides below.)

Delivery? Garbage collection?
- Deliver a message when it is in FIFO order.
- Garbage-collect a message when you believe that no "healthy" process could still need a copy (we used to wait 10 rounds, but now we use gossip to detect this condition).
- Match parameters to the intended environment.

Need to bound costs
- Worries:
  - Someone could fall behind and never catch up, endlessly loading everyone else.
  - What if some process has lots of stuff others want and they bombard it with requests?
  - What about scalability of buffering and of the list of members of the system, or the cost of updating that list?

Optimizations
- Request retransmissions of the most recent multicast first.
- The idea is to "catch up quickly," leaving at most one gap in the retrieved sequence.

Optimizations
- Participants bound the amount of data they will retransmit during any given round of gossip. If too much is solicited, they ignore the excess requests.

Optimizations
- Label each gossip message with the sender's gossip round number.
- Ignore solicitations that carry an expired round number, reasoning that they arrived very late and hence are stale.
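
As promised above, here is a toy, single-process-space sketch of pbcast's digest/solicit/retransmit cycle. It is not the implementation from the Bimodal Multicast work; the class and method names are invented, and FIFO delivery, round numbers, and retransmission budgets are left to the later sketches.

```python
import random

class GossipProcess:
    """Toy pbcast-style process: push digests, pull missing messages."""

    def __init__(self, pid, rng=None):
        self.pid = pid
        self.rng = rng or random.Random()
        self.peers = {}        # pid -> GossipProcess, filled in after construction
        self.messages = {}     # msg_id -> payload, seeded by the unreliable multicast

    def gossip_once(self):
        """One gossip step: send a digest (message ids only, not payloads)
        to a randomly selected group member."""
        others = [p for p in self.peers if p != self.pid]
        if others:
            self.peers[self.rng.choice(others)].on_gossip(self.pid, set(self.messages))

    def on_gossip(self, sender_pid, digest):
        """Compare the digest against our own history; solicit anything we are
        missing from the process that sent the gossip."""
        missing = digest - set(self.messages)
        if missing:
            self.peers[sender_pid].on_solicit(self.pid, missing)

    def on_solicit(self, requester_pid, wanted):
        """Respond to a solicitation by retransmitting the requested messages
        (here, by copying them straight into the requester's store)."""
        requester = self.peers[requester_pid]
        for msg_id in wanted:
            if msg_id in self.messages:
                requester.messages.setdefault(msg_id, self.messages[msg_id])


# Tiny demo: process "c" missed message "m2" in the initial unreliable multicast.
procs = {pid: GossipProcess(pid) for pid in ("a", "b", "c")}
for p in procs.values():
    p.peers = procs
procs["a"].messages = {"m1": "...", "m2": "..."}
procs["b"].messages = {"m1": "...", "m2": "..."}
procs["c"].messages = {"m1": "..."}
for _ in range(10):                      # a few gossip rounds (~100 ms each in the real protocol)
    for p in procs.values():
        p.gossip_once()
print(sorted(procs["c"].messages))       # almost surely includes "m2" after a few rounds
```
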


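The delivery and garbage-collection rules above can be sketched the same way. This is a hypothetical per-sender helper, assuming per-sender sequence numbers; it uses the older fixed-round retention rule quoted on the slide rather than the newer gossip-based detection.

```python
class SenderState:
    """Toy per-sender state: deliver in FIFO order, keep copies around to
    answer solicitations, garbage-collect after a fixed number of rounds."""

    ROUNDS_TO_KEEP = 10     # "we used to wait 10 rounds" -- match to the environment

    def __init__(self, deliver):
        self.deliver = deliver        # callback handing messages to the application
        self.next_to_deliver = 0      # next sequence number owed to the application
        self.buffered = {}            # seq -> (payload, gossip round when received)

    def receive(self, seq, payload, current_round):
        # Buffer every new copy; it may be needed to answer later solicitations.
        self.buffered.setdefault(seq, (payload, current_round))
        # Deliver whatever is now in FIFO order.
        while self.next_to_deliver in self.buffered:
            self.deliver(self.buffered[self.next_to_deliver][0])
            self.next_to_deliver += 1

    def garbage_collect(self, current_round):
        # Drop copies old enough that no "healthy" process should still need one.
        expired = [seq for seq, (_, rcvd) in self.buffered.items()
                   if current_round - rcvd >= self.ROUNDS_TO_KEEP]
        for seq in expired:
            del self.buffered[seq]
```

As the slide notes, the retention count and gossip period are parameters to be matched to the intended environment.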

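Finally, the three optimizations above map onto small guards around the solicit/answer path. A hypothetical sketch: the request cap, byte budget, and expiry age are illustrative parameters, not values from the pbcast papers, and message identifiers are assumed to sort in sending order.

```python
def choose_solicitations(missing_ids, max_requests):
    """Optimization 1: solicit the most recent multicasts first, so a lagging
    process "catches up quickly" and leaves at most one gap (the oldest
    messages) in the retrieved sequence."""
    return sorted(missing_ids, reverse=True)[:max_requests]


def answer_solicitations(requests, buffered, byte_budget):
    """Optimization 2: bound the data retransmitted in any one round of gossip;
    requests beyond the budget are simply ignored this round."""
    answered = []
    for msg_id in requests:
        payload = buffered.get(msg_id)
        if payload is None or len(payload) > byte_budget:
            continue
        answered.append((msg_id, payload))
        byte_budget -= len(payload)
    return answered


def is_fresh(solicitation_round, current_round, max_age_rounds=1):
    """Optimization 3: gossip (and the solicitations it triggers) carries the
    sender's round number; solicitations whose round has expired arrived far
    too late and are dropped."""
    return current_round - solicitation_round <= max_age_rounds
```

Optimization 1 shapes what the requester asks for, while optimizations 2 and 3 protect the responder, which is how pbcast bounds the load that any one slow or lagging process can impose on the rest of the group.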