UW-Madison CS 739 - Manageability, availability and performance in Porcupine: a highly scalable


A. Arpaci-Dusseau
Department of Computer Science
CS739: Distributed Systems
University of Wisconsin, Madison

Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service – SOSP’99

1 Introduction

• What were the goals of the Porcupine mail server?
• At a high level, what are four features of the Porcupine system that enable it to meet these three goals?
• Previous related work investigated building scalable web and proxy servers from clusters. What is more challenging about mail as a service? What two options for placing data (and the corresponding work) were previously investigated for clusters? What are the problems with these approaches?

2 System architecture overview

• One of the keys of Porcupine is that it differentiates hard and soft state. What is the definition of each? What is the benefit of differentiating?
• What are each of the different Porcupine data structures? Are they hard or soft state? Where is the data stored (is it replicated)?
• Can you walk through Figure 2?
• How does someone send mail to a user hosted by Porcupine?
• How does a user retrieve messages from Porcupine? What happens if a node holding a mailbox fragment is unavailable? What happens if a user manager is down?

3 Self management

A primary goal of Porcupine is to deal automatically with diverse changes, including node failure, recovery, and addition.

• How does Porcupine determine which nodes are currently part of the service (i.e., how does the Three Round Membership Protocol work)? Why are Lamport clocks used by the membership protocol?
• What different events trigger Porcupine to run the membership protocol? Is it possible for the cluster to be partitioned into multiple groups? How will this look to the user? Can a node believe it is part of a group when it is not? What will happen?
• Do you think the TRM protocol is a good match for Porcupine?
• How is user management assigned to nodes of the system? What is the goal when performing this assignment?
• The user manager node is responsible for two pieces of soft state: the message fragment list and the user profile soft state. How is the message fragment list reconstructed? How is the user profile soft state reconstructed?
• How does Porcupine decide which node is responsible for the user profile database itself (hard state)?
• When a new node is added, how does it get used? What data is allocated to it?

4 Replication and availability

• Porcupine replicates the hard state of the user database and mailbox fragments to improve availability. In updating the replicas, Porcupine leverages weaker semantics that are specific to mail delivery services. For example, the same message may be received more than once, a message that was deleted may temporarily reappear, and multiple agents acting for the same user may have different views at the same time. Making these assumptions simplifies system design, while improving availability and performance. How can each of these odd cases occur?
• Why are wall clocks, instead of Lamport clocks, used to synchronize updates to the replicated user database?
• Assuming no failures, what is the protocol for updating a replicated object? What is the purpose of the log? Why would a peer need to keep a log too? What is the performance problem with keeping a log? (At what point may the coordinator respond to the initiating agent?)
• If a node with a replicated mailbox fragment disappears, how is another replica made?

5 Dynamic load balancing

• Load balancing is performed at the level of individual message sends. How does a node determine the load on another node? Is this a good approach for Porcupine?
• What two tensions must be resolved when deciding where to place a new message? How does Porcupine decide on which node to store a new message?

6 System evaluation

• How well does the performance of Porcupine scale up through 30 nodes? How much worse does replication perform? Why so much worse?
• Why does dynamic load balancing perform significantly better than static load balancing given skew in the workload? Do you think the authors’ concern with high spread factors is warranted?
• Given a heterogeneous configuration (some machines have faster disks), why does dynamic load balancing help more?
• Failure recovery seems to work, as shown in Figures 10 and 11.
• Do you think they demonstrated availability, performance, and manageability? Are there other experiments you would have liked to have seen?

7 Conclusions

• Great example of a system service tuned to its particular workload and assumptions. Took advantage of soft state and email semantics to simplify design, improve availability, performance, and

