Manageability, Availability, and Performance in Porcupine: A Highly Scalable, Cluster-Based Mail Service

YASUSHI SAITO, BRIAN N. BERSHAD, and HENRY M. LEVY
University of Washington

This paper describes the motivation, design, and performance of Porcupine, a scalable mail server. The goal of Porcupine is to provide a highly available and scalable electronic mail service using a large cluster of commodity PCs. We designed Porcupine to be easy to manage by emphasizing dynamic load balancing, automatic configuration, and graceful degradation in the presence of failures. Key to the system's manageability, availability, and performance is that sessions, data, and underlying services are distributed homogeneously and dynamically across nodes in a cluster.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems: Distributed applications; C.4 [Performance of Systems]: Reliability, Availability, and Serviceability; C.5.5 [Computer System Implementation]: Servers; D.4.5 [Operating Systems]: Reliability: Fault tolerance; H.3.4 [Information Storage and Retrieval]: Systems and Software: Distributed systems; H.4.3 [Information Storage and Retrieval]: Communications Applications: Electronic mail

General Terms: Algorithms, Performance, Management, Reliability

Additional Key Words and Phrases: Distributed systems, email, cluster, group membership protocol, replication, load balancing

This work is supported by DARPA Grant F30602-97-2-0226 and by National Science Foundation Grant EIA-9870740. An earlier version appeared at the 17th ACM Symposium on Operating Systems Principles (SOSP), Kiawah Island Resort, SC, Dec. 1999. The Porcupine project web page is at http://porcupine.cs.washington.edu.
Authors' address: MBOX 352350, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195; email: {yasushi, bershad, levy}@cs.washington.edu.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 2000 ACM 0734-2071/00/0800-0298 $5.00
ACM Transactions on Computer Systems, Vol. 18, No. 3, August 2000, Pages 298-332.

1. INTRODUCTION

The growth of the Internet has led to the need for highly scalable and highly available services. This paper describes the Porcupine scalable electronic mail service. Porcupine achieves scalability by clustering many small machines (PCs), enabling them to work together in an efficient manner. In this section, we describe system requirements for Porcupine and relate the rationale for choosing a mail application as our target.

1.1 System Requirements

Porcupine defines scalability in terms of three essential system aspects: manageability, availability, and performance. Requirements for each follow:

(1) Manageability requirements. Although a system may be physically large, it should be easy to manage. In particular, the system must self-configure with respect to load and data distribution and self-heal with respect to failure and recovery. A system manager can simply add more machines or disks to improve throughput and replace them when they break. Over time, a system's nodes will perform at differing capacities, but these differences should be masked and managed by the system.

(2) Availability requirements. With so many nodes, it is likely that some will be down at any given time. Despite component failures, the system should deliver good service to all of its users at all times. In practice, the failure of one or more nodes may prevent some users from accessing some of their mail. However, we strive to avoid failure modes in which whole groups of users find themselves without any mail service for even a short period.
(3) Performance requirements. Porcupine's single-node performance should be competitive with other single-node systems; its aggregate performance should scale linearly with the number of nodes in the system. For Porcupine, we target a system that scales to hundreds of machines, which is sufficient to service a few billion mail messages per day with today's commodity PC hardware and system area networks.

Porcupine meets these requirements uniquely. The key principle that permeates the design of Porcupine is functional homogeneity. That is, any node can execute part or all of any transaction, e.g., for the delivery or retrieval of mail. Based on this principle, Porcupine uses three techniques to meet our scalability goals. First, every transaction is dynamically scheduled to ensure that work is uniformly distributed across all nodes in the cluster. Second, the system automatically reconfigures whenever nodes are added or removed, even transiently. Third, system and user data are automatically replicated across a number of nodes to ensure availability.

Figure 1 shows the relationships among our goals and key features or techniques used in the system. For example, dynamic scheduling and automatic reconfiguration make the system manageable, since changes to the size or the quality of machines, user population, and workload are handled automatically. Similarly, automatic reconfiguration and replication improve availability by making email messages, user profiles, and other auxiliary data structures survive failures.

Fig. 1. The primary goal of Porcupine is scalability, defined in terms of manageability, availability, and performance requirements. In turn, these requirements are met through combinations of the three key techniques shown above.

Today, Porcupine runs on a cluster of 30 PCs connected by a high-speed network, although we show that it is designed to scale well beyond that. Performance is linear with respect to the number
of nodes in the cluster. The system adapts automatically to changes in workload, node capacity, and node availability. Data are available despite the presence of failures.

1.2 Rationale for a Mail Application

Although Porcupine is a mail system, its underlying services and architecture are appropriate for other systems in which data are frequently written and where good performance, availability, and manageability at high volume are demanded. For example, Usenet news, community bulletin boards, and large-scale