UCF COT 4810 - Replication for Web Hosting System


Replication for Web Hosting Systems

SWAMINATHAN SIVASUBRAMANIAN, MICHAL SZYMANIAK, GUILLAUME PIERRE, and MAARTEN VAN STEEN
Vrije Universiteit, Amsterdam

Replication is a well-known technique to improve the accessibility of Web sites. It generally offers reduced client latencies and increases a site’s availability. However, applying replication techniques is not trivial, and various Content Delivery Networks (CDNs) have been created to facilitate replication for digital content providers. The success of these CDNs has triggered further research efforts into developing advanced Web replica hosting systems. These are systems that host the documents of a website and manage replication automatically. To identify the key issues in designing a wide-area replica hosting system, we present an architectural framework. The framework assists in characterizing different systems in a systematic manner. We categorize different research efforts and review their relative merits and demerits. As an important side-effect, this review and characterization shows that there are a number of interesting research questions that have not received much attention yet, but which deserve exploration by the research community.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems; C.4 [Performance of Systems]: Design Studies; H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services

General Terms: Design, Measurement, Performance

Additional Key Words and Phrases: Web replication, content delivery networks

1. INTRODUCTION

Replication is a technique that improves the quality of distributed services. In the past few years, it has been increasingly applied to Web services, notably for hosting Web sites.
In such cases, replication involves creating copies of a site’s Web documents, and placing these document copies at well-chosen locations. In addition, various measures are taken to ensure (possibly different levels of) consistency when a replicated document is updated. Finally, effort is put into redirecting a client to a server hosting a document copy such that the client is optimally served. Replication can lead to reduced client latency and network traffic by redirecting client requests to a replica closest to that client. It can also improve the availability of the system, as the failure of one replica does not result in an entire service outage.

Authors’ address: Department of Computer Science, Vrije Universiteit, Amsterdam, The Netherlands; email: {swami,michal,gpierre,steen}@cs.vu.nl

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. © 2004 ACM 0360-0300/04/0900-0291 $5.00

ACM Computing Surveys, Vol. 36, No. 3, September 2004, pp. 291–334.

These advantages motivate many Web content providers to offer their services using systems that use replication techniques. We refer to systems providing such hosting services as replica hosting systems. The design space for replica hosting systems is big and seemingly complex. In this article, we concentrate on organizing this design space and review several important research efforts concerning the development of Web replica hosting systems.
A typical example of such a system is a Content Delivery Network (CDN) [Hull 2002; Rabinovich and Spatscheck 2002; Verma 2002].

There exists a wide range of articles discussing selected aspects of Web replication. However, to the best of our knowledge, there is no single framework that aids in understanding, analyzing and comparing the efforts conducted in this area. In this article, we provide a framework that covers the important issues that need to be addressed in the design of a Web replica hosting system. The framework is built around an objective function, a general method for evaluating the system performance. Using this objective function, we define the role of the different system components that address separate issues in building a replica hosting system.

The Web replica hosting systems we consider are scattered across a large geographical area, notably the Internet. When designing such a system, at least the following five issues need to be addressed:

(1) How do we select and estimate the metrics for taking replication decisions?
(2) When do we replicate a given Web document?
(3) Where do we place the replicas of a given document?
(4) How do we ensure consistency of all replicas of the same document?
(5) How do we route client requests to appropriate replicas?

Each of these five issues is to a large extent independent from the others. Once grouped together, they address all the issues constituting a generalized framework of a Web replica hosting system. Given this framework, we compare and combine several existing research efforts, and identify problems that have not been addressed by the research community before.

Another issue that should also be addressed separately is selecting the objects to replicate.
Object selection is directly related to the granularity of replication. In practice, whole websites are taken as the unit for replication, but Chen et al. [2002b, 2003] show that grouping Web documents can considerably improve the performance of replication schemes at relatively low costs. However, as not much work has been done in this area, we have chosen to exclude object selection from our study.

We further note that Web caching is an area closely related to replication. In caching, whenever a client requests a document for the first time, the client process or the local server handling the request will fetch a copy from the document’s server. Before passing it to the client, the document is stored locally in a cache. Whenever that document is requested again, it can be fetched from the cache locally. In replication, a document’s server proactively places copies of the document at various servers, anticipating that enough clients will make use of these copies. Caching and replication thus differ only in the method of creation of copies. Hence, we perceive caching infrastructures (e.g., Akamai [Dilley et al. 2002]) also as replica hosting systems, as document
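
The distinction drawn above between caching and replication, namely that they differ only in how copies are created, can be illustrated with a minimal sketch. The class names OriginServer and EdgeServer, the method names, and the fetch_count counter are hypothetical and introduced here purely for illustration; they do not come from the article or from any real system.

```python
class OriginServer:
    """Hypothetical document server holding the authoritative copies."""

    def __init__(self, documents):
        self.documents = dict(documents)
        self.fetch_count = 0  # number of on-demand pulls served by the origin

    def fetch(self, url):
        # Called by an edge server on a cache miss (pull).
        self.fetch_count += 1
        return self.documents[url]

    def replicate_to(self, servers, url):
        # Replication: the origin proactively pushes a copy
        # before any client has asked for it.
        for server in servers:
            server.store[url] = self.documents[url]


class EdgeServer:
    """Hypothetical server close to clients; `store` holds document copies."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, url):
        # Caching: a copy is created on demand, on the first request.
        if url not in self.store:
            self.store[url] = self.origin.fetch(url)
        return self.store[url]


origin = OriginServer({"/index.html": "<h1>home</h1>"})
cache = EdgeServer(origin)    # fills itself lazily
replica = EdgeServer(origin)  # filled by the origin in advance
origin.replicate_to([replica], "/index.html")

cache.get("/index.html")      # miss: pulls a copy from the origin
cache.get("/index.html")      # hit: served from the local copy
replica.get("/index.html")    # hit: the copy was pushed earlier

print(origin.fetch_count)     # 1
```

Once a copy exists, both servers answer requests in exactly the same way; only the trigger for creating the copy differs, which is the sense in which a caching infrastructure can be viewed as a replica hosting system.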

