Adaptive File Transfers for Diverse Environments

Outline: Introduction; Goals and Assumptions; Design Overview; The Resources; Control Loop; Avoiding Contention; Scheduling Resources; Resource Operations; Step A: Ordering Disk Operations; Step B: Choosing Network Operations; Optimization Example; Implementation; Software Components; Disk Operation CPC Parameters; CPC for CACHE Operations; CPC for HASH Operation; CPC for STAT Operation; Evaluation; Method; dsync works in many conditions; Single receiver; Multiple receivers, homogeneous initial states; Multiple receivers, heterogeneous initial states; Multiple receivers in the wild (PlanetLab); dsync's back-pressure is effective; Gigabit network; Slow disk; Dynamic adaptation to changing load; dsync effectively uses the local disk; Real workload; Related Work; Conclusion

To appear in Proceedings of the 2008 USENIX Annual Technical Conference (USENIX’08), Boston, Massachusetts, June 2008

Himabindu Pucha∗, Michael Kaminsky‡, David G. Andersen∗, and Michael A. Kozuch‡
∗Carnegie Mellon University and ‡Intel Research Pittsburgh

Abstract

This paper presents dsync, a file transfer system that can dynamically adapt to a wide variety of environments. While many transfer systems work well in their specialized context, their performance comes at the cost of generality, and they perform poorly when used elsewhere. In contrast, dsync adapts to its environment by intelligently determining which of its available resources is the best to use at any given time. The resources dsync can draw from include the sender, the local disk, and network peers. While combining these resources may appear easy, in practice it is difficult because these resources may have widely different performance or contend with each other. In particular, the paper presents a novel mechanism that enables dsync to aggressively search the receiver’s local disk for useful data without interfering with concurrent network transfers.
Our evaluation on several workloads in various network environments shows that dsync outperforms existing systems by a factor of 1.4 to 5 in one-to-one and one-to-many transfers.

1 Introduction

File transfer is a nearly universal concern among computer users. Home users download software updates and upload backup images (or delta images), researchers often distribute files or file trees to a number of machines (e.g. conducting experiments on PlanetLab), and enterprise users often distribute software packages to cluster or client machines. Consequently, a number of techniques have been proposed to address file transfer, including simple direct mechanisms such as FTP, “swarming” peer-to-peer systems such as BitTorrent [4], and tools such as rsync [22] that attempt to transfer only the small delta needed to re-create a file at the receiver.

Unfortunately, these systems fail to deliver optimal performance due to two related problems. First, the solutions typically focus on one particular resource strategy to the exclusion of others. For example, rsync’s delta approach will accelerate transfers in low-bandwidth environments when a previous version of the file exists in the current directory, but not when a useful version exists in a sibling directory or, e.g., /tmp. Second, existing solutions typically do not adapt to unexpected environments. As an example, rsync, by default, always inspects previous file versions to “accelerate” the transfer, even on fast networks when such inspections contend with the write portion of the transfer and degrade overall performance.

This paper presents dsync, a file(tree) transfer tool that overcomes these drawbacks. To address the first problem, dsync opportunistically uses all available sources of data: the sender, network peers, and similar data on the receiver’s local disk. In particular, dsync includes a framework for locating relevant data on the local disk that might accelerate the transfer.
This framework includes a pre-computed index of blocks on the disk and is augmented by a set of heuristics for extensively searching the local disk when the cache is out-of-date. dsync addresses the second problem, the failure of existing file transfer tools to accommodate diverse environments, by constantly monitoring resource usage and adapting when necessary. For example, dsync includes mechanisms to throttle the aggressive disk search if either the disk or CPU becomes a bottleneck in accepting data from the network.

Those two principles, opportunistic resource usage and adaptation, enable dsync to avoid the limitations of previous approaches. For example, several peer-to-peer systems [6,1,4,11,19] can efficiently “swarm” data to many peers, but do not take advantage of the local filesystem(s). The Low Bandwidth File System [13] can use all similar content on disk, but must maintain an index and can only transfer data from one source. When used in batch mode, rsync’s diff file can be sent over a peer-to-peer system [11], but in this case, all hosts to be synchronized must have identical initial states.

dsync manages three main resources: the network, the disk, and CPU. The network (which we assume can provide all of the data) is dsync’s primary data source, but dsync can choose to spend CPU and disk bandwidth to locate relevant data on the local filesystem. However, dsync also needs these CPU resources to process incoming packets and this disk bandwidth to write the file to permanent storage. Therefore, dsync adaptively determines at each step of the transfer which of the receiver’s local resources can be used without introducing undue contention by monitoring queue back-pressure. For example, dsync uses queue information from its disk writer and network reader processes to infer disk availability (Section 4).
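
The idea of inferring disk availability from queue back-pressure can be illustrated with a minimal Python sketch. It is not dsync's actual mechanism, which the paper describes in its Section 4; the names QueueState and disk_search_allowed, and the fixed high_water threshold, are hypothetical choices made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Occupancy snapshot of one bounded queue (hypothetical structure)."""
    length: int    # items currently waiting in the queue
    capacity: int  # maximum number of items the queue can hold

    def fill_ratio(self) -> float:
        # Treat a zero-capacity queue as permanently full.
        return self.length / self.capacity if self.capacity else 1.0


def disk_search_allowed(disk_writer: QueueState,
                        network_reader: QueueState,
                        high_water: float = 0.5) -> bool:
    """Return True only when neither queue shows back-pressure.

    Assumption for this sketch: a queue filled to the high_water
    fraction or beyond means its consumer (disk or CPU) is falling
    behind, so optional disk searching should pause.
    """
    return (disk_writer.fill_ratio() < high_water
            and network_reader.fill_ratio() < high_water)


# Both queues are nearly empty, so searching the local disk is permitted.
idle = disk_search_allowed(QueueState(1, 10), QueueState(0, 10))

# The disk writer queue is backing up, so the search is throttled.
busy = disk_search_allowed(QueueState(9, 10), QueueState(0, 10))
```

A fixed threshold keeps the example short; a real controller would more likely react to how queue lengths change over time.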
When searching the receiver’s disk is viable, dsync must continuously evaluate whether to identify additional candidate files (by performing directory stat operations) or to inspect already identified files (by reading and hashing the file contents). dsync prioritizes available disk operations using a novel cost/benefit framework which employs an extensible set of heuristics based on file metadata (Section 5.2). As disk operations are scheduled, dsync also submits data chunk transfer requests to the network using the expected arrival time of each chunk (Section 5.3).

Section 6 presents the implementation of dsync, and Section 7 provides an extensive evaluation, demonstrating that dsync performs well in a wide range of operating environments. dsync achieves performance near that of existing tools on the workloads
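
The cost/benefit prioritization of disk operations mentioned above can be pictured with a small Python sketch. It assumes a simple ranking by estimated cost per expected useful chunk; the paper's actual heuristics belong to its Section 5.2, which is not reproduced here. DiskOp, DiskOpQueue, the file paths, and all of the numeric estimates are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class DiskOp:
    kind: str                 # "STAT" (list a directory) or "HASH" (read and hash a file)
    path: str                 # directory or file the operation targets
    est_cost_s: float         # assumed estimate of the time the operation takes
    est_useful_chunks: float  # assumed estimate of the useful chunks it may yield

    def cost_per_chunk(self) -> float:
        # Operations expected to yield nothing sort last.
        if self.est_useful_chunks <= 0:
            return float("inf")
        return self.est_cost_s / self.est_useful_chunks


class DiskOpQueue:
    """Priority queue that returns the cheapest operation per expected chunk."""

    def __init__(self) -> None:
        self._heap = []
        self._tie = itertools.count()  # insertion order breaks ties

    def add(self, op: DiskOp) -> None:
        heapq.heappush(self._heap, (op.cost_per_chunk(), next(self._tie), op))

    def pop_best(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


queue = DiskOpQueue()
queue.add(DiskOp("STAT", "/tmp", 0.01, 1.0))
queue.add(DiskOp("HASH", "/tmp/old-release.tar", 0.2, 80.0))
queue.add(DiskOp("HASH", "/tmp/unrelated.log", 0.2, 2.0))

order = []
while (op := queue.pop_best()) is not None:
    order.append((op.kind, op.path))
```

With these made-up estimates, hashing the likely previous version is scheduled first, then the directory stat, and finally the file that looks unrelated.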

