Measurements of a Distributed File System

Mary G. Baker, John H. Hartman, Michael D. Kupfer, Ken W. Shirriff, and John K. Ousterhout
Computer Science Division
Electrical Engineering and Computer Sciences
University of California, Berkeley, CA 94720

Abstract

We analyzed the user-level file access patterns and caching behavior of the Sprite distributed file system. The first part of our analysis repeated a study done in 1985 of the BSD UNIX file system. We found that file throughput has increased by a factor of 20 to an average of 8 Kbytes per second per active user over 10-minute intervals, and that the use of process migration for load sharing increased burst rates by another factor of six. Also, many more very large (multi-megabyte) files are in use today than in 1985. The second part of our analysis measured the behavior of Sprite's main-memory file caches. Client-level caches average about 7 Mbytes in size (about one-quarter to one-third of main memory) and filter out about 50% of the traffic between clients and servers. 35% of the remaining server traffic is caused by paging, even on workstations with large memories. We found that client cache consistency is needed to prevent stale data errors, but that it is not invoked often enough to degrade overall system performance.

1. Introduction

In 1985 a group of researchers at the University of California at Berkeley performed a trace-driven analysis of the UNIX 4.2 BSD file system [11]. That study, which we call "the BSD study," showed that average file access rates were only a few hundred bytes per second per user for engineering and office applications, and that many files had lifetimes of only a few seconds. It also reinforced commonly-held beliefs that file accesses tend to be sequential, and that most file accesses are to short files but the majority of bytes transferred belong to long files.
Lastly, it used simulations to predict that main-memory file caches of a few megabytes could substantially reduce disk I/O (and server traffic in a networked environment). The results of this study have been used to justify several network file system designs over the last six years.

[The work described here was supported in part by the National Science Foundation under grant CCR-8!XXXY29, the National Aeronautics and Space Administration and the Defense Advanced Research Projects Agency under contract NAG2-591, and an IBM Graduate Fellowship Award.]

[Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1991 ACM 0-89791-447-3/91/0009/0198...$1.50]

In this paper we repeat the analysis of the BSD study and report additional measurements of file caching in a distributed file system. Two factors motivated us to make the new measurements. First, computing environments have changed dramatically over the last six years, from relatively slow time-shared machines (VAX-11/780s in the BSD study) to today's much faster personal workstations. Second, several network-oriented operating systems and file systems have been developed during the last decade, e.g. AFS [4], Amoeba [7], Echo [3], Locus [14], NFS [16], Sprite [9], and V [1]; they provide transparent network file systems and, in some cases, the ability for a single user to harness many workstations to work on a single task.
Given these changes in computers and the way they are used, we hoped to learn how file system access patterns have changed, and what the important factors are in designing file systems for the future.

We made our measurements on a collection of about 40 10-MIPS workstations all running the Sprite operating system [9, 12]. Four of the workstations served as file servers, and the rest were diskless clients. Our results are presented in two groups. The first group of results parallels the analysis of the BSD study. We found that file throughput per user has increased substantially (by at least a factor of 20) and has also become more bursty. Our measurements agree with the BSD study that the vast majority of file accesses are to small files; however, large files have become an order of magnitude larger, so that they account for an increasing fraction of bytes transferred. Many of the changes in our measurements can be explained by these large files. In most other respects our measurements match those of the BSD study: file accesses are largely sequential, files are typically open for only a fraction of a second, and file lifetimes are short.

Our second set of results analyzes the main-memory file caches in the Sprite system. Sprite's file caches change size dynamically in response to the needs of the file and virtual memory systems; we found substantial cache size variations over time on clients that had an average cache size of about 7 Mbytes out of an average of 24 Mbytes of main memory. About 60% of all data bytes read by applications are retrieved from client caches without contacting file servers. Sprite's 30-second delayed-write policy allows about 10% of newly-written bytes to be deleted or overwritten without being written back from the client cache to the server.

Sprite guarantees the consistency of data cached on different clients.
We found that many users would be affected if Sprite's consistency guarantees were weaker, but write-sharing occurs infrequently enough that the overheads of implementing consistency have little impact on average system performance. We compared Sprite's consistency implementation with two other approaches and found that even the best approach, a token-based mechanism, does not significantly reduce the consistency overheads for our workload.

Sprite allows users to take advantage of many workstations simultaneously by migrating processes onto idle machines. Process migration increased the burst rates of file throughput by a factor of six in comparison to overall file throughput. Fortunately, we found that process migration does not reduce the effectiveness of file caches. Migrated processes actually had higher cache hit ratios than non-migrated processes, and process migration also had little impact on the cache consistency mechanism.

The rest of the paper is structured as follows: Section 2 describes the system that was measured and its workload, and
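The delayed-write behavior described above (dirty bytes sit in the client cache for 30 seconds before being flushed, so data deleted or overwritten within that window never reaches the server) can be illustrated with a toy simulation. This is only a sketch of the policy as the paper describes it; the class and attribute names (`DelayedWriteCache`, `bytes_absorbed`) are hypothetical, not Sprite's actual code:

```python
class DelayedWriteCache:
    """Toy model of a 30-second delayed-write client cache.

    Dirty data sits in the cache for `delay` seconds before it is
    flushed to the server; bytes deleted or overwritten within that
    window never generate server traffic.
    """

    def __init__(self, delay=30):
        self.delay = delay
        self.dirty = {}           # filename -> (write_time, nbytes)
        self.bytes_to_server = 0  # bytes actually written back
        self.bytes_absorbed = 0   # bytes deleted/overwritten before flush

    def flush_due(self, now):
        """Write back any dirty data older than `delay` seconds."""
        for name, (t, nbytes) in list(self.dirty.items()):
            if now - t >= self.delay:
                self.bytes_to_server += nbytes
                del self.dirty[name]

    def write(self, now, name, nbytes):
        self.flush_due(now)
        if name in self.dirty:    # overwrite absorbs the pending bytes
            self.bytes_absorbed += self.dirty[name][1]
        self.dirty[name] = (now, nbytes)

    def delete(self, now, name):
        self.flush_due(now)
        if name in self.dirty:    # deleted before flush: no server traffic
            self.bytes_absorbed += self.dirty.pop(name)[1]

cache = DelayedWriteCache()
cache.write(0, "/tmp/scratch", 4096)  # short-lived temporary file
cache.delete(5, "/tmp/scratch")       # gone within 30 s: absorbed
cache.write(10, "log", 1024)
cache.write(50, "log", 2048)          # first version flushed at t=50
cache.flush_due(100)                  # flush remaining dirty data
print(cache.bytes_to_server)          # 3072
print(cache.bytes_absorbed)           # 4096
```

The short-lived temporary file never generates server traffic, which is the effect the paper measures: about 10% of newly-written bytes are absorbed in the client cache this way.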

