Heuristic Cleaning Algorithms in Log-Structured File Systems

Trevor Blackwell, Jeffrey Harris, Margo Seltzer
Harvard University

Abstract

Research results show that while Log-Structured File Systems (LFS) offer the potential for dramatically improved file system performance, the cleaner can seriously degrade performance, by as much as 40% in transaction processing workloads [9]. Our goal is to examine trace data from live file systems and use those data to derive simple heuristics that will permit the cleaner to run without interfering with normal file access. Our results show that trivial heuristics perform very well, allowing 97% of all cleaning on the most heavily loaded system we studied to be done in the background.

1. Introduction

The Log-Structured File System is a novel disk storage system that performs all disk writes contiguously. Since this rids the system of seeks during writing, the potential performance of such a system is much greater than that of the standard Fast File System [4], which must write to several different locations on the disk for common operations such as file creation. The mechanism LFS uses to provide sequential writing is to treat the disk as a log composed of a collection of large (one-half to one megabyte) segments, each of which is written sequentially. New and modified data are appended to the end of this log. Since this is an append-only system, all the segments in the file system eventually become full. However, as data are updated or deleted, blocks that reside in the log are replaced or removed and their space can be reclaimed. This reclamation of space, gathering the freed blocks into clean segments, is called cleaning and is a form of generational garbage collection [3]. The critical challenge for LFS in providing high performance is to keep cleaning overhead low and, more importantly, to ensure that I/Os associated with cleaning do not interfere with normal file system activity.

Three terms are useful in discussing LFS cleaner performance: write cost, on-demand cleaning, and background cleaning. Rosenblum defines write cost as the average amount of time that the disk is busy per byte of new data written, including all the cleaning overheads [8]. A write cost of 1.0 is perfect, meaning that data can be written at the full disk bandwidth and never touched again. A write cost of 10.0 means that writes are ultimately performed at one-tenth the maximum bandwidth. A write cost above 1.0 indicates that data had to be cleaned, that is, rewritten to another segment in order to reclaim space. Cleaning is performed for one of two reasons: either the file system becomes full, in which case cleaning is required before more data can be written, or the file system becomes lightly utilized and can be cleaned without adversely affecting normal activity. We call the first case, in which the cleaner is required to run, on-demand cleaning, and the latter, in which running the cleaner is optional, background cleaning. Using these three terms, we can restate the challenge of LFS as minimizing write cost and avoiding on-demand cleaning.

The cleaning behavior of Sprite-LFS was monitored over a four-month period [8]. The measurements indicated that one-half of the segments cleaned were empty and that the maximum write cost observed in that environment was 1.6. While this write cost is acceptably low, the results give no indication of the impact on file system latency caused by cleaner I/Os interfering with user I/Os.
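To make the write-cost arithmetic concrete, the following small C program (our illustration, not part of the paper) evaluates the formula implied by the definitions above, under the assumption that the cleaner must read a segment in its entirety before rewriting its live blocks:

    #include <stdio.h>

    /* Write cost: total disk traffic (reads plus writes, including cleaning)
     * per byte of new data written.  Cleaning segments whose live-data
     * utilization is u means reading each whole segment and rewriting the
     * fraction u that is still live, leaving (1 - u) free for new data:
     *     write cost = (1 + u + (1 - u)) / (1 - u) = 2 / (1 - u),   u > 0.
     * An entirely empty segment need not be read at all, so it costs 1.0.
     */
    static double write_cost(double u)
    {
        if (u == 0.0)
            return 1.0;             /* empty segment: reuse without reading */
        return 2.0 / (1.0 - u);     /* read whole segment, rewrite live data */
    }

    int main(void)
    {
        for (int i = 0; i <= 9; i++) {
            double u = i / 10.0;
            printf("u = %.1f  write cost = %.2f\n", u, write_cost(u));
        }
        return 0;
    }

The u == 0 special case is what makes the Sprite-LFS figure possible: since half the segments cleaned there were empty, they could be reused without being read, contributing a write cost of 1.0 rather than the 2 / (1 - u) floor paid for partially live segments.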
Seltzer et al. report that cleaning overheads can be substantial, as much as 41% in a transaction processing environment [9]. However, the cleaning cost in a benchmarking environment is an unrealistic indicator, since the benchmark is constantly demanding use of the file system. Unlike benchmark environments, the real-world behavior of most workstation environments is observed to be bursty [1][5]. For example, consider an application that executes in two phases. In phase 1, it creates and deletes many small files. In phase 2, it computes, uses the network, or terminates. Examples of such applications include Sendmail and NNTP servers. In FFS, the writes for the new data are all performed at the time the creates and deletes are issued. In LFS, all the small writes are bundled together into large, contiguous writes, so bandwidth utilization exceeds 50% and approaches 100% as file size increases. The cleaner can run during the non-disk phase of the application, when it does not interfere with application I/O. This workload is diagrammed in Figure 1.

[Figure 1. Overlapping cleaning with non-disk activity. The two traces depict FFS and LFS processing cycles (CPU, network, and disk activity over time). Both systems are writing the same amount of data, but FFS must issue its writes synchronously, resulting in a much longer total execution time. LFS issues its writes asynchronously and cleans during processing periods, thus achieving a much shorter total execution time.]

The goal of this work is to investigate the real cost of cleaning, in terms of user-perceived latency. Using trace data gathered from three file servers, we have found that simple heuristics enable us to remove nearly all on-demand cleaning (specifically, on the most heavily loaded system, only 3.3% of the cleaning had to be done on demand). The target operating environment of this study is a conventional UNIX-based network of workstations where files are stored on one or more shared file servers. These results do not necessarily hold for all environments; they may not apply to online transaction processing environments where peak or near-peak transaction rates must be sustained at all times.

Section 2 describes the environments in which we gathered our measurements, Section 3 describes the traces we gathered, and Section 4 describes the tools and simulation model we used. Section 5 presents the results of our simulations, and Section 6 discusses the conclusions and ramifications of this work.

2. Benchmarking Methodology

All our trace data was collected by monitoring NFS packets to servers that provide only remote access. The NFSWatch utility [10], present on many UNIX systems, monitors all network traffic to an NFS file server and categorizes it, periodically displaying the number and percentage of packets received for each type of NFS operation. We modified the tool to output a log record for every NFS request. Since NFSWatch snoops on the ethernet, it does not capture local activity to a disk. To guarantee
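To illustrate how per-request log records of this kind can expose background-cleaning opportunities, here is a hypothetical post-processing sketch; the one-record-per-line format with a leading timestamp field, and the two-second idle threshold, are our assumptions, not details of the modified NFSWatch output:

    #include <stdio.h>

    #define IDLE_THRESHOLD 2.0  /* seconds of quiet before calling the disk idle */

    /* Reads one assumed record per line from stdin, whose first field is a
     * request timestamp in seconds, and totals the gaps between consecutive
     * requests that exceed IDLE_THRESHOLD -- the windows in which a
     * background cleaner could run without competing with user I/O.
     */
    int main(void)
    {
        char line[512];
        double t, prev = -1.0, idle_total = 0.0;
        long gaps = 0;

        while (fgets(line, sizeof line, stdin) != NULL) {
            if (sscanf(line, "%lf", &t) != 1)
                continue;                       /* skip malformed records */
            if (prev >= 0.0 && t - prev >= IDLE_THRESHOLD) {
                idle_total += t - prev;
                gaps++;
            }
            prev = t;
        }
        printf("%ld idle gaps, %.1f s available for background cleaning\n",
               gaps, idle_total);
        return 0;
    }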

