Berkeley COMPSCI 262A - Speculative Execution in a Distributed File System

Speculative Execution in a Distributed File System

Edmund B. Nightingale, Peter M. Chen, and Jason Flinn
Department of Electrical Engineering and Computer Science
University of Michigan

ABSTRACT

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through inter-process communication. It guarantees correct execution by preventing speculative processes from externalizing output, e.g., sending a network message or writing to the screen, until the speculations on which that output depends have proven to be correct. Speculator improves the performance of distributed file systems by masking I/O latency and increasing I/O throughput. Rather than block during a remote operation, a file system predicts the operation’s result, then uses Speculator to checkpoint the state of the calling process and speculatively continue its execution based on the predicted result. If the prediction is correct, the checkpoint is discarded; if it is incorrect, the calling process is restored to the checkpoint, and the operation is retried. We have modified the client, server, and network protocol of two distributed file systems to use Speculator. For PostMark and Andrew-style benchmarks, speculative execution results in a factor of 2 performance improvement for NFS over local-area networks and an order of magnitude improvement over wide-area networks.
For the same benchmarks, Speculator enables the Blue File System to provide the consistency of single-copy file semantics and the safety of synchronous I/O, yet still outperform current distributed file systems with weaker consistency and safety.

General Terms
Performance, Design

Categories and Subject Descriptors
D.4.3 [Operating Systems]: File Systems Management - Distributed file systems; D.4.7 [Operating Systems]: Organization and Design; D.4.8 [Operating Systems]: Performance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SOSP’05, October 23–26, 2005, Brighton, United Kingdom.
Copyright 2005 ACM 1-59593-079-5/05/0010 ...$5.00.

Keywords
Distributed file systems, speculative execution, causality

1. INTRODUCTION

Distributed file systems often perform substantially worse than local file systems because they perform synchronous I/O operations for cache coherence and data safety. File systems such as AFS [13] and NFS [3] present users with the abstraction of a single, coherent namespace shared across multiple clients. Although caching data on local clients improves performance, many file operations still use synchronous message exchanges between client and server to maintain cache consistency and protect against client or server failure. Even over a local-area network, the performance impact of this communication is substantial. As latency increases due to physical distance, middleboxes, and routing delays, the performance cost may become prohibitive.

Many distributed file systems weaken consistency and safety to improve performance.
Whereas local file systems typically guarantee that a process that reads data from a file will see all modifications previously completed by other processes, distributed file systems such as AFS and NFS provide no such guarantee. For example, most NFS implementations provide close-to-open consistency, which guarantees only that a client that opens a file will see modifications made by other clients that have previously closed the file. Weaker consistency semantics improve performance by reducing the number of synchronous messages that are exchanged. Nevertheless, as our results show, even these weaker semantics are time-consuming.

We demonstrate that, with operating system support for lightweight checkpointing, speculative execution, and tracking of causal interdependencies between processes, distributed file systems can be fast, safe, and consistent. Rather than block a process while waiting for the result of a remote communication with a file server, the operating system checkpoints its state, predicts the result of the communication, and continues to execute the process speculatively. If the prediction is correct, the checkpoint is discarded; if it is false, the application is rolled back to the checkpoint.

Our solution relies on three observations. First, file system clients can correctly predict the result of many operations. For instance, consistency checks seldom fail since concurrent file updates are rare. Second, the time to take a lightweight checkpoint is often much less than network round-trip time to the server, so substantial work can be done while waiting for a remote request to complete.
Finally, modern computers often have spare resources that can be used to execute processes speculatively. Encouraged by these observations, and by the many prior successful applications of speculation in processor design, we have added support for speculative execution, which we call Speculator, to the Linux kernel.

[Figure 1: Example of speculative execution for NFS. Panels: (a) Unmodified NFS, (b) Speculative NFS.]

In our work, the distributed file system controls when speculations start, succeed, and fail. Speculator provides a mechanism for correct execution of speculative code. It does not allow a process that is executing speculatively to externalize output, e.g., make network transmissions or display output to the screen, until the speculations on which that output depends prove to be correct. If a speculative process tries to execute a potentially unrecoverable operation, e.g., it calls the reboot system call, it is blocked until its speculations are resolved. Speculator tracks causal dependencies between kernel objects in order to share speculative state among multiple processes. For instance, if a speculative process sends a signal to its non-speculative parent, Speculator checkpoints the parent and marks it as speculative before it delivers the signal. If a speculation on which the child depends fails, both the child and
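
As an illustration only, the checkpoint, predict, speculate, and commit-or-roll-back cycle described in this introduction can be sketched at user level in Python. This is not the Speculator kernel implementation: the names `Process`, `speculate`, and `use_cached_file` are hypothetical, the "checkpoint" is a plain deep copy of a dictionary, and the remote reply is obtained by a synchronous call rather than arriving asynchronously. Re-running the continuation with the actual result is a simplification of "the operation is retried".

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Process:
    """Toy stand-in for a process (hypothetical, for illustration only)."""
    state: dict = field(default_factory=dict)
    screen: List[str] = field(default_factory=list)   # externalized output
    pending: List[str] = field(default_factory=list)  # output held back while speculative
    checkpoint: Optional[dict] = None                 # saved state, set only while speculative

    @property
    def speculative(self) -> bool:
        return self.checkpoint is not None

    def write(self, text: str) -> None:
        # Speculative output must not be externalized until the speculation resolves.
        if self.speculative:
            self.pending.append(text)
        else:
            self.screen.append(text)


def speculate(proc: Process,
              predicted: Any,
              remote_op: Callable[[], Any],
              continuation: Callable[[Process, Any], None]) -> bool:
    """Continue with a predicted result; commit on success, roll back on failure."""
    proc.checkpoint = copy.deepcopy(proc.state)  # take the checkpoint
    continuation(proc, predicted)                # execute speculatively
    actual = remote_op()                         # assumption: reply obtained synchronously here
    if actual == predicted:
        # Prediction correct: discard the checkpoint and release buffered output.
        proc.checkpoint = None
        proc.screen.extend(proc.pending)
        proc.pending.clear()
        return True
    # Prediction wrong: restore the checkpoint, drop buffered output, and retry.
    proc.state = proc.checkpoint
    proc.checkpoint = None
    proc.pending.clear()
    continuation(proc, actual)                   # non-speculative re-execution
    return False


def use_cached_file(proc: Process, attrs_unchanged: bool) -> None:
    """Hypothetical continuation: choose a data source based on a consistency check."""
    proc.state["source"] = "cache" if attrs_unchanged else "server"
    proc.write("read from " + proc.state["source"])
```

Under these assumptions, calling `speculate(Process(), True, lambda: True, use_cached_file)` models a consistency check that succeeds as predicted, so the buffered output is released; passing `lambda: False` instead models a failed prediction, in which case the speculative output is discarded and the continuation runs again with the actual result.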

