Automated Hoarding for Mobile Computers

Geoffrey H. Kuenning and Gerald J. Popek†

Abstract

A common problem facing mobile computing is disconnected operation, or computing in the absence of a network. Hoarding eases disconnected operation by selecting a subset of the user's files for local storage. We describe a hoarding system that can operate without user intervention, by observing user activity and predicting future needs. The system calculates a new measure, semantic distance, between individual files, and uses this to feed a clustering algorithm that chooses which files should be hoarded. A separate replication system manages the actual transport of data; any of a number of replication systems may be used. We discuss practical problems encountered in the real world and present usage statistics showing that our system outperforms previous approaches by factors that can exceed 10:1.

latter requires more expertise and involvement than most users are willing to offer. We have taken a fresh approach to this problem, and have succeeded in creating a predictive hoarding system, called SEER, that makes hoarding decisions without user interaction. SEER considers the user's activities to be composed of projects rather than individual files, which greatly enhances the accuracy of its predictions. In daily use, the system has dramatically improved the achievable quality of hoarding decisions, in general requiring a hoard that is only slightly larger than the working set.

2 System Overview

Automated predictive hoarding is based on the idea that a system can observe user behavior, make inferences about the semantic relationships between files, and use those inferences to aid the user. SEER consists of two major components built atop a replication substrate. First, an observer watches the user's behavior and file accesses, classifying each access according to type, converting pathnames to absolute format, and feeding the results to a correlator. The correlator evaluates the file references, calculating the
semantic distances among various files (see Section 3.1). These semantic distances drive a clustering algorithm (Section 3.3.2) that assigns each file to one or more projects. When new hoard contents are to be chosen, the correlator examines the projects to find those that are currently active, and selects the highest-priority projects until the maximum hoard size is reached. Only complete projects are hoarded, under the assumption that partial projects are not sufficient to make progress.

SEER does not itself do the file hoarding; instead, an underlying replication system performs this task. This design frees SEER from the troublesome details of moving files back and forth between computers, making sure updates are propagated to other replicas of the files, and managing conflicts [17]. It also makes SEER more portable, because very little is assumed about the underlying system. SEER currently runs atop the RUMOR [6, 18] user-level replication system, a custom-built master-slave replication service called CHEAP RUMOR, and CODA [11], and it could easily be used with other systems such as FICUS [7] and LITTLE WORK [9].

A feature critical to usability is that, unlike previous systems, SEER normally operates without user intervention. There is no need to build explicit lists of important files or to instruct the system that certain activities are of interest. The only user interaction, beyond any that might be required by the underlying replication system, involves informing the computer that a disconnection is imminent, and even this requirement can be eliminated by automated periodic hoard filling if desired. Although SEER allows users to provide explicit hoarding instructions, our experience shows that such intervention is rarely necessary.

1 Introduction

The face of computing today is rapidly being changed by the advent of mobility, but the utility of the portable computer is seriously challenged by the problem of disconnected operation, where useful work must continue in the absence or near absence, i.e., available
only at high cost or low bandwidth, of the network. Although impressive resources are being devoted to research in wireless networking, with a goal of making communication continuously available, the problem is difficult, and it is likely to be a long time before the mobile user will have the same networking capabilities as we expect from a stationary computer today. In the interim, portable computers will often find themselves either completely lacking communication, or significantly restricted by battery power, bandwidth, or cost.

In the absence of readily available high-quality communication, users are often forced to operate disconnected from the network. But in a world dominated by networking, this is a major drawback, because the computing paradigm has grown dependent on the availability of non-local resources. Lack of access to a remote file can halt work on a particular task, or even make the computer unusable.

A very attractive solution to the lack of communication is hoarding, in which non-local files are cached on the local disk prior to disconnection. The local files can be managed and kept consistent by a replication system [7, 9, 11]. The difficult challenge is the hoarding problem of selecting which files should be stored locally. Earlier solutions have simply chosen the most recently referenced files [9] or asked the user to participate at least peripherally in managing hoard contents [11, 21]. The former approach is wasteful of scarce hoard space, while the

This work was partially supported by the Defense Advanced Research Projects Agency under contract N00174-91-C-0107.

† The authors are affiliated with the Computer Science Department, University of California, Los Angeles. Gerald Popek is also affiliated with Platinum technology. E-mail: geoff@fmg.cs.ucla.edu, popek@platinum.com

Permission to make digital/hard copy of part or all this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright
notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. SOSP-16 10/97 Saint-Malo, France. © 1997 ACM 0-89791-916-5/97/0010 $3.50

3 Underlying Concepts

The fundamental assumption of SEER is that there is semantic locality in user behavior. By detecting and exploiting this locality, a system can make inferences about the
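
The hoard-filling step described in the System Overview (select the highest-priority active projects, whole projects only, until the maximum hoard size is reached) can be illustrated with a minimal Python sketch. This is not SEER's implementation: the names `Project` and `choose_hoard`, the numeric `priority` field, and the byte sizes are hypothetical, and this excerpt does not say how SEER computes priorities or activity. The sketch also assumes that selection stops at the first project that does not fit, and that a file shared by several projects is counted once.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical stand-in for a cluster of related files."""
    name: str
    priority: float  # assumed: larger means more important to hoard
    files: dict      # absolute path -> size in bytes


def choose_hoard(projects, max_bytes):
    """Pick whole projects in priority order until the hoard is full.

    Returns (names of chosen projects, {path: size} of hoarded files).
    """
    chosen = []
    hoarded = {}
    used = 0
    for project in sorted(projects, key=lambda p: p.priority, reverse=True):
        # A file may belong to several projects; count it only once.
        extra = sum(size for path, size in project.files.items()
                    if path not in hoarded)
        if used + extra > max_bytes:
            # Assumption: stop here rather than hoard a partial project.
            break
        hoarded.update(project.files)
        used += extra
        chosen.append(project.name)
    return chosen, hoarded


projects = [
    Project("mail", 1.0, {"/home/u/inbox": 50}),
    Project("paper", 3.0, {"/home/u/paper.tex": 40, "/home/u/refs.bib": 10}),
    Project("seer", 2.0, {"/home/u/seer.c": 30, "/home/u/refs.bib": 10}),
]
names, files = choose_hoard(projects, max_bytes=90)
print(names, sum(files.values()))
```

With a 90-byte limit, the two highest-priority projects fit because they share one file, while the third would exceed the limit and is left out entirely rather than hoarded in part.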