To appear in Mobile Networks and Applications (MONET), Special Issue on Wireless Sensor Networks, Kluwer, mid-2003.Data-Centric Storage in Sensornets withGHT, A Geographic Hash TableSylvia RatnasamyIntel Research, Berkeley, CA [email protected] KarpIntel Research / CMU, Pittsburgh, PA [email protected],[email protected] ShenkerICIR / ICSI, Berkeley, CA [email protected] EstrinUCLA Computer Science, LA, CA [email protected] GovindanUSC Computer Science, LA, CA [email protected] Yin and Fang YuUC Berkeley EECS, Berkeley, CA 94720{yinli,fyu}@eecs.berkeley.eduAbstractMaking effective use of the vast amounts of data gathered by large-scale sensor networks (sensornets) will require scalable, self-or-ganizing, and energy-efficient data dissemination algorithms. Forsensornets, where the content of the data is more important thanthe identity of the node that gathers them, researchers have foundit useful to move away from the Internet’s point-to-point commu-nication abstraction and instead adopt abstractions that are moredata-centric. This approach entails naming the data and using com-munication abstractions that refer to those names rather than tonode network addresses [1, 11]. Previous work on data-centricrouting has shown it to be an energy-efficient data disseminationmethod for sensornets [12]. Herein, we argue that a companionmethod, data-centric storage (DCS), is also a useful approach. Un-der DCS, sensed data are stored at a node determined by the nameassociated with the sensed data.In this paper, we first define DCS and predict analytically whereit outperforms other data dissemination approaches. We then de-scribe GHT, a Geographic Hash Table system for DCS on sensor-nets. GHT hashes keys into geographic coordinates, and stores akey-value pair at the sensor node geographically nearest the hashof its key. The system replicates stored data locally to ensure per-sistence when nodes fail. It uses an efficient consistency protocolto ensure that key-value pairs are stored at the appropriate nodesafter topological changes. And it distributes load throughout thenetwork using a geographic hierarchy. We evaluate the perfor-mance of GHT as a DCS system in simulation against two otherdissemination approaches. Our results demonstrate that GHT isthe preferable approach for the application workloads we analyti-cally predict, offers high data availability, and scales to large sen-sornet deployments, even when nodes fail or are mobile.1 IntroductionA sensornet is a distributed sensing network comprised ofa large number of small devices, each with some compu-tational, storage and communication capability. Such net-works can operate in an unattended mode to record detailedinformation about their surroundings. They are thus wellsuited to applications such as location tracking and habi-tat monitoring [5, 21]. As these networks scale in size, sowill the amount of data they make available. The great vol-ume of these data and the fact that they are spread acrossthe entire sensornet create the need for data-disseminationtechniques capable of extracting relevant data from withinthe sensornet. Moreover, communication between nodes re-quires the expenditure of energy, a scarce commodity formost sensornets. Thus, making effective use of sensor-net data will require scalable, self-organizing, and energy-efficient data dissemination algorithms.The utility of a sensornet derives primarily from the datait gathers; the identity of the individual sensor node thatrecords the data tends to be less relevant. Accordingly,sensornet researchers have argued for communication ab-stractions that are data-centric. Under this model, dataare “named” and communication abstractions refer to thesenames rather than to node network addresses [1, 11]. Thedirected diffusion [12] data-centric routing scheme has beenshown to be an energy-efficient data dissemination methodfor sensornet environments.Herein, we propose a useful companion method, data-centric storage (DCS). In DCS, relevant data are stored byname at nodes within the sensornet; all data with the samegeneral name (e.g., elephant sightings) will be stored at thesame sensornet node (not necessarily the one that originallygathered the data). Queries for data with a particular namecan then be sent directly to the node storing those nameddata, without the flooding required in some data-centricrouting proposals.Several data-centric dissemination methods are conceiv-able, each with different performance characteristics. Theappropriate data dissemination method for a task will de-pend on the nature of the sensornet, its intended deploy-ment environment, and the expected workload. We makefour principal contributions in this paper:1To appear in Mobile Networks and Applications (MONET), Special Issue on Wireless Sensor Networks, Kluwer, mid-2003.• We propose a novel data dissemination method, DCS,and show where it outperforms other approaches.• We provide an organizing framework for comparingamong three canonical data dissemination approaches,and predict where each performs best.• We identify design criteria for a robust and efficientDCS system, and describe GHT, a Geographic Hash Ta-ble system for DCS whose design is motivated by thesecriteria.• We evaluate GHT’s performance using detailed simu-lations of networks of up to 200 nodes, and abstractsimulations of networks of up to 100,000 nodes.Our claim is not that DCS is always the method of choice,but rather that under some conditions it will be the mostdesired option. In fact, we expect that future sensornets willembody all of these (or similar) data-centric disseminationmethods, and that users will choose the appropriate methodbased on the task at hand.To understand the relative behavior of each dissemina-tion method under different conditions, one must in turnunderstand the context in which these algorithms will be de-ployed. For this reason, we begin our paper with a descrip-tion (in Section 7) of the role played by data disseminationin a complete sensornet system. This material also providesthe needed context for later comparative simulations.Our scalable and robust DCS system, GHT, builds ontwo recent advances: (1) a new generation of efficient peer-to-peer lookup systems such as Pastry, CAN, Chord, andTapestry [7, 24, 26, 29] and (2) the GPSR geographic rout-ing algorithm [15]. In these peer-to-peer lookup systems, adata object is associated
View Full Document