WSU CSE 6362 - The Network is the Database


The Network is the Database: Data Management for Highly Distributed Systems

Julio C. Navas
Siemens Technology-to-Business Center
1995 University Ave, Suite 375
Berkeley, California, 94704
[email protected]

Michael Wynblatt
Siemens Technology-to-Business Center
1995 University Ave, Suite 375
Berkeley, California, 94704
[email protected]

ABSTRACT

This paper describes the methodology and implementation of a data management system for highly distributed systems, which was built to solve the scalability and reliability problems faced in a wide area postal logistics application developed at Siemens. The core of the approach is to borrow from Internet routing protocols, and their proven scalability and robustness, to build a network-embedded dynamic database index, and to augment schema definition to optimize the use of this index. The system was developed with an eye toward future applications in the area of sensor networks.

Categories and Subject Descriptors

H.2.4 [Database Management]: Systems – distributed databases, query processing. H.4.4 [Information Systems Applications]: Miscellaneous. C.2.4 [Computer-Communication Networks]: Distributed Systems – distributed applications, distributed databases.

General Terms

Algorithms, Performance, Design.

Keywords

Wide-Area Data Management, Sensor Networks, Logistics, Distributed Data Management.

1. BACKGROUND & MOTIVATION

The rapid decrease in the cost and size of data communications hardware and various sensor technologies offers to support a wealth of new businesses, loosely termed "sensor networks" [3,5]. Applications have been proposed in areas including intelligent highways, power grid management, intelligent battlefields, and remote product service and maintenance.
Although the applications are varied, they share several common features: (1) a relatively large number of data sources (typically on the order of 10^5 or more), (2) relatively volatile data and data organization, and (3) the requirement for "thin" data servers that run on scaled-down hardware. These applications pose a significant challenge for data management. Most application-level tools wish to treat such a collection of data as a traditional database, using well-known query languages to access the data. In state-of-the-art commercial systems, the solution to similar problems is to use a centralized database (or a small collection of distributed databases) and to collect the data as fast as possible, either through polling or event-driven reporting. Applications then act on the database in the traditional way. However, in applications requiring tens of thousands of data sources with rapidly changing data, such systems are insufficient. The databases themselves, and especially their data communications channels, become bottlenecks that prevent the system from achieving these scales. Moreover, such databases represent critical failure points, and they introduce latency that may matter in real-time applications. A system is needed that is highly scalable, has no critical failure points, and lets data flow from the source to the requestor as rapidly as possible. One research project addressing this space is COUGAR [1], but that system assumes a centralized index of all data sources, which does not address our scalability or critical failure point requirements. At Siemens, we see great future potential in sensor network applications and in developing technology that will satisfy these very challenging constraints. More immediately, we have a need to support a related application in the area of postal logistics.
This application has somewhat milder constraints (fewer than one hundred thousand data sources, and reasonably fat clients are acceptable), but offers many of the same challenges as sensor networks. Our goal was to produce a solution that solves our immediate problem and is applicable to the more demanding problems we foresee in the near term. The postal logistics application will serve as a guide for this paper, as we show examples from this domain to illustrate how our system operates. Section 2 describes this application. Section 3 describes a wide area data management system, and details specific to the implementation are described in Section 4. Our future work is described in Section 5.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM SIGMOD 2001 May 21-24, Santa Barbara, California, USA. Copyright 2001 ACM 1-58113-332-4/01/05…$5.00.

2. DATA MANAGEMENT IN WIDE AREA POSTAL LOGISTICS

Figure 1 shows an overview of the application that we support.
The idea is to allow a large courier service to treat its distribution and staging system of hubs, substations, trucks, and airplanes as a single live database of packages. Any authenticated client, from any point in the network, can issue a query against this database and receive the current answer. A typical scenario for this system is the following. A logistics agent (automated or human) located at an airport hub is faced with a partially loaded airplane scheduled to depart shortly. An optimization question arises: should the plane embark partially loaded, or await the arrival of additional packages? The agent issues a query to ask how many packages bound for the airplane's destination, and in what sizes, are

