DOC PREVIEW
HARVARD CS 263 - Dynamic Self-tuning Database for NAND Flash

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

FlashDB: Dynamic Self-tuning Database for NAND FlashSuman NathMicrosoft [email protected] KansalMicrosoft [email protected] is a self-tuning database optimized for sensor networksusing NAND flash storage. In practical systems flash is used in dif-ferent packages such as on-board flash chips, compact flash cards,secure digital cards and related formats. Our experiments revealnon-trivial differences in their access costs. Furthermore, databasesmay be subject to different types of workloads. We show that ex-isting databases for flash are not optimized for all types of flashdevices or for all workloads and their performance is thus subop-timal in many practical systems. FlashDB uses a novel self-tuningindex that dynamically adapts its storage structure to workload andunderlying storage device. We formalize the self-tuning nature ofan index as a two-state task system and propose a 3-competitiveonline algorithm that achieves the theoretical optimum. We alsoprovide a framework to determine the optimal size of an index nodethat minimizes energy and latency for a given device. Finally, wepropose optimizations to further improve the performance of ourindex. We prototype and compare different indexing schemes onmultiple flash devices and workloads, and show that our index-ing scheme outperforms existing schemes under all workloads andflash devices we consider.Categories and Subject Descriptors: H.2.4 [Database Manage-ment Systems]: Query processing H.3.1 [Content Analysis andIndexing]: Indexing methodsGeneral Terms: Algorithms, Design, Measurement, Performance.Keywords: B+-tree, NAND Flash, indexing, log-structured index,self-tuning index.1. INTRODUCTIONThis paper presents a database for sensor networks using flashbased storage. There are many use cases where it is desirable tostore data within the sensor network, rather than transmit it all toa central database. A first example are remote deployments wherean economical communication infrastructure is not available andan expensive low rate satellite or cellular connection is used, suchas in polar regions or remote seismic deployments. Another ex-ample is the large class of applications for which the entire rawdata is seldom required. A third example includes mobile sensornodes [10,23], with sporadic and short lived connections. A fourthPermission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.IPSN’07, April 25-27, 2007, Cambridge, Massachusetts, USA.Copyright 2007 ACM 978-1-59593-638-7/07/0004 ...$5.00.example includes sensor networks of mobile devices which havesignificant local processing power [4, 12]. In these cases ratherthan uploading the entire raw data stream, one may save energyand bandwidth by processing queries locally at a cluster-head or amore capable node and uploading only the query response or thecompressed or summary data. Storage centric networks have alsobeen discussed in [6,7].In most cases where the storage is part of the sensor network,the storage device used is flash based rather than a hard disk due toshock resistance, node size, and energy considerations. Addition-ally, flash is also common in many mobile devices such as PDA’s,cell-phones, music players, and personal exercise monitors. Thesedevices can benefit from a having light weight database.Our objective is to design storage and retrieval functionality forflash storage. A simple method is to archive data without an in-dex, and that is in fact efficient in many scenarios. However, aswe show in section 6, for scenarios where the number of queriesis more than a small fraction (≈ 1%) of the number of data items,having an index is useful. Hence, we focus on indexed storage.Prior work on flash storage provides file systems (e.g., ELF [5])and other useful data structures such as stacks, queues and limitedindexes (e.g., Capsule [14], MicroHash [22]). Our goal is to ex-tend the functionality provided by those methods to B+-tree basedindexing to support useful queries such as lookups, range-queries,multi-dimensional range-queries, and joins.Existing database products are not well suited for sensor net-works due to several reasons. Firstly, existing products, includingones that run on flash [15], are originally designed for hard disksand are not optimized for NAND flash characteristics. Secondly,existing products are not targeted to most sensor network work-loads. Sensor network storage workload may be highly write in-tensive: data may be added to the database more frequently thanit is read. This is different from many traditional database appli-cations where data is read more frequently than it is written. Alog-structured B-Tree indexing method was proposed in [21] to ad-dress these two problems.However, all existing indexing schemes, including those in com-mercial products and research prototypes, suffer from the drawbackthat they are not optimized for all available flash devices or realisticworkloads. They do not consider all factors in the design space andare suboptimal in many situations. For example, as we will showlater, the log-structured design [21] performs well with a write-intensive workload on an on-board flash chip, but performs poorlywhen run with a read-write workload or with a compact flash card.Similarly, disk-based designs are suitable for certain types of flashand workloads, but are suboptimal for others. This limits the useof existing schemes in practical system design, especially when thesystem is being designed for multiple types of workloads and flex-ibility in flash devices.We address this in FlashDB with a self-tuning indexing methodthat can adapt itself to the dynamic behavior of multiple device andworkload parameters that affect performance. Our current proto-type can run on more capable sensor nodes such as iMotes [11],ENSBox [8], XYZ [13], NIMS [17], or CarTel [10]; however, itssimplicity and small memory footprint (6kB for a database of 30,000records, independent of record size) promises its use in more re-source constrained devices (such as motes).We make the following contributions in this paper:• We discuss the design space of factors that


View Full Document

HARVARD CS 263 - Dynamic Self-tuning Database for NAND Flash

Download Dynamic Self-tuning Database for NAND Flash
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Dynamic Self-tuning Database for NAND Flash and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Dynamic Self-tuning Database for NAND Flash 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?