UW-Madison CS 764 - THE SEQUOIA 2000 STORAGE BENCHMARK

THE SEQUOIA 2000 STORAGE BENCHMARK

Michael Stonebraker, Jim Frew, Kenn Gardels and Jeff Meredith
Electrical Engineering and Computer Science Department
University of California, Berkeley

Abstract

This paper presents a benchmark that concisely captures the data base requirements of a collection of Earth Scientists working in the SEQUOIA 2000 project on various aspects of global change research. This benchmark has the novel characteristic that it uses real data sets and real queries that are representative of Earth Science tasks. Because it appears that Earth Science problems are typical of the problems of engineering and scientific DBMS users, we claim that this benchmark represents the needs of this more general community. Also included in the paper are benchmark results for three example DBMSs: GRASS, IPW and POSTGRES.

1. INTRODUCTION

There have been numerous benchmarks oriented toward DBMS performance in a variety of application areas. Perhaps the most famous one, TP1 [ANON85], is oriented toward business data processing, and has spawned a collection of derivative benchmarks, the most recent being TPC-A, TPC-B and TPC-C. These benchmarks represent the typical needs of a transaction processing user of a DBMS. They consist of short update-oriented transactions that will stress the transaction system and the basic overhead of simple command processing. Another benchmark [CATT92] is oriented toward electronic computer aided design (ECAD) applications. It contains a set of more complex commands that have high locality of reference on a tiny (main memory) data set, and it stresses the efficiency of a client-server DBMS connection in a very specialized environment (i.e. security can be ignored).
An extensive collection of other DBMS-oriented benchmarks is contained in [GRAY91].

We feel that there is a broad application area, namely engineering and scientific data bases, that has special needs not addressed by any of the above benchmarks. This community is typified by Earth Scientists, whose DBMS needs we are trying to support in the SEQUOIA 2000 research project [STON92]. Earth Scientists are usually geographers, hydrologists, oceanographers, or chemists by background, and are united by common problems concerning our survivability on Earth. They investigate issues surrounding global warming, ozone depletion, environment toxification, species extinction, etc.

This research was sponsored by Digital Equipment Corporation under Research Grant 1243, DARPA Contract #DABT63-92-C-0007, NSF Grant #RI-91-07455, and ARO Grant #DAAL03-91-G-0183.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
SIGMOD /5/93/Washington, DC, USA
© 1993 ACM 0-89791-592-5/93/0005/0002 ... $1.50

Loosely speaking, Earth Science research can be divided into three categories:

    field studies
    remote sensing
    simulation

Researchers who perform field studies usually obtain geographic data, typically in data sets of the form:

    { (longitude, latitude, elevation, array-of-values) }

For example, one SEQUOIA 2000 group at the Santa Barbara Campus has collected extensive field data from the Antarctic Ocean about the effect of ozone depletion on ocean organisms [SMIT91]. Such data consist of various ocean characteristics at various depths for specific geographic locations.

Researchers in remote sensing focus on analyzing and interpreting satellite imagery.
Such imagery can be thought of as a four dimensional array of values of the form:

    value (longitude, latitude, wavelength band, time)

For example, the Thematic Mapper (TM) sensors on the Landsat satellites sample the Earth's surface on a 30 meter by 30 meter grid, in 7 wavelength bands, repeating every 15 days. For more information on the requirements of remote sensing users, the interested reader is directed to [LOHM83].

Climate modelers use general circulation models (GCMs) for simulating regional or global phenomena. Such models are similar to computational fluid dynamics (CFD) models in that they tile the study area and then compute a collection of state variables for each tile at time T+1 based on the state in the tile at time T and that of neighboring tiles at time T. The output of such simulation models is an array of values of output parameters such as temperature and barometric pressure as a function of time:

    array-of-values (longitude, latitude, elevation, time)

Because GCMs are so computationally intensive, Earth Scientists wish to save all GCM simulation output for extensive periods of time. Subsequent analysis and visualization efforts can use these stored data, rather than requiring a model rerun.

The characteristics of the Earth Science (ES) applications we have been discussing are:

1) massive size

ES data bases usually include substantial numbers of images and simulation output, and are extremely large. For example, the four main SEQUOIA 2000 ES research groups collectively would like to store about 10 ** 14 bytes (100 Tbytes) of data. Or consider the NASA Earth Observation System (EOS), a collection of satellites to be launched in the late 1990's to support the needs of the ES community. Collectively, these satellites will send 1 Tbyte of data per day back to ground stations.

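To make the volumes above concrete, the following is a minimal sketch, not part of the benchmark itself, of how the size of a value (longitude, latitude, wavelength band, time) array grows with its dimensions, and of how long a fixed daily data rate takes to reach a target volume. The names RasterShape and days_to_accumulate, the one-byte sample size, and the 1000 by 1000 scene are assumptions chosen only for illustration. Only the 7 bands, the 10 ** 14 byte target, and the 1 Tbyte per day rate come from the text.

```python
from dataclasses import dataclass

TBYTE = 10 ** 12  # decimal Tbyte, consistent with 10 ** 14 bytes = 100 Tbytes


@dataclass(frozen=True)
class RasterShape:
    """Hypothetical dimensions of a value(longitude, latitude, band, time) array."""
    n_longitude: int
    n_latitude: int
    n_bands: int
    n_times: int
    bytes_per_value: int = 1  # assumption: one byte per stored sample

    def total_bytes(self) -> int:
        # Storage is the product of all four dimensions and the sample size.
        return (self.n_longitude * self.n_latitude
                * self.n_bands * self.n_times * self.bytes_per_value)


def days_to_accumulate(total_bytes: int, bytes_per_day: int) -> float:
    """Days needed to accumulate total_bytes at a constant daily rate."""
    return total_bytes / bytes_per_day


# Illustrative scene: 1000 x 1000 grid cells, 7 bands, 10 acquisitions.
scene = RasterShape(n_longitude=1000, n_latitude=1000, n_bands=7, n_times=10)
print(scene.total_bytes())                      # 70000000
print(days_to_accumulate(10 ** 14, 1 * TBYTE))  # 100.0
```

Under these assumptions, a satellite stream of 1 Tbyte per day would match the 100 Tbytes that the SEQUOIA 2000 groups wish to store in about 100 days.
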
The ground storage and distribution system (EOS/DIS), currently being built by a government contractor, is charged with storing all EOS data for 15 years. When completed, this data base will be some 10 ** 16 bytes (10 petabytes), and will be the Earth's largest database. We see that database size in the ES community is often much larger than the modest size of TP1 benchmark data bases. The Cattell benchmark is even more modest in its size requirements.

2) complex data types

ES data bases often include multi-dimensional arrays, geometries for spatial objects, and other complex data types. How well any DBMS performs in this environment is largely determined by its support for arrays, spatial objects and complex objects. Such types are rarely present in other benchmarks, which limits their relevance to the ES community.

3) sophisticated searching

ES data base applications include the requirement of searching arrays and spatial data for desired information. B-trees are rarely adequate for the search needs of this community. On the other hand, most

