UW-Madison CS 764 - The Gamma Database Machine Project - D570987

Home> Schools> University of Wisconsin, Madison> (CS) > CS 764> The Gamma Database Machine Project

UW-Madison CS 764 - The Gamma Database Machine Project

School name University of Wisconsin, Madison

Course Cs 764- Topics in Database Management Systems

Pages 38

Download Save

Unformatted text preview:

The Gamma Database Machine ProjectDavid J. DeWittShahram GhandeharizadehDonovan SchneiderAllan BrickerHui-I HsiaoRick RasmussenComputer Sciences DepartmentUniversity of WisconsinThis research was partially supported by the Defense Advanced Research Projects Agency under contract N00039-86-C-0578, by the National Science Foundation under grant DCR-8512862, by a DARPA/NASA sponsored Gradu-ate Research Assistantship in Parallel Processing, and by research grants from Intel Scientific Computers, TandemComputers, and Digital Equipment Corporation.ABSTRACTThis paper describes the design of the Gamma database machine and the techniques employed in its imple-mentation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 pro-cessors and 32 disk drives. Gamma employs three key technical ideas which enable the architecture to be scaled to100s of processors. First, all relations are horizontally partitioned across multiple disk drives enabling relations tobe scanned in parallel. Second, novel parallel algorithms based on hashing are used to implement the complex rela-tional operators such as join and aggregate functions. Third, dataflow scheduling techniques are used to coordinatemultioperator queries. By using these techniques it is possible to control the execution of very complex queries withminimal coordination - a necessity for configurations involving a very large number of processors.In addition to describing the design of the Gamma software, a thorough performance evaluation of the iPSC/2hypercube version of Gamma is also presented. In addition to measuring the effect of relation size and indices onthe response time for selection, join, aggregation, and update queries, we also analyze the performance of Gammarelative to the number of processors employed when the sizes of the input relations are kept constant (speedup) andwhen the sizes of the input relations are increased proportionally to the number of processors (scaleup). Thespeedup results obtained for both selection and join queries are linear; thus, doubling the number of processorshalves the response time for a query. The scaleup results obtained are also quite encouraging. They reveal that anearly constant response time can be maintained for both selection and join queries as the workload is increased byadding a proportional number of processors and disks.11. IntroductionFor the last 5 years, the Gamma database machine project has focused on issues associated with the designand implementation of highly parallel database machines. In a number of ways, the design of Gamma is based onwhat we learned from our earlier database machine DIRECT [DEWI79]. While DIRECT demonstrated that paral-lelism could be successfully applied to processing database operations, it had a number of serious designdeficiencies that made scaling of the architecture to 100s of processors impossible; primarily the use of sharedmemory and centralized control for the execution of its parallel algorithms [BITT83].As a solution to the problems encountered with DIRECT, Gamma employs what appear today to be relativelystraightforward solutions. Architecturally, Gamma is based on a shared-nothing [STON86] architecture consistingof a number of processors interconnected by a communications network such as a hypercube or a ring, with disksdirectly connected to the individual processors. It is generally accepted that such architectures can be scaled toincorporate 1000s of processors. In fact, Teradata database machines [TERA85] incorporating a shared-nothingarchitecture with over 200 processors are already in use. The second key idea employed by Gamma is the use ofhash-based parallel algorithms. Unlike the algorithms employed by DIRECT, these algorithms require no central-ized control and can thus, like the hardware architecture, be scaled almost indefinitely. Finally, to make the best ofthe limited I/O bandwidth provided by the current generation of disk drives, Gamma employs the concept of hor-izontal partitioning [RIES78] (also termed declustering [LIVN87]) to distribute the tuples of a relation amongmultiple disk drives. This design enables large relations to be processed by multiple processors concurrentlywithout incurring any communications overhead.After the design of the Gamma software was completed in the fall of 1984, work began on the first prototypewhich was operational by the fall of 1985. This version of Gamma was implemented on top of an existing multi-computer consisting of 20 VAX 11/750 processors [DEWI84b]. In the period of 1986-1988, the prototype wasenhanced through the addition of a number of new operators (e.g. aggregate and update operators), new parallel joinmethods (Hybrid, Grace, and Sort-Merge [SCHN89a]), and a complete concurrency control mechanism. In addi-tion, we also conducted a number of performance studies of the system during this period [DEWI86, DEWI88,GHAN89, GHAN90]. In the spring of 1989, Gamma was ported to a 32 processor Intel iPSC/2 hypercube and theVAX-based prototype was retired.Gamma is similar to a number of other active parallel database machine efforts. In addition to Teradata[TERA85], Bubba [COPE88] and Tandem [TAND88] also utilize a shared-nothing architecture and employ theconcept of horizontal partitioning. While Teradata and Tandem also rely on hashing to decentralize the execution oftheir parallel algorithms, both systems tend to rely on relatively conventional join algorithms such as sort-merge for2processing the fragments of the relation at each site. Gamma, XPRS [STON88], and Volcano [GRAE89] each util-ize parallel versions of the Hybrid join algorithm [DEWI84a].The remainder of this paper is organized as follows. In Section 2 we describe the hardware used by each ofthe Gamma prototypes and our experiences with each. Section 3 discusses the organization of the Gamma softwareand describes how multioperator queries are controlled. The parallel algorithms employed by Gamma are describedin Section 4 and the techniques we employ for transaction and failure management are contained in Section 5. Sec-tion 6 contains a performance study of the 32 processor Intel hypercube prototype. Our conclusions and futureresearch directions are described in Section 7.2. Hardware Architecture of Gamma2.1. OverviewGamma is based on the concept of a shared-nothing architecture [STON86] in which processors do not sharedisk drives or random access memory and can only communicate with one

View Full Document


School:
Email:
New Password:
Confirm Password:

UW-Madison CS 764 - The Gamma Database Machine Project

Sign up for free to view:

Please select your school