Database Architecture Optimized for the new Bottleneck: Memory Access

Peter Boncz* (Data Distilleries B.V., Amsterdam, The Netherlands; [email protected])
Stefan Manegold, Martin L. Kersten (CWI, Amsterdam, The Netherlands; {S.Manegold,M.L.Kersten}@cwi.nl)

Abstract

In the past decade, advances in the speed of commodity CPUs have far outpaced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system. We obtained exact statistics on events like TLB misses and L1 and L2 cache misses by using the hardware performance counters found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms makes them perform well, which is confirmed by experimental results.

* This work was carried out when the author was at the University of Amsterdam, supported by SION grant 612-23-431.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.

1 Introduction

Custom hardware – from workstations to PCs – has been experiencing tremendous improvements in the past decades. Unfortunately, this growth has not been equally distributed over all aspects of hardware performance and capacity. Figure 1 shows that the speed of commercial microprocessors has been increasing roughly 70% every year, while the speed of commodity DRAM has improved by little more than 50% over the past decade [Mow94]. Part of the reason is a direct tradeoff between capacity and speed in DRAM chips, where the highest priority has been increasing capacity. The result is that, from the perspective of the processor, memory has been getting slower at a dramatic rate: compounded over a decade, 70% annual growth amounts to a factor of roughly 1.7^10 ≈ 200, against a factor of only about 1.5 for DRAM. This affects all computer systems, making it increasingly difficult to achieve high processor efficiencies.

[Figure 1: Hardware trends in DRAM and CPU speed. Log-scale speed (MHz) versus year, 1979-1997, for processors and DRAM.]

Three aspects of memory performance are of interest: bandwidth, latency, and address translation. The only way to reduce effective memory latency for applications has been to incorporate cache memories in the memory subsystem. Fast, more expensive SRAM memory chips found their way onto computer boards, to be used as L2 caches. Due to ever-rising CPU clock speeds, the time needed to bridge the physical distance between such chips and the CPU became a problem, so modern CPUs come with an on-chip L1 cache (see Figure 2). This physical distance is actually a major complication for designs trying to reduce main-memory latency. The new DRAM standards Rambus [Ram96] and SLDRAM [SLD97] therefore concentrate on fixing the memory bandwidth bottleneck [McC95] rather than the latency problem.

Cache memories can reduce memory latency only when the requested data is found in the cache. This mainly depends on the memory access pattern of the application. Thus, unless special care is taken, memory latency becomes an increasing performance bottleneck, preventing applications – including database systems – from fully exploiting the power of modern hardware.

Besides memory latency and memory bandwidth, the translation of logical virtual memory addresses to physical page addresses can also have a severe impact on memory access performance. The Memory Management Unit (MMU) of all modern CPUs has a Translation Lookaside Buffer (TLB), a kind of cache that holds the translations for (typically) the 64 most recently used pages. If a logical address is found in the TLB, the translation incurs no additional cost. Otherwise, a TLB miss occurs: the CPU traps to a routine in the operating system kernel, which translates the address and places it in the TLB. Depending on the implementation and hardware architecture, TLB misses can be even more costly than a main-memory access.
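To make these effects concrete before the measurements of Section 2, consider the following minimal C sketch. It is illustrative only, not the paper's benchmark code, and the buffer size and stride range are arbitrary assumptions: it reads one byte every stride bytes from a buffer much larger than any cache. Once the stride exceeds the cache-line size, every read is a cache miss; once consecutive reads land on different pages, TLB misses add further cost.

```c
/* Minimal stride-scan sketch (illustrative; not the paper's code).
 * Reads one byte every `stride` bytes from a large buffer and reports
 * the average nanoseconds per access. The cost rises in plateaus:
 * once `stride` exceeds the cache-line size every access misses the
 * cache, and once every access lands on a different page, TLB misses
 * come on top. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF_SIZE ((size_t)256 * 1024 * 1024)   /* 256 MB: far larger than L2 */

int main(void)
{
    unsigned char *buf = malloc(BUF_SIZE);
    if (buf == NULL)
        return 1;
    for (size_t i = 0; i < BUF_SIZE; i++)      /* fault in every page once */
        buf[i] = (unsigned char)i;

    for (size_t stride = 1; stride <= 16384; stride *= 2) {
        volatile unsigned char sink = 0;       /* keeps the reads from being optimized away */
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < BUF_SIZE; i += stride)
            sink += buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("stride %6zu: %8.2f ns/access\n", stride,
               ns / (double)(BUF_SIZE / stride));
    }
    free(buf);
    return 0;
}
```

A wall-clock timer like this can only reveal the plateaus; to attribute the cost exactly to L1 misses, L2 misses, and TLB misses, the paper reads the CPU's hardware performance counters instead.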
1.1 Overview

In this article, we investigate the effect of memory access cost on database performance by looking in detail at the main-memory cost of typical database applications. Our research group has studied large main-memory database systems for the past 10 years. This research started in the PRISMA project [AvdBF+92], focusing on massive parallelism, and is now centered around Monet [BQK96, BWK98], a high-performance system targeted at query-intensive application areas like OLAP and Data Mining. For the research presented here, we use Monet as our experimentation platform.

The rest of this paper is organized as follows. In Section 2, we analyze the impact of memory access costs on basic database operations. We show that, unless special care is taken, a database server running even a simple sequential scan on a table will spend 95% of its cycles waiting for memory to be accessed. This memory access bottleneck is even more difficult to avoid in more complex database operations like sorting, aggregation and join, which exhibit a random access pattern.

[Figure 2: Hierarchical Memory System. The CPU die holds the registers and the L1 cache; an off-chip L2 cache connects over the bus to main memory, whose pages belong to a virtual memory backed by a swap file on disk. Data moves in L1 and L2 cache lines and in memory pages.]

In Section 3, we discuss the consequences of this bottleneck for the data structures and algorithms to be used in database systems. We identify vertical fragmentation as the solution for database data structures that leads to optimal memory cache usage. Concerning query processing algorithms, we focus on equi-join and introduce new radix algorithms for partitioned hash-join; both ideas are sketched below. We analyze the properties of these algorithms.
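Purely as an illustration of the vertical fragmentation idea (the struct and function names below are invented for this sketch; this is not Monet's actual BAT interface), decomposing a table into one contiguous array per column lets a scan of a single column read consecutive cache lines with no wasted bytes:

```c
/* Sketch of vertical fragmentation (names invented for illustration;
 * not Monet's actual interface). Instead of an array of wide tuple
 * structs, each column is stored as its own dense array. */
#include <stddef.h>

/* N-ary (horizontal) layout: scanning one column of a wide tuple
 * drags the whole tuple through the cache. */
struct tuple {
    int    id;
    double price;
    char   payload[84];
};

/* Decomposed (vertical) layout: one contiguous array per column. */
struct column_table {
    size_t  n;
    int    *id;
    double *price;
    char  (*payload)[84];
};

/* Summing `price` now touches n * sizeof(double) bytes instead of
 * n * sizeof(struct tuple) bytes: every cache line fetched is fully
 * used, which is what makes sequential scans cache-optimal. */
double sum_price(const struct column_table *t)
{
    double sum = 0.0;
    for (size_t i = 0; i < t->n; i++)
        sum += t->price[i];
    return sum;
}
```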


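The radix algorithms themselves are developed in Section 3. As a rough preview of the partitioning idea (again a sketch under assumed names and types, not the paper's implementation), one radix-partitioning pass clusters join keys on their lowest few bits; keeping the number of partitions per pass small bounds how many distinct output regions are written at once, and repeating the pass on further bits yields partitions small enough for the final hash-join to run inside the cache:

```c
/* One radix-partitioning pass (illustrative sketch; not the paper's
 * code). Clusters an array of (key, oid) pairs on the lowest `bits`
 * bits of the key, using a histogram and a prefix sum to compute
 * each partition's start offset before scattering. */
#include <stdlib.h>

typedef struct {
    unsigned key;   /* join attribute */
    unsigned oid;   /* identifies the originating tuple */
} pair;

/* Partition src[0..n) into dst[0..n) on the low `bits` bits of key.
 * Returns 0 on success, -1 on allocation failure. */
int radix_cluster(const pair *src, pair *dst, size_t n, unsigned bits)
{
    size_t nbuckets = (size_t)1 << bits;
    size_t *count  = calloc(nbuckets, sizeof *count);
    size_t *offset = malloc(nbuckets * sizeof *offset);
    if (count == NULL || offset == NULL) {
        free(count);
        free(offset);
        return -1;
    }

    for (size_t i = 0; i < n; i++)                 /* pass 1: histogram */
        count[src[i].key & (nbuckets - 1)]++;

    size_t sum = 0;                                /* prefix sum: start offsets */
    for (size_t b = 0; b < nbuckets; b++) {
        offset[b] = sum;
        sum += count[b];
    }

    for (size_t i = 0; i < n; i++) {               /* pass 2: scatter */
        size_t b = src[i].key & (nbuckets - 1);
        dst[offset[b]++] = src[i];
    }

    free(count);
    free(offset);
    return 0;
}
```

After both join relations have been clustered this way, only pairs of partitions that agree on those low-order key bits can contain matches, so each small pair of partitions can be hash-joined while staying cache- and TLB-resident.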