Parallel Data Management
Introduction to Database Systems — Module 9, Lecture 1

Parallel DBMS
Slides by Joe Hellerstein, UCB, with some material from Jim Gray, Microsoft Research.

Why Parallel Access To Data?
- At a bandwidth of 10 MB/s, scanning 1 terabyte takes about 1.2 days.
- With 1,000-way parallelism, the same scan takes about 1.5 minutes.
- Parallelism: divide a big problem into many smaller ones to be solved in parallel.

Parallel DBMS: Intro
- Parallelism is natural to DBMS processing:
  - Pipeline parallelism: many machines each doing one step in a multi-step process.
  - Partition parallelism: many machines doing the same thing to different pieces of data.
  - Both are natural in DBMS!
[Figure: any sequential program can be composed in a pipeline or partitioned, with outputs split N ways and inputs merged M ways.]

DBMS: The || Success Story
- DBMSs are the most (only?) successful application of parallelism.
  - Teradata, Tandem vs.
Thinking Machines, KSR...
  - Every major DBMS vendor has some || server.
  - Workstation manufacturers now depend on || DB server sales.
- Reasons for success:
  - Bulk processing (= partition ||-ism).
  - Natural pipelining.
  - Inexpensive hardware can do the trick!
  - Users/app programmers don't need to think in ||.

Some || Terminology
- Speed-up: more resources means proportionally less time for a given amount of data.
- Scale-up: if resources are increased in proportion to the increase in data size, time is constant.
[Figure: ideal curves — throughput (Xact/sec) grows linearly with degree of ||-ism (speed-up); response time (sec/Xact) stays flat as data and resources grow together (scale-up).]

Architecture Issue: Shared What?

  Shared Memory (SMP)       easy to program, expensive to build, difficult to scale up   (Sequent, SGI, Sun)
  Shared Disk               in between                                                   (VMScluster, Sysplex)
  Shared Nothing (network)  hard to program, cheap to build, easy to scale up            (Tandem, Teradata, SP2)

Different Types of DBMS ||-ism
- Intra-operator parallelism: get all machines working to compute a given operation (scan, sort, join).
- Inter-operator parallelism: each operator may run concurrently on a different site (exploits pipelining).
- Inter-query parallelism: different queries run on different sites.
- We'll focus on intra-operator ||-ism.

Automatic Data Partitioning
Partitioning a table across sites:
- Range (A...E | F...J | K...N | O...S | T...Z): good for equijoins, range queries, group-by.
- Hash: good for equijoins.
- Round robin: good to spread load.
Shared disk and shared memory are less sensitive to partitioning; shared nothing benefits from "good" partitioning.

Parallel Scans
- Scan in parallel, and merge.
- Selection may not require all sites for range or hash partitioning.
- Indexes can be built at each partition.
- Question: how do indexes differ in the different schemes?
  - Think about both lookups and inserts!
  - What about unique indexes?
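The partitioning schemes above, and why a selection may skip sites under range partitioning, can be sketched in a few lines of Python. This is an illustrative toy, not any particular DBMS's implementation; the site count and cutoff keys are made up:

```python
# Toy sketch of the three partitioning schemes (illustrative only).
N_SITES = 4

def range_site(key, cutoffs=("F", "K", "O")):
    """Range partitioning: site chosen by comparing the key to cutoffs."""
    for site, cutoff in enumerate(cutoffs):
        if key < cutoff:
            return site
    return len(cutoffs)

def hash_site(key):
    """Hash partitioning: site chosen by a hash of the key."""
    return hash(key) % N_SITES

_rr_counter = 0
def round_robin_site(_key):
    """Round robin: tuples dealt to sites in turn, regardless of value."""
    global _rr_counter
    site = _rr_counter % N_SITES
    _rr_counter += 1
    return site

def sites_for_range_query(upper, cutoffs=("F", "K", "O")):
    """Under range partitioning, a predicate like name < upper can only
    match tuples on a prefix of the sites; the rest are skipped entirely."""
    return [s for s in range(len(cutoffs) + 1)
            if s == 0 or cutoffs[s - 1] < upper]
```

For example, a selection `name < "G"` touches only the first two range sites, while under hash or round-robin partitioning every site must participate.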
Parallel Sorting
- Current records: 8.5 GB/minute, shared-nothing; Datamation benchmark in 2.41 secs (UCB students! http://now.cs.berkeley.edu/NowSort/).
- Idea:
  - Scan in parallel, and range-partition as you go.
  - As tuples come in, begin "local" sorting on each site.
  - Resulting data is sorted, and range-partitioned.
- Problem: skew!
- Solution: "sample" the data at the start to determine partition points.

See: Jim Gray & Gordon Bell, Parallel Database Systems Survey, VLDB 1995.

Parallel Aggregates
[Figure: a table range-partitioned A...E, F...J, K...N, O...S, T...Z; each site computes a local count, and the partial counts are combined into a global count.]
- For each aggregate function, we need a decomposition:
  - count(S) = Σ count(s(i)); ditto for sum().
  - avg(S) = (Σ sum(s(i))) / (Σ count(s(i))).
  - and so on...
- For groups:
  - Sub-aggregate groups close to the source.
  - Pass each sub-aggregate to its group's site (chosen via a hash fn).

Parallel Joins
- Nested loop:
  - Each outer tuple must be compared with each inner tuple that might join.
  - Easy for range partitioning on the join columns, hard otherwise!
- Sort-merge (or plain merge-join):
  - Sorting gives range-partitioning. (But what about handling skew?)
  - Merging partitioned tables is local.

Parallel Hash Join
- In the first phase, partitions get distributed to different sites:
  - A good hash function automatically distributes work evenly!
- Do the second phase at each site.
- Almost always the winner for equi-joins.
[Figure: phase 1 — the original relations (R, then S) are read from disk, hashed by a function h into B-1 partitions using B main-memory buffers, and written back out to their target sites.]
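The two-phase parallel hash join above can be sketched as follows. This is a minimal single-process stand-in for multiple sites (tuples as dicts, `N_SITES` and all names illustrative), not production join code:

```python
from collections import defaultdict

N_SITES = 3

def phase1_partition(relation, key):
    """Phase 1: route each tuple to a site by hashing its join key."""
    sites = defaultdict(list)
    for tup in relation:
        sites[hash(tup[key]) % N_SITES].append(tup)
    return sites

def phase2_local_join(r_part, s_part, key):
    """Phase 2: an ordinary local hash join at one site."""
    table = defaultdict(list)
    for r in r_part:
        table[r[key]].append(r)
    return [(r, s) for s in s_part for r in table[s[key]]]

def parallel_hash_join(R, S, key):
    r_sites = phase1_partition(R, key)
    s_sites = phase1_partition(S, key)
    # Matching keys hash to the same site, so each site joins independently.
    out = []
    for site in range(N_SITES):
        out.extend(phase2_local_join(r_sites[site], s_sites[site], key))
    return out
```

Because both relations are partitioned by the same hash function, no tuple ever needs to be compared across sites, which is why this plan parallelizes so well for equi-joins.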
Dataflow Network for || Join
- Good use of split/merge makes it easier to build parallel versions of sequential join code.

Complex Parallel Query Plans
- Complex queries: inter-operator parallelism.
  - Pipelining between operators: note that sort and phase 1 of hash-join block the pipeline!
  - Bushy trees.
[Figure: a bushy plan joining A and B on sites 1-4 and R and S on sites 5-8, with the final join across sites 1-8.]

Observations
- It is relatively easy to build a fast parallel query executor.
  - S.M.O.P. (a "simple matter of programming").
- It is hard to write a robust and world-class parallel query optimizer.
  - There are many tricks.
  - One quickly hits the complexity barrier.
  - Still open research!

Parallel Query Optimization
- Common approach: two phases.
  - Pick the best sequential plan (System R algorithm).
  - Pick the degree of parallelism based on current system parameters.
- "Bind" operators to processors.
  - Take the query tree and "decorate" it as in the previous picture.

What's Wrong With That?
- Best serial plan != best || plan! Why?
- Trivial counter-example:
  - Table partitioned with a local secondary index at two nodes.
  - Range query: all of node 1 and 1% of node 2.
  - Node 1 should do a scan of its partition.
  - Node 2 should use the secondary index.
- SELECT * FROM telephone_book WHERE name < 'NoGood';
[Figure: node 1 (A..M) does a table scan; node 2 (N..Z) does an index scan.]

Google Approach to Systems Engineering
Prof. Christof Fetzer, Ph.D.
Heinz-Nixdorf Endowed Chair for Systems Engineering, TU Dresden

Localize: Network Architecture
[Figure: network architecture photo, © NASA]

Approach [2]
- Goal: hide the complexity of parallelism, data distribution and fault tolerance.
- Approach: MapReduce.
  - Simplify programming by hiding these issues in a library.
  - The programmer focuses on the problem at hand.
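The MapReduce contract described above can be sketched as a toy single-machine skeleton. This is not Google's actual library (whose real value is the distributed, fault-tolerant shuffle); it only shows the division of labor, with word count as the classic example:

```python
from collections import defaultdict
from itertools import chain

def map_reduce(inputs, mapper, reducer):
    """Toy MapReduce skeleton (illustrative, in-memory only)."""
    # Map: each input record yields (key, value) pairs.
    pairs = chain.from_iterable(mapper(x) for x in inputs)
    # Shuffle: group values by key. In a real system this is the
    # parallel, fault-tolerant step hidden inside the library.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    # Reduce: combine each key's values into one result.
    return {k: reducer(k, vs) for k, vs in groups.items()}

# Word count: the programmer writes only these two small functions.
def wc_map(line):
    return [(word, 1) for word in line.split()]

def wc_reduce(word, counts):
    return sum(counts)
```

Everything about parallelism, data distribution, and failure handling lives inside `map_reduce`; the application code is just `wc_map` and `wc_reduce`.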