Indexing and ExecutionHow are we doing?Double Pipelined Join (Tukwila)Query ExecutionQuery OptimizationHow are we going to build one?DiscussionSchema for Some ExamplesMotivating ExampleAlternative Plans 1Alternative Plans 2 With IndexesBuilding BlocksQuery Optimization Process (simplified a bit)Key Lessons in OptimizationOperations (revisited)Algebraic LawsSlide 17Slide 18Slide 19Query Rewrites: Sub-queriesThe Un-Nested QueryConverting Nested QueriesSlide 23Slide 24Slide 25Semi-Joins, Magic SetsRewrites: Magic SetsRewrites: SIPsSupporting ViewsAnd Finally…Rewrites: Group By and JoinRewrites:Operation IntroductionQuery Rewriting: Predicate PushdownQuery Rewrites: Predicate Pushdown (through grouping)Query Rewrite: Predicate MovearoundSlide 36Slide 37Query Rewrite SummaryCost EstimationStatistics and CatalogsCost Model for Our AnalysisSimple Nested Loops JoinIndex Nested Loops JoinBlock Nested Loops JoinSort-Merge Join (R S)Cost of Sort-Merge JoinHash-JoinCost of Hash-JoinSize Estimation and Reduction FactorsHistogramsHistogramsSlide 52Slide 53Slide 54Plans for Single-Relation Queries (Prep for Join ordering)ExampleDetermining Join OrderingTypes of Join TreesSlide 59Slide 60ProblemDynamic ProgrammingSlide 63Slide 64Slide 65Slide 66Slide 67Indexing and ExecutionMay 22nd, 2002How are we doing?Nested loopjoin140 hours 50 millionI/OsPage nestedloop1.4 hours 500,000 I/OsIndex nestedloop36 minutes 220,000 I/OsSort-mergejoin75 seconds 7500 I/O’sHash join 45 seconds 4500 I/O’sDouble Pipelined Join (Tukwila)Hash JoinPartially pipelined: no output until inner readAsymmetric (inner vs. outer) — optimization requires source behavior knowledgeDouble Pipelined Hash JoinOutputs data immediatelySymmetric — requires less source knowledge to optimizeQuery ExecutionQuery compilerExecution engineIndex/record mgr.Buffer managerStorage managerstorageUser/ApplicationQueryupdateQuery executionplanRecord, indexrequestsPage commandsRead/writepagesQuery OptimizationImperative query execution plan:Declarative SQL queryIdeally: Want to find best plan. Practically: Avoid worst plans!Goal:PurchasePersonBuyer=nameCity=‘seattle’ phone>’5430000’buyer(Simple Nested Loops)(Table scan) (Index scan)SELECT S.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ Inputs:• the query• statistics about the data (indexes, cardinalities, selectivity factors)• available memoryHow are we going to build one?•What kind of optimizations can we do?•What are the issues?•How would we architect a query optimizer?DiscussionHow Would You Do It?Schema for Some Examples•Reserves:–Each tuple is 40 bytes long, 100 tuples per page, 1000 pages (4000 tuples)•Sailors:–Each tuple is 50 bytes long, 80 tuples per page, 500 pages (4000 tuples). Sailors (sid: integer, sname: string, rating: integer, age: real)Reserves (sid: integer, bid: integer, day: dates, rname: string)Motivating Example•Cost: 500+500*1000 I/Os•By no means the worst plan! •Misses several opportunities: selections could have been `pushed’ earlier, no use is made of any available indexes, etc.•Goal of optimization: To find more efficient plans that compute the same answer. SELECT S.snameFROM Reserves R, Sailors SWHERE R.sid=S.sid AND R.bid=100 AND S.rating>5ReservesSailorssid=sidbid=100 rating > 5snameReservesSailorssid=sidbid=100 rating > 5sname(Simple Nested Loops)(On-the-fly)(On-the-fly)RA Tree:Plan:Alternative Plans 1 •Main difference: push selects.•With 5 buffers, cost of plan:–Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).–Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).–Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250), total=1800 –Total: 3560 page I/Os.•If we used BNL join, join cost = 10+4*250, total cost = 2770.•If we `push’ projections, T1 has only sid, T2 only sid and sname:–T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.ReservesSailorssid=sidbid=100 sname(On-the-fly)rating > 5(Scan;write to temp T1)(Scan;write totemp T2)(Sort-Merge Join)Alternative Plans 2With Indexes•With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.•INL with pipelining (outer is not materialized). Decision not to push rating>5 before the join is based on availability of sid index on Sailors. Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os. Join column sid is a key for Sailors.–At most one matching tuple, unclustered index on sid OK.ReservesSailorssid=sidbid=100 sname(On-the-fly)rating > 5(Use hashindex; donot writeresult to temp)(Index Nested Loops,with pipelining )(On-the-fly)Building Blocks•Algebraic transformations (many and wacky).•Statistical model: estimating costs and sizes.•Finding the best join trees:–Bottom-up (dynamic programming): System-R•Newer architectures:–Starburst: rewrite and then tree find–Volcano: all at once, top-down.Query Optimization Process(simplified a bit)•Parse the SQL query into a logical tree:–identify distinct blocks (corresponding to nested sub-queries or views). •Query rewrite phase:–apply algebraic transformations to yield a cheaper plan.–Merge blocks and move predicates between blocks. •Optimize each block: join ordering.•Complete the optimization: select scheduling (pipelining strategy).Key Lessons in Optimization•There are many approaches and many details to consider in query optimization–Classic search/optimization problem!–Not completely solved yet!•Main points to take away are:–Algebraic rules and their use in transformations of queries.–Deciding on join ordering: System-R style (Selinger style) optimization.–Estimating cost of plans and sizes of intermediate results.Operations (revisited)•Scan ([index], table, predicate):–Either index scan or table scan.–Try to push down sargable predicates.•Selection (filter)•Projection (always need to go to the data?)•Joins: nested loop (indexed), sort-merge, hash, outer join.•Grouping and aggregation (usually the last).Algebraic Laws•Commutative and Associative Laws–R U S = S U R, R U (S U T) = (R U S) U T–R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T–R S = S R, R (S T) = (R S) T•Distributive Laws–R (S U T) = (R S) U (R
View Full Document