Query Optimization: TransformationsSchema for Some ExamplesMotivating ExampleAlternative Plans 1Alternative Plans 2 With IndexesBuilding Blocks for OptimizationQuery Optimization Process (simplified a bit)Key Lessons in OptimizationOperations (revisited)Algebraic LawsSlide 11Slide 12Slide 13Query Rewrites: Sub-queriesThe Un-Nested QueryConverting Nested QueriesSlide 17Slide 18Slide 19Rewrites: Group By and JoinRewrites:Operation IntroductionQuery Rewriting: Predicate PushdownQuery Rewrites: Predicate Pushdown (through grouping)Query Rewrite: Predicate MovearoundSlide 25Slide 26Query Rewrite SummaryQuery Optimization: TransformationsMay 29th, 2002Schema for Some Examples•Reserves:–Each tuple is 40 bytes long, 100 tuples per page, 1000 pages (4000 tuples)•Sailors:–Each tuple is 50 bytes long, 80 tuples per page, 500 pages (4000 tuples). Sailors (sid: integer, sname: string, rating: integer, age: real)Reserves (sid: integer, bid: integer, day: dates, rname: string)Motivating Example•Cost: 500+500*1000 I/Os•By no means the worst plan! •Misses several opportunities: selections could have been `pushed’ earlier, no use is made of any available indexes, etc.•Goal of optimization: To find more efficient plans that compute the same answer. SELECT S.snameFROM Reserves R, Sailors SWHERE R.sid=S.sid AND R.bid=100 AND S.rating>5ReservesSailorssid=sidbid=100 rating > 5snameReservesSailorssid=sidbid=100 rating > 5sname(Simple Nested Loops)(On-the-fly)(On-the-fly)RA Tree:Plan:Alternative Plans 1 •Main difference: push selects.•With 5 buffers, cost of plan:–Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).–Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).–Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250), total=1800 –Total: 3560 page I/Os.•If we used BNL join, join cost = 10+4*250, total cost = 2770.•If we `push’ projections, T1 has only sid, T2 only sid and sname:–T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.ReservesSailorssid=sidbid=100 sname(On-the-fly)rating > 5(Scan;write to temp T1)(Scan;write totemp T2)(Sort-Merge Join)Alternative Plans 2With Indexes•With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.•INL with pipelining (outer is not materialized). Decision not to push rating>5 before the join is based on availability of sid index on Sailors. Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os. Join column sid is a key for Sailors.–At most one matching tuple, unclustered index on sid OK.ReservesSailorssid=sidbid=100 sname(On-the-fly)rating > 5(Use hashindex; donot writeresult to temp)(Index Nested Loops,with pipelining )(On-the-fly)Building Blocks for Optimization•Algebraic transformations (many and wacky).•Statistical model: estimating costs and sizes.•Finding the best join trees:–Bottom-up (dynamic programming): System-R•Newer architectures:–Starburst: rewrite and then tree find–Volcano: all at once, top-down.Query Optimization Process(simplified a bit)•Parse the SQL query into a logical tree:–identify distinct blocks (corresponding to nested sub-queries or views). •Query rewrite phase:–apply algebraic transformations to yield a cheaper plan.–Merge blocks and move predicates between blocks. •Optimize each block: join ordering.•Complete the optimization: select scheduling (pipelining strategy).Key Lessons in Optimization•There are many approaches and many details to consider in query optimization–Classic search/optimization problem!–Not completely solved yet!•Main points to take away are:–Algebraic rules and their use in transformations of queries.–Deciding on join ordering: System-R style (Selinger style) optimization.–Estimating cost of plans and sizes of intermediate results.Operations (revisited)•Scan ([index], table, predicate):–Either index scan or table scan.–Try to push down sargable predicates.•Selection (filter)•Projection (always need to go to the data?)•Joins: nested loop (indexed), sort-merge, hash, outer join.•Grouping and aggregation (usually the last).Algebraic Laws•Commutative and Associative Laws–R U S = S U R, R U (S U T) = (R U S) U T–R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T–R S = S R, R (S T) = (R S) T•Distributive Laws–R (S U T) = (R S) U (R T)Algebraic Laws•Laws involving selection:– C AND C’(R) = C( C’(R)) = C(R) ∩ C’(R)– C OR C’(R) = C(R) U C’(R)– C (R S) = C (R) S •When C involves only attributes of R– C (R – S) = C (R) – S– C (R U S) = C (R) U C (S)– C (R ∩ S) = C (R) ∩ SAlgebraic Laws•Example: R(A, B, C, D), S(E, F, G)– F=3 (R S) = ?– A=5 AND G=9 (R S) = ?D=ED=EAlgebraic Laws•Laws involving projections– M(R S) = N(P(R) Q(S))•Where N, P, Q are appropriate subsets of attributes of M– M(N(R)) = M,N(R)•Example R(A,B,C,D), S(E, F, G)– A,B,G(R S) = ? (?(R) ?(S)) D=ED=EQuery Rewrites: Sub-queriesSELECT Emp.NameFROM EmpWHERE Emp.Age < 30 AND Emp.Dept# IN (SELECT Dept.Dept# FROM Dept WHERE Dept.Loc = “Seattle” AND Emp.Emp#=Dept.Mgr)The Un-Nested QuerySELECT Emp.NameFROM Emp, DeptWHERE Emp.Age < 30 AND Emp.Dept#=Dept.Dept# AND Dept.Loc = “Seattle” AND Emp.Emp#=Dept.MgrConverting Nested QueriesSelect distinct x.name, x.makerFrom product xWhere x.color= “blue” AND x.price >= ALL (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”)Select distinct x.name, x.makerFrom product xWhere x.color= “blue” AND x.price >= ALL (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”)How do we convert this one to logical plan ?Converting Nested QueriesSelect distinct x.name, x.makerFrom product xWhere x.color= “blue” AND x.price < SOME (Select y.price
View Full Document