CMSC424: Database DesignQuery OptimizationSlide 3Slide 4Slide 5Slide 6Slide 7Slide 8A Simple CaseSlide 10Slide 11Slide 12Slide 13Slide 14Slide 15Equivalence of ExpressionsEquivalence RulesPictorial DepictionExampleSlide 20Slide 21Slide 22Evaluation PlansSlide 24Optimization AlgorithmsSearching for the best planSlide 27Left Deep Join TreesHeuristic OptimizationSteps in Typical Heuristic OptimizationSlide 31Cost EstimationStatistical Information for Cost EstimationHistogramsSelection Size EstimationSize Estimation of Complex SelectionsJoin Operation: Running ExampleEstimation of the Size of JoinsSlide 39Estimation of the Size of Joins (Cont.)Size Estimation for Other OperationsSize Estimation (Cont.)Estimation of Number of Distinct ValuesEstimation of Distinct Values (Cont.)Slide 45Slide 46Slide 47CMSC424: Database DesignInstructor: Amol Deshpande [email protected] OptimizationIntroductionExample of a Simple Type of QueryTransformation of Relational ExpressionsStatistics EstimationOptimization AlgorithmsQuery OptimizationWhy ? Many different ways of executing a given queryHuge differences in costExample:select * from person where ssn = “123”Size of person = 1GBSequential Scan:Takes 1GB / (20MB/s) = 50sUse an index on SSN (assuming one exists):Approx 4 Random I/Os = 40msQuery OptimizationEquivalent relational expressionsDrawn as a treeList the operations and the orderQuery OptimizationExecution plansEvaluation expressions annotated with the methods usedQuery OptimizationSteps:Generate all possible execution plans for the queryFigure out the cost for each of themChoose the bestNot done exactly as listed aboveToo many different execution plans for thatTypically interleave all of these into a single efficient search algorithmQuery OptimizationSteps:Generate all possible execution plans for the queryFirst generate all equivalent expressionsThen consider all annotations for the operationsFigure out the cost for each of themCompute cost for each operation Using the formulas discussed beforeOne problem: How do we know the number of result tuples for, say, Add them !Choose the best)(2500accountbalanceQuery OptimizationIntroductionExample of a Simple Type of QueryTransformation of Relational ExpressionsStatistics EstimationOptimization AlgorithmsA Simple CaseQueries with only selections on a single relation with no indexesselect * from person where substr(name, 1, 1) in [‘A’, ‘B’, ‘C’] and 100ns zipcode = 94720 and 1ns date-of-birth > to_date('1978/05/31', 'yyyy/mm/dd') 1000nsRelation contains: 1,000,000 tuplesCPU CostsA Simple CasePossible execution plan:For each tupleEvaluate substr predicateIf true, Evaluate zipcode predicateIf true, evaluate date-of-birth predicateIf true, output the tuple 6 different possibilitiesHow to choose one ?A Simple CaseCompute cost of each possibilitySay, substr() zipcode date-of-birthNeed some more informationselectivity: fraction of tuples expected to pass the predicatesLet selectivity(substr predicate) = 3/26Let selectivity(zipcode predicate) = 1/100And, selectivity(date-of-birth predicate) = 1/3How are selectivities computed ?Must keep track of some additional information about the relationsA Simple CaseCompute cost of each possibilitySay, substr() zipcode date-of-birthGiven that: Cost of the above plan =1,000,000 * 100ns+ 1,000,000 * 3/26 * 1ns+ 1,000,000 * 1/26 * 1/100 * 1000ns= approx 100.5 msCost of the plan: zipcode substr() date-of-birth:Approx 12.92 msAbout a factor of 10 better.A Simple CaseCompute cost of each possibilitySay, substr() zipcode date-of-birthCost of the plan: zipcode substr() date-of-birth:Approx 12.92 msAbout a factor of 10 better.General algorithm:Don’t need to check all n! PossibilitiesSort the predicates in the increasing order by: 1 – selectivity(predicate) cost of the predicateQuery OptimizationGeneral case:Need:A way to enumerate all plansA way to find the cost of each planSub problem: Estimating the selectivities of various operationsA way to search through the plans efficientlyQuery OptimizationIntroductionExample of a Simple Type of QueryTransformation of Relational ExpressionsStatistics EstimationOptimization AlgorithmsEquivalence of ExpressionsTwo relational expressions equivalent iff:Their result is identical on all legal databasesEquivalence rules:Allow replacing one expression with anotherExamples: 1. 2. Selections are commutative))(()(2121EE))(())((1221EEEquivalence RulesExamples: 3. 5. E1 E2 = E2 E1 7(a). If 0 only involves attributes from E1 0E1 E2) = (0(E1)) E2 And so on…Many rules of this type)())))((((121EELLnLL Pictorial DepictionExampleFind the names of all customers with an account at a Brooklyn branch whose account balance is over $1000. customer_name(branch_city = “Brooklyn” balance > 1000 (branch (account depositor)))Apply the rules one by one customer_name((branch_city = “Brooklyn” balance > 1000 (branch account)) depositor) customer_name(((branch_city = “Brooklyn” (branch)) ( balance > 1000 (account))) depositor)ExampleEquivalence of ExpressionsThe rules give us a way to enumerate all equivalent expressionsNote that the expressions don’t contain physical access methods, join methods etc…Simple Algorithm:Start with the original expressionApply all possible applicable rules to get a new set of expressionsRepeat with this new set of expressionsTill no new expressions are generatedEquivalence of ExpressionsWorks, but is not feasibleConsider a simple case:R1 (R2 (R3 (… Rn)))….)Just join commutativity and associativity will give us:At least:n^2 * 2^nAt worst:n! * 2^nTypically the process of enumeration is combined with the search processEvaluation PlansWe still need to choose the join methods etc..Option 1: Choose for each operation separatelyUsually okay, but sometimes the operators
View Full Document