UMD CMSC 424 - Query Optimization - D779746

Home> Schools> University of Maryland, College Park> Computer Science (CMSC) > CMSC 424> Query Optimization

DOC PREVIEW

UMD CMSC 424 - Query Optimization

School name University of Maryland, College Park

Course Cmsc 424- Database Design

Pages 47

This preview shows page 1-2-3-22-23-24-45-46-47 out of 47 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 47 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CMSC424: Database DesignQuery OptimizationSlide 3Slide 4Slide 5Slide 6Slide 7Slide 8A Simple CaseSlide 10Slide 11Slide 12Slide 13Slide 14Slide 15Equivalence of ExpressionsEquivalence RulesPictorial DepictionExampleSlide 20Slide 21Slide 22Evaluation PlansSlide 24Optimization AlgorithmsSearching for the best planSlide 27Left Deep Join TreesHeuristic OptimizationSteps in Typical Heuristic OptimizationSlide 31Cost EstimationStatistical Information for Cost EstimationHistogramsSelection Size EstimationSize Estimation of Complex SelectionsJoin Operation: Running ExampleEstimation of the Size of JoinsSlide 39Estimation of the Size of Joins (Cont.)Size Estimation for Other OperationsSize Estimation (Cont.)Estimation of Number of Distinct ValuesEstimation of Distinct Values (Cont.)Slide 45Slide 46Slide 47CMSC424: Database DesignInstructor: Amol Deshpande [email protected] OptimizationIntroductionExample of a Simple Type of QueryTransformation of Relational ExpressionsStatistics EstimationOptimization AlgorithmsQuery OptimizationWhy ? Many different ways of executing a given queryHuge differences in costExample:select * from person where ssn = “123”Size of person = 1GBSequential Scan:Takes 1GB / (20MB/s) = 50sUse an index on SSN (assuming one exists):Approx 4 Random I/Os = 40msQuery OptimizationEquivalent relational expressionsDrawn as a treeList the operations and the orderQuery OptimizationExecution plansEvaluation expressions annotated with the methods usedQuery OptimizationSteps:Generate all possible execution plans for the queryFigure out the cost for each of themChoose the bestNot done exactly as listed aboveToo many different execution plans for thatTypically interleave all of these into a single efficient search algorithmQuery OptimizationSteps:Generate all possible execution plans for the queryFirst generate all equivalent expressionsThen consider all annotations for the operationsFigure out the cost for each of themCompute cost for each operation Using the formulas discussed beforeOne problem: How do we know the number of result tuples for, say, Add them !Choose the best)(2500accountbalanceQuery OptimizationIntroductionExample of a Simple Type of QueryTransformation of Relational ExpressionsStatistics EstimationOptimization AlgorithmsA Simple CaseQueries with only selections on a single relation with no indexesselect * from person where substr(name, 1, 1) in [‘A’, ‘B’, ‘C’] and 100ns zipcode = 94720 and 1ns date-of-birth > to_date('1978/05/31', 'yyyy/mm/dd') 1000nsRelation contains: 1,000,000 tuplesCPU CostsA Simple CasePossible execution plan:For each tupleEvaluate substr predicateIf true, Evaluate zipcode predicateIf true, evaluate date-of-birth predicateIf true, output the tuple 6 different possibilitiesHow to choose one ?A Simple CaseCompute cost of each possibilitySay, substr()  zipcode  date-of-birthNeed some more informationselectivity: fraction of tuples expected to pass the predicatesLet selectivity(substr predicate) = 3/26Let selectivity(zipcode predicate) = 1/100And, selectivity(date-of-birth predicate) = 1/3How are selectivities computed ?Must keep track of some additional information about the relationsA Simple CaseCompute cost of each possibilitySay, substr()  zipcode  date-of-birthGiven that: Cost of the above plan =1,000,000 * 100ns+ 1,000,000 * 3/26 * 1ns+ 1,000,000 * 1/26 * 1/100 * 1000ns= approx 100.5 msCost of the plan: zipcode  substr()  date-of-birth:Approx 12.92 msAbout a factor of 10 better.A Simple CaseCompute cost of each possibilitySay, substr()  zipcode  date-of-birthCost of the plan: zipcode  substr()  date-of-birth:Approx 12.92 msAbout a factor of 10 better.General algorithm:Don’t need to check all n! PossibilitiesSort the predicates in the increasing order by: 1 – selectivity(predicate) cost of the predicateQuery OptimizationGeneral case:Need:A way to enumerate all plansA way to find the cost of each planSub problem: Estimating the selectivities of various operationsA way to search through the plans efficientlyQuery OptimizationIntroductionExample of a Simple Type of QueryTransformation of Relational ExpressionsStatistics EstimationOptimization AlgorithmsEquivalence of ExpressionsTwo relational expressions equivalent iff:Their result is identical on all legal databasesEquivalence rules:Allow replacing one expression with anotherExamples: 1. 2. Selections are commutative))(()(2121EE))(())((1221EEEquivalence RulesExamples: 3. 5. E1  E2 = E2  E1 7(a). If 0 only involves attributes from E1 0E1  E2) = (0(E1))  E2 And so on…Many rules of this type)())))((((121EELLnLL Pictorial DepictionExampleFind the names of all customers with an account at a Brooklyn branch whose account balance is over $1000. customer_name(branch_city = “Brooklyn”  balance > 1000 (branch (account depositor)))Apply the rules one by one customer_name((branch_city = “Brooklyn”  balance > 1000 (branch account)) depositor) customer_name(((branch_city = “Brooklyn” (branch)) ( balance > 1000 (account))) depositor)ExampleEquivalence of ExpressionsThe rules give us a way to enumerate all equivalent expressionsNote that the expressions don’t contain physical access methods, join methods etc…Simple Algorithm:Start with the original expressionApply all possible applicable rules to get a new set of expressionsRepeat with this new set of expressionsTill no new expressions are generatedEquivalence of ExpressionsWorks, but is not feasibleConsider a simple case:R1 (R2 (R3 (… Rn)))….)Just join commutativity and associativity will give us:At least:n^2 * 2^nAt worst:n! * 2^nTypically the process of enumeration is combined with the search processEvaluation PlansWe still need to choose the join methods etc..Option 1: Choose for each operation separatelyUsually okay, but sometimes the operators

View Full Document