Unformatted text preview:

Overview of Query EvaluationAdministriviaReview: StorageReviewQueries today, more on sorting next timeReview: Relational AlgebraSlide 7Overview (cont)Slide 9Intermission: a preview of sortingTwo-Way External Merge SortSchema for ExamplesExample 1Statistics and CatalogsAccess Paths – Getting tuples from a TableA Note on Complex SelectionsOne Approach to SelectionsUsing an Index for SelectionsProjectionJoin: Index Nested LoopsExamples of Index Nested LoopsJoin: Sort-Merge (R S)Example of Sort-Merge JoinHighlights of System R OptimizerCost EstimationSize Estimation and Reduction FactorsMotivating ExampleAlternative Plans 1 (No Indexes)Alternative Plans 2 With IndexesSummaryOverview of Query Evaluation R&G Chapter 12Lecture 13Administrivia•Exams graded•HW2 due in a week•No Office Hours TodayReview: StorageQuery Optimizationand ExecutionRelational OperatorsFiles and Access MethodsBuffer ManagementDisk Space ManagementDB•A DBMS has layersNow to Midterm 2Review•We studied Relational Algebra–Many equivalent queries, produce same result–Which expression is most efficient?•We studied file organizations–Hash files, Sorted files, Clustered & Unclustered Indexes–Compared scans, sorting, searches, insert, delete–Today: costs to implement relational operations–Thurs, Tues: Sorting, JoinsQueries today, more on sorting next time•Remember: SQL declarative language–It describes the query result, but not how to get it•Relational Algebra describes how to get results–But many rel. algebra queries equivalent–How to choose the right one for an SQL query?•In a nutshell:–When database executing query, it must generate a variety of possible plans (relational algebra queries), and find the cheapest one to execute.Review: Relational Algebra First, remember Relational Algebra•Selection (  ) Selects a subset of rows from relation (horizontal).•Projection (  ) Retains only wanted columns from relation (vertical).•Cross-product (  ) Allows us to combine two relations.•Set-difference ( — ) Tuples in r1, but not in r2.•Union (  ) Tuples in r1 and/or in r2.•Intersection ()•Join ( )•Division ( / ) •Overview of Query Evaluation•Plan: Tree of R.A. ops, with choice of alg for each op.•Two main issues in query optimization:–For a given query, what plans are considered?•Algorithm to search plan space for cheapest (estimated) plan.–How is the cost of a plan estimated?•Ideally: Want to find best plan. •Practically: Avoid worst plans!•We will study the System R approach.Overview (cont)•Query Evaluation involves:–Choosing an Access Path to get at each table–Evaluating different algorithms for each relational operator–Choosing the order to apply the relational operators–These choices interrelatedOverview (cont)•Overall goal: minimize I/Os•Algorithms for evaluating relational operators use simple ideas extensively:–Indexing: Can use WHERE conditions to retrieve small set of tuples (selections, joins)–Iteration: Sometimes, faster to scan all tuples even if there is an index. (sometimes scan the data entries in an index instead of the table itself.)–Partitioning: By using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.* Watch for these techniques as we discuss query evaluation!Intermission: a preview of sorting•Data can only be sorted when in memory•But tables often *much* bigger than memory•One solution: merge sort•Every one stand up•Go to the aisle by the windows•I will take 10 people at a time onto the stage•I will sort each group of 10 on last name from A to Z•Groups will then be mergedTwo-Way External Merge Sort•Each pass we read + write each page in file.•N pages in the file => the number of passes•So total cost is: •Idea: Divide and conquer: sort subfiles and merge  log21N  2 12N Nlog Input file1-page runs2-page runs4-page runs8-page runsPASS 0PASS 1PASS 2PASS 393,46,29,4 8,7 5,6 3,123,45,62,6 4,9 7,81,3 22,34,64,78,91,35,6 22,34,46,78,91,23,561,22,33,44,56,67,8Schema for Examples•Similar to old schema; rname added for variations.•Reserves:–Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.•Sailors:–Each tuple is 50 bytes long, 80 tuples per page, 500 pages. Sailors (sid: integer, sname: string, rating: integer, age: real)Reserves (sid: integer, bid: integer, day: dates, rname: string)Example 1•Select sname, bid from Sailors S, Reserves R where s.sid = r.sid and S.age > 99•Several possible rel. algebra queries:s.age>99)(S R)s.age>99)S) R•Second one may be much cheaper if right indexes exist.Statistics and Catalogs•Need information about relations and indexes involved. Catalogs typically contain at least:–# tuples (NTuples), # pages (NPages) for each relation.–# distinct key values (NKeys) and NPages for each index.–Index height, low/high key values (Low/High) for each tree index.•Catalogs updated periodically.–Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.•More detailed information (e.g., histograms of the values in some field) are sometimes stored.Access Paths – Getting tuples from a Table•Access path: a method of retrieving tuples:File scan, or index that matches a selection (in the query) •Is an index useful for a query? If it matches predicate:•Tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.E.g., Tree index on <a, b, c> matches the selection a=5 AND b=3, and a=5 AND b>6, but not b=3.Hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key.E.g., Hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.A Note on Complex Selections•Selection conditions are first converted to conjunctive normal form (CNF): (day<8/9/94 OR bid=5 OR sid=3 ) AND (rname=‘Paul’ OR bid=5 OR sid=3) •We only discuss case with no ORs; see text if you are curious about the general case. (day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3One Approach to Selections•Find the most selective access path, •retrieve tuples using it, and •apply any remaining terms that don’t match the index:–Most selective access path: An index


View Full Document

Berkeley COMPSCI 186 - Overview of Query Evaluation

Documents in this Course
Load more
Download Overview of Query Evaluation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Overview of Query Evaluation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Overview of Query Evaluation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?