Berkeley COMPSCI 186 - Query Optimization and Intro to Transactions

Unformatted text preview:

Query Optimization and Intro to TransactionsAdministriviaReviewExamplePass 1: Best access method for each relationSlide 6Pass 2Slide 8Slide 9Slide 10Slide 11Pass 3 and beyondNested Queries (Subqueries)Nested QueriesSlide 15Taking it one step further…Slide 17SummaryTransaction Management OverviewComponents of a DBMSConcurrency Control & RecoveryStatement of ProblemStatement of problem (cont.)DefinitionsCorrectness criteria: The ACID propertiesAtomicity of TransactionsMechanisms for Ensuring AtomicityTransaction ConsistencyTransaction Consistency (cont.)Isolation of TransactionsSlide 31Example (Contd.)Formal Properties of SchedulesQuery Optimizationand Intro to Transactions R&G, Chapter 15Chapter 16Lecture 18Administrivia•Homework 3 due next Tuesday, March 20 by end of class period•Homework 4 available on class website–Implement nested loops and hash join operators for (new!) minibase–Due date: April 10 (after Spring Break)•Midterm 2 is 3/22, 1 week from today–In class, covers lectures 10-17–Review will be held Tuesday 3/20 7-9 pm 306 Soda Hall•Internships at Google this summer…–See http://www.postgresql.org/developer/summerofcode–Booth at the UCB TechExpo this Thursday:–http://csba.berkeley.edu/tech_expo.html–Contact [email protected] Optimizationand ExecutionRelational OperatorsFiles and Access MethodsBuffer ManagementDisk Space ManagementDBWe are here•Query plans are a tree of operators that compute the result of a query•Optimization is the process of picking the best plan•Execution is the process of executing the planExampleSailors:Tuples 50 bytes long, 80 tuples/page, 500 pages Unclustered B+ and Hash on sid (key)40,000 sid values10 ratingsReserves:Tuples 40 bytes long, 100 tuples/page, 1000 pagesClustered B+ tree on bid (key), Unclustered B+ on sid100 distinct bid valuesBoatsTuples 50 bytes long, 80 tuples/page, 100 pagesUnclustered B+, Clustered Hash on color, 50 distinct color valuesSelect S.sid, COUNT(*) AS numberFROM Sailors S, Reserves R, Boats BWHERE S.sid = R.sid AND R.bid = B.bid AND B.color = “red” GROUP BY S.sidReservesSailorssid=sidBoats Sid, COUNT(*) AS numbersGROUPBY sidbid=bidColor=redPass 1: Best access method for each relation•Sailors–No predicates, so File Scan is best•Reserves–No predicates so File Scan is best–What about Index Scan with B+ index on bid for Reserves?–Keep it in mind…tuples will come out in join order…might come in handy later for a join on bid…Sailors:Tuples 50 bytes long, 80 tuples/page, 500 pages Unclustered B+ and Hash on sid (key)40,000 sid values10 ratingsReservesSailorssid=sidBoats Sid, COUNT(*) AS numbersGROUPBY sidbid=bidColor=redReserves:Tuples 40 bytes long, 100 tuples/page, 1000 pagesClustered B+ tree on bid (key), Unclustered B+ on sid100 distinct bid valuesPass 1: Best access method for each relation•Boats–File Scan100 I/Os–B+ Index Scan on color2-3 + 80*100/50 =3 + 160 I/Os = 163 I/Os–Hash Index Scan on color1.2 + (80*100/50)/80 =1.2 + 2 I/OsBoatsTuples 50 bytes long, 80 tuples/page, 100 pagesUnclustered B+, Clustered Hash on color, 50 distinct color valuesReservesSailorssid=sidBoats Sid, COUNT(*) AS numberGROUPBY sidbid=bidColor=redCheapestAlways keep around just in caseBook says to keep it because of interesting orderPass 2•For each of the plans in pass 1:–generate left deep plans for joins- consider different order and join methodsReservesBoats bid=bidColor=redSailorssid=sidReservesReservessid=sidSailorsColor=redReservesBoats bid=bid•Question: what about SailorsXBoats?XXPass 2ReservesBoats bid=bidColor=red•First consider which pass 1 plans to use for access pathReservesBoats bid=bidColor=red1. Hash index(color) Boats, File scan Reserves2. B+(color), File Scan Reserves•File scan Reserves, File Scan BoatsBoats: B+ tree(color), Hash(color), File ScanSailors: File ScanReserves: File ScanPass 2•Note: book also includes these plans:–File Scan Sailors (outer) with Boats (inner)–Boats hash on color with Sailors (inner)–Boats Btree on color with Sailors (inner)•Would you agree these should be considered?•File Scan Sailors, File Scan Reserves•B+(sid), File Scan Reserves•File scan Reserves, File Scan SailorsSailorssid=sidReservesReservessid=sidSailors•First consider which pass 1 plans to use for access pathBoats: B+ tree(bid), Hash(bid), File ScanSailors: File ScanReserves: File ScanPass 2ReservesBoats bid=bidColor=red•Now consider join methods and consider all access paths for innerReservesBoats bid=bidColor=red1. Hash index(color) Boats, File scan Reserves2. B+(color), File Scan Reserves•File scan Reserves, File Scan BoatsBoats: B+ tree(bid), Hash(bid), File ScanSailors: File ScanReserves: File ScanReserves:Clustered B+ tree on bid (key), Unclustered B+ on sidBoatsUnclustered B+, Clustered Hash on color, 50 distinct color valuesIf you replace file scan of Reserves with B+ index scan, Index Nested Loops is a good choice!Sort-Merge could also be a good choice becauseTuples come out in bid-order.Pass 2•File Scan Sailors, File Scan Reserves•B+(color), File Scan Reserves•File scan Reserves, File Scan SailorsSailorssid=sidReservesReservessid=sidSailors•Now consider join methodsBoats: B+ tree(bid), Hash(bid), File ScanSailors: File ScanReserves: File ScanExercise: Which plans would you keep for this set of joins?Pass 3 and beyond•For each of the plans retained from Pass 2, taken as the outer, generate plans for the next join–For example, let’s take the sort-merge plan for BoatsxReserves and add in Sailors:Sailorssid=sidB+ index(sid) ReservesSort-mergeReservesBoats bid=bidColor=redHash index(color) BoatsB+ index(bid) ReservesSort-mergeFrom pass 2Note that this plan will produce tuples in sid order; interesting order for the upcoming GROUP BY.•GROUP BY, ORDER BY, AGGREGATES are all considered after the join plans are chosen.Nested Queries (Subqueries)SELECT S.snameFROM Sailors SWHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid=103 AND R.sid=S.sid)SELECT S.snameFROM Sailors SWHERE EXISTS (SELECT * FROM Reserves R WHERE R.day = ’01/31/07’)•Nested queries are parsed into their own ‘query block’ and optimized separately.•Two kinds: UncorrelatedCorrelated•An uncorrelated subquery can be computed once per query.-> Optimizer can choose a plan for the subquery and add temp operator to ‘cache’ the subquery results for use in the rest of the querySubquery planTEMPPlan for outer


View Full Document

Berkeley COMPSCI 186 - Query Optimization and Intro to Transactions

Documents in this Course
Load more
Download Query Optimization and Intro to Transactions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Query Optimization and Intro to Transactions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Query Optimization and Intro to Transactions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?