Query Optimization and Intro to TransactionsAdministriviaReviewExamplePass 1: Best access method for each relationSlide 6Pass 2Slide 8Slide 9Slide 10Slide 11Pass 3 and beyondNested Queries (Subqueries)Nested QueriesSlide 15Taking it one step further…Slide 17SummaryTransaction Management OverviewComponents of a DBMSConcurrency Control & RecoveryStatement of ProblemStatement of problem (cont.)DefinitionsCorrectness criteria: The ACID propertiesAtomicity of TransactionsMechanisms for Ensuring AtomicityTransaction ConsistencyTransaction Consistency (cont.)Isolation of TransactionsSlide 31Example (Contd.)Formal Properties of SchedulesQuery Optimizationand Intro to Transactions R&G, Chapter 15Chapter 16Lecture 18Administrivia•Homework 3 due next Tuesday, March 20 by end of class period•Homework 4 available on class website–Implement nested loops and hash join operators for (new!) minibase–Due date: April 10 (after Spring Break)•Midterm 2 is 3/22, 1 week from today–In class, covers lectures 10-17–Review will be held Tuesday 3/20 7-9 pm 306 Soda Hall•Internships at Google this summer…–See http://www.postgresql.org/developer/summerofcode–Booth at the UCB TechExpo this Thursday:–http://csba.berkeley.edu/tech_expo.html–Contact [email protected] Optimizationand ExecutionRelational OperatorsFiles and Access MethodsBuffer ManagementDisk Space ManagementDBWe are here•Query plans are a tree of operators that compute the result of a query•Optimization is the process of picking the best plan•Execution is the process of executing the planExampleSailors:Tuples 50 bytes long, 80 tuples/page, 500 pages Unclustered B+ and Hash on sid (key)40,000 sid values10 ratingsReserves:Tuples 40 bytes long, 100 tuples/page, 1000 pagesClustered B+ tree on bid (key), Unclustered B+ on sid100 distinct bid valuesBoatsTuples 50 bytes long, 80 tuples/page, 100 pagesUnclustered B+, Clustered Hash on color, 50 distinct color valuesSelect S.sid, COUNT(*) AS numberFROM Sailors S, Reserves R, Boats BWHERE S.sid = R.sid AND R.bid = B.bid AND B.color = “red” GROUP BY S.sidReservesSailorssid=sidBoats Sid, COUNT(*) AS numbersGROUPBY sidbid=bidColor=redPass 1: Best access method for each relation•Sailors–No predicates, so File Scan is best•Reserves–No predicates so File Scan is best–What about Index Scan with B+ index on bid for Reserves?–Keep it in mind…tuples will come out in join order…might come in handy later for a join on bid…Sailors:Tuples 50 bytes long, 80 tuples/page, 500 pages Unclustered B+ and Hash on sid (key)40,000 sid values10 ratingsReservesSailorssid=sidBoats Sid, COUNT(*) AS numbersGROUPBY sidbid=bidColor=redReserves:Tuples 40 bytes long, 100 tuples/page, 1000 pagesClustered B+ tree on bid (key), Unclustered B+ on sid100 distinct bid valuesPass 1: Best access method for each relation•Boats–File Scan100 I/Os–B+ Index Scan on color2-3 + 80*100/50 =3 + 160 I/Os = 163 I/Os–Hash Index Scan on color1.2 + (80*100/50)/80 =1.2 + 2 I/OsBoatsTuples 50 bytes long, 80 tuples/page, 100 pagesUnclustered B+, Clustered Hash on color, 50 distinct color valuesReservesSailorssid=sidBoats Sid, COUNT(*) AS numberGROUPBY sidbid=bidColor=redCheapestAlways keep around just in caseBook says to keep it because of interesting orderPass 2•For each of the plans in pass 1:–generate left deep plans for joins- consider different order and join methodsReservesBoats bid=bidColor=redSailorssid=sidReservesReservessid=sidSailorsColor=redReservesBoats bid=bid•Question: what about SailorsXBoats?XXPass 2ReservesBoats bid=bidColor=red•First consider which pass 1 plans to use for access pathReservesBoats bid=bidColor=red1. Hash index(color) Boats, File scan Reserves2. B+(color), File Scan Reserves•File scan Reserves, File Scan BoatsBoats: B+ tree(color), Hash(color), File ScanSailors: File ScanReserves: File ScanPass 2•Note: book also includes these plans:–File Scan Sailors (outer) with Boats (inner)–Boats hash on color with Sailors (inner)–Boats Btree on color with Sailors (inner)•Would you agree these should be considered?•File Scan Sailors, File Scan Reserves•B+(sid), File Scan Reserves•File scan Reserves, File Scan SailorsSailorssid=sidReservesReservessid=sidSailors•First consider which pass 1 plans to use for access pathBoats: B+ tree(bid), Hash(bid), File ScanSailors: File ScanReserves: File ScanPass 2ReservesBoats bid=bidColor=red•Now consider join methods and consider all access paths for innerReservesBoats bid=bidColor=red1. Hash index(color) Boats, File scan Reserves2. B+(color), File Scan Reserves•File scan Reserves, File Scan BoatsBoats: B+ tree(bid), Hash(bid), File ScanSailors: File ScanReserves: File ScanReserves:Clustered B+ tree on bid (key), Unclustered B+ on sidBoatsUnclustered B+, Clustered Hash on color, 50 distinct color valuesIf you replace file scan of Reserves with B+ index scan, Index Nested Loops is a good choice!Sort-Merge could also be a good choice becauseTuples come out in bid-order.Pass 2•File Scan Sailors, File Scan Reserves•B+(color), File Scan Reserves•File scan Reserves, File Scan SailorsSailorssid=sidReservesReservessid=sidSailors•Now consider join methodsBoats: B+ tree(bid), Hash(bid), File ScanSailors: File ScanReserves: File ScanExercise: Which plans would you keep for this set of joins?Pass 3 and beyond•For each of the plans retained from Pass 2, taken as the outer, generate plans for the next join–For example, let’s take the sort-merge plan for BoatsxReserves and add in Sailors:Sailorssid=sidB+ index(sid) ReservesSort-mergeReservesBoats bid=bidColor=redHash index(color) BoatsB+ index(bid) ReservesSort-mergeFrom pass 2Note that this plan will produce tuples in sid order; interesting order for the upcoming GROUP BY.•GROUP BY, ORDER BY, AGGREGATES are all considered after the join plans are chosen.Nested Queries (Subqueries)SELECT S.snameFROM Sailors SWHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid=103 AND R.sid=S.sid)SELECT S.snameFROM Sailors SWHERE EXISTS (SELECT * FROM Reserves R WHERE R.day = ’01/31/07’)•Nested queries are parsed into their own ‘query block’ and optimized separately.•Two kinds: UncorrelatedCorrelated•An uncorrelated subquery can be computed once per query.-> Optimizer can choose a plan for the subquery and add temp operator to ‘cache’ the subquery results for use in the rest of the querySubquery planTEMPPlan for outer
View Full Document