UW CSE 444 - Indexing and Execution - D2189805

Home> Schools> University of Washington> Computer Science and Engineering (CSE) > CSE 444> Indexing and Execution

DOC PREVIEW

UW CSE 444 - Indexing and Execution

School name University of Washington

Course Cse 444- Database Internals

Pages 67

This preview shows page 1-2-3-4-31-32-33-34-35-64-65-66-67 out of 67 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Indexing and ExecutionHow are we doing?Double Pipelined Join (Tukwila)Query ExecutionQuery OptimizationHow are we going to build one?DiscussionSchema for Some ExamplesMotivating ExampleAlternative Plans 1Alternative Plans 2 With IndexesBuilding BlocksQuery Optimization Process (simplified a bit)Key Lessons in OptimizationOperations (revisited)Algebraic LawsSlide 17Slide 18Slide 19Query Rewrites: Sub-queriesThe Un-Nested QueryConverting Nested QueriesSlide 23Slide 24Slide 25Semi-Joins, Magic SetsRewrites: Magic SetsRewrites: SIPsSupporting ViewsAnd Finally…Rewrites: Group By and JoinRewrites:Operation IntroductionQuery Rewriting: Predicate PushdownQuery Rewrites: Predicate Pushdown (through grouping)Query Rewrite: Predicate MovearoundSlide 36Slide 37Query Rewrite SummaryCost EstimationStatistics and CatalogsCost Model for Our AnalysisSimple Nested Loops JoinIndex Nested Loops JoinBlock Nested Loops JoinSort-Merge Join (R S)Cost of Sort-Merge JoinHash-JoinCost of Hash-JoinSize Estimation and Reduction FactorsHistogramsHistogramsSlide 52Slide 53Slide 54Plans for Single-Relation Queries (Prep for Join ordering)ExampleDetermining Join OrderingTypes of Join TreesSlide 59Slide 60ProblemDynamic ProgrammingSlide 63Slide 64Slide 65Slide 66Slide 67Indexing and ExecutionMay 22nd, 2002How are we doing?Nested loopjoin140 hours 50 millionI/OsPage nestedloop1.4 hours 500,000 I/OsIndex nestedloop36 minutes 220,000 I/OsSort-mergejoin75 seconds 7500 I/O’sHash join 45 seconds 4500 I/O’sDouble Pipelined Join (Tukwila)Hash JoinPartially pipelined: no output until inner readAsymmetric (inner vs. outer) — optimization requires source behavior knowledgeDouble Pipelined Hash JoinOutputs data immediatelySymmetric — requires less source knowledge to optimizeQuery ExecutionQuery compilerExecution engineIndex/record mgr.Buffer managerStorage managerstorageUser/ApplicationQueryupdateQuery executionplanRecord, indexrequestsPage commandsRead/writepagesQuery OptimizationImperative query execution plan:Declarative SQL queryIdeally: Want to find best plan. Practically: Avoid worst plans!Goal:PurchasePersonBuyer=nameCity=‘seattle’ phone>’5430000’buyer(Simple Nested Loops)(Table scan) (Index scan)SELECT S.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ Inputs:• the query• statistics about the data (indexes, cardinalities, selectivity factors)• available memoryHow are we going to build one?•What kind of optimizations can we do?•What are the issues?•How would we architect a query optimizer?DiscussionHow Would You Do It?Schema for Some Examples•Reserves:–Each tuple is 40 bytes long, 100 tuples per page, 1000 pages (4000 tuples)•Sailors:–Each tuple is 50 bytes long, 80 tuples per page, 500 pages (4000 tuples). Sailors (sid: integer, sname: string, rating: integer, age: real)Reserves (sid: integer, bid: integer, day: dates, rname: string)Motivating Example•Cost: 500+500*1000 I/Os•By no means the worst plan! •Misses several opportunities: selections could have been `pushed’ earlier, no use is made of any available indexes, etc.•Goal of optimization: To find more efficient plans that compute the same answer. SELECT S.snameFROM Reserves R, Sailors SWHERE R.sid=S.sid AND R.bid=100 AND S.rating>5ReservesSailorssid=sidbid=100 rating > 5snameReservesSailorssid=sidbid=100 rating > 5sname(Simple Nested Loops)(On-the-fly)(On-the-fly)RA Tree:Plan:Alternative Plans 1 •Main difference: push selects.•With 5 buffers, cost of plan:–Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).–Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).–Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250), total=1800 –Total: 3560 page I/Os.•If we used BNL join, join cost = 10+4*250, total cost = 2770.•If we `push’ projections, T1 has only sid, T2 only sid and sname:–T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.ReservesSailorssid=sidbid=100 sname(On-the-fly)rating > 5(Scan;write to temp T1)(Scan;write totemp T2)(Sort-Merge Join)Alternative Plans 2With Indexes•With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.•INL with pipelining (outer is not materialized). Decision not to push rating>5 before the join is based on availability of sid index on Sailors. Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os. Join column sid is a key for Sailors.–At most one matching tuple, unclustered index on sid OK.ReservesSailorssid=sidbid=100 sname(On-the-fly)rating > 5(Use hashindex; donot writeresult to temp)(Index Nested Loops,with pipelining )(On-the-fly)Building Blocks•Algebraic transformations (many and wacky).•Statistical model: estimating costs and sizes.•Finding the best join trees:–Bottom-up (dynamic programming): System-R•Newer architectures:–Starburst: rewrite and then tree find–Volcano: all at once, top-down.Query Optimization Process(simplified a bit)•Parse the SQL query into a logical tree:–identify distinct blocks (corresponding to nested sub-queries or views). •Query rewrite phase:–apply algebraic transformations to yield a cheaper plan.–Merge blocks and move predicates between blocks. •Optimize each block: join ordering.•Complete the optimization: select scheduling (pipelining strategy).Key Lessons in Optimization•There are many approaches and many details to consider in query optimization–Classic search/optimization problem!–Not completely solved yet!•Main points to take away are:–Algebraic rules and their use in transformations of queries.–Deciding on join ordering: System-R style (Selinger style) optimization.–Estimating cost of plans and sizes of intermediate results.Operations (revisited)•Scan ([index], table, predicate):–Either index scan or table scan.–Try to push down sargable predicates.•Selection (filter)•Projection (always need to go to the data?)•Joins: nested loop (indexed), sort-merge, hash, outer join.•Grouping and aggregation (usually the last).Algebraic Laws•Commutative and Associative Laws–R U S = S U R, R U (S U T) = (R U S) U T–R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T–R S = S R, R (S T) = (R S) T•Distributive Laws–R (S U T) = (R S) U (R

View Full Document