DOC PREVIEW
UW CSE 444 - Query Optimization

This preview shows page 1-2-3-23-24-25-26-46-47-48 out of 48 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1Lecture 26:Query OptimizationMonday, December 4th, 20062Outline• Cost-based optimization 16.5, 16.6• Cost estimation: 16.43Cost-based Optimizations• Main idea: apply algebraic laws, until estimated cost is minimal• Practically: start from partial plans, introduce operators one by one– Will see in a few slides• Problem: there are too many ways to apply the laws, hence too many (partial) plans4Cost-based OptimizationsApproaches:• Top-down: the partial plan is a top fragment of the logical plan• Bottom up: the partial plan is a bottom fragment of the logical plan5Dynamic ProgrammingOriginally proposed in System R• Only handles single block queries:• Heuristics: selections down, projections up• Dynamic programming: join reorderingSELECT listFROM listWHERE cond1AND cond2AND . . . AND condkSELECT listFROM listWHERE cond1AND cond2AND . . . AND condk6Join Trees• R1 |×| R2 |×| …. |×|Rn• Join tree:• A plan = a join tree• A partial plan = a subtree of a join treeR3 R1 R2 R47Types of Join Trees• Left deep:R3R1R5R2R48Types of Join Trees• Bushy:R3R1R2 R4R59Types of Join Trees• Right deep:R3R1R5R2 R410Dynamic Programming• Given: a query R1 |×| R2 |×| … |×| Rn• Assume we have a function cost() that gives us the cost of every join tree• Find the best join tree for the query11Dynamic Programming• Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset• In increasing order of set cardinality:– Step 1: for {R1}, {R2}, …, {Rn}– Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn}–…– Step n: for {R1, …, Rn}• It is a bottom-up strategy• A subset of {R1, …, Rn} is also called a subquery12Dynamic Programming• For each subquery Q ⊆{R1, …, Rn} compute the following:–Size(Q)– A best plan for Q: Plan(Q)– The cost of that plan: Cost(Q)13Dynamic Programming• Step 1: For each {Ri} do:–Size({Ri}) = B(Ri)–Plan({Ri}) = Ri–Cost({Ri}) = (cost of scanning Ri)14Dynamic Programming• Step i: For each Q ⊆{R1, …, Rn} of cardinality i do:– Compute Size(Q) (later…)– For every pair of subqueries Q’, Q’’s.t. Q = Q’ ∪ Q’’compute cost(Plan(Q’) |×| Plan(Q’’))– Cost(Q) = the smallest such cost– Plan(Q) = the corresponding plan15Dynamic Programming• Return Plan({R1, …, Rn})16Dynamic ProgrammingTo illustrate, we will make the following simplifications:•Cost(P1|×| P2) = Cost(P1) + Cost(P2) +size(intermediate result(s))• Intermediate results:– If P1= a join, then the size of the intermediate result is size(P1), otherwise the size is 0– Similarly for P2• Cost of a scan = 017Dynamic Programming•Example:•Cost(R5 |×| R7) = 0 (no intermediate results)•Cost((R2 |×| R1) |×| R7) = Cost(R2 |×| R1) + Cost(R7) + size(R2 |×| R1)= size(R2 |×| R1)18Dynamic Programming• Relations: R, S, T, U• Number of tuples: 2000, 5000, 3000, 1000• Size estimation: T(A |×| B) = 0.01*T(A)*T(B)19RSTUSTURTURSURSTTUSUSTRURTRSPlanCostSizeSubquery20(RT)(SU)60k+50k=110k30MRSTU(TU)S30k1.5MSTU(RU)T20k0.6MRTU(RU)S20k1MRSU(RT)S60k3MRSTTU030kTUSU050kSUST0150kSTRU020kRURT060kRTRS0100kRSPlanCostSizeSubquery21Reducing the Search Space • Left-linear trees v.s. Bushy trees• Trees without cartesian productExample: R(A,B) |×| S(B,C) |×| T(C,D)Plan: (R(A,B) |×| T(C,D)) |×| S(B,C) has a cartesian product –most query optimizers will not consider it22Dynamic Programming: Summary• Handles only join queries:– Selections are pushed down (i.e. early)– Projections are pulled up (i.e. late)• Takes exponential time in general, BUT:– Left linear joins may reduce time– Non-cartesian products may reduce time further23Rule-Based Optimizers• Extensible collection of rulesRule = Algebraic law with a direction• Algorithm for firing these rulesGenerate many alternative plans, in some orderPrune by cost• Volcano (later SQL Sever)• Starburst (later DB2)24Completing the Physical Query Plan• Choose algorithm to implement each operator– Need to account for more than cost:• How much memory do we have ?• Are the input operand(s) sorted ?• Decide for each intermediate result:– To materialize– To pipeline25Materialize Intermediate Results Between Operators⋈⋈⋈TRSUHashTable Å Srepeat read(R, x)y Å join(HashTable, x)write(V1, y)HashTable Å Trepeat read(V1, y)z Å join(HashTable, y)write(V2, z)HashTable Å Urepeat read(V2, z)u Å join(HashTable, z)write(Answer, u)HashTable Å Srepeat read(R, x)y Å join(HashTable, x)write(V1, y)HashTable Å Trepeat read(V1, y)z Å join(HashTable, y)write(V2, z)HashTable Å Urepeat read(V2, z)u Å join(HashTable, z)write(Answer, u)V1V226Materialize Intermediate Results Between OperatorsQuestion in classGiven B(R), B(S), B(T), B(U)• What is the total cost of the plan ?–Cost = • How much main memory do we need ?–M =27Pipeline Between Operators⋈⋈⋈TRSUHashTable1 Å SHashTable2 Å THashTable3 Å Urepeat read(R, x)y Å join(HashTable1, x) z Å join(HashTable2, y)u Å join(HashTable3, z)write(Answer, u)HashTable1 Å SHashTable2 Å THashTable3 Å Urepeat read(R, x)y Å join(HashTable1, x) z Å join(HashTable2, y)u Å join(HashTable3, z)write(Answer, u)pipeline28Pipeline Between OperatorsQuestion in classGiven B(R), B(S), B(T), B(U)• What is the total cost of the plan ?–Cost = • How much main memory do we need ?–M =29Pipeline in Bushy Trees⋈⋈⋈XRS⋈⋈ZY⋈VT⋈I30Example• Logical plan is:• Main memory M = 101 buffersR(w,x)5,000 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocks31ExampleNaïve evaluation: • 2 partitioned hash-joins• Cost 3B(R) + 3B(S) + 4k + 3B(U) = 75000 + 4kR(w,x)5,000 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksM = 10132ExampleSmarter:• Step 1: hash R on x into 100 buckets, each of 50 blocks; to disk• Step 2: hash S on x into 100 buckets; to disk• Step 3: read each Riin memory (50 buffer) join with Si(1 buffer); hash result on y into 50 buckets (50 buffers) -- here we pipeline• Cost so far: 3B(R) + 3B(S)R(w,x)5,000 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksM = 10133ExampleContinuing:• How large are the 50 buckets on y ? Answer: k/50.• If k <= 50 then keep all 50 buckets in Step 3 in memory, then:• Step 4: read U from disk, hash on y and join with memory• Total cost: 3B(R) + 3B(S) + B(U) = 55,000R(w,x)5,000 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksM = 10134ExampleContinuing:• If 50 < k <= 5000 then send the 50 buckets in Step 3 to disk–


View Full Document

UW CSE 444 - Query Optimization

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Query Optimization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Query Optimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Query Optimization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?