DOC PREVIEW
UW CSE 444 - Lecture Notes

This preview shows page 1-2-22-23 out of 23 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Lecture 27: OptimizationsOutlineDynamic ProgrammingEstimating SizesSlide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11HistogramsSlide 13Slide 14Slide 15Completing the Physical Query PlanExample 7.38Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23Lecture 27: OptimizationsWednesday, November 29, 2000Outline•Finish dynamic programming (7.6)•Estimating the Size of Operations (7.4)•Histograms (7.5.1)•Completing the physical-query-plan selection (7.7)Dynamic Programming•Recall: computes optimal plans for subqueries:–Step 1: {R1}, {R2}, …, {Rn}–Step 2: {R1, R2}, {R1, R3}, …, {Rn-1, Rn}–…–Step n: {R1, …, Rn}•We used naïve size/cost estimations•In practice:–more realistic size/cost estimations (next)–heuristics for Reducing the Search Space •Restrict to left linear trees•Restrict to trees “without cartesian product”–need more than just one plan for each subquery:•“interesting orders”Estimating Sizes•Need size in order to estimate cost•Example:–Cost of partitioned hash-join E1 E2 is 3B(E1) + 3B(E2)–B(E1) = T(E1)/ block size–B(E2) = T(E2)/ block size–So, we need to estimate T(E1), T(E2)Estimating SizesEstimating the size of a projection•Easy: T(L(R)) = T(R)•This is because a projection doesn’t eliminate duplicatesEstimating SizesEstimating the size of a selection•S = A=c(R)–T(S) san be anything from 0 to T(R) – V(R,A) + 1–Mean value: T(S) = T(R)/V(R,A)•S = A<c(R)–T(S) can be anything from 0 to T(R)–Heuristics: T(S) = T(R)/3Estimating SizesEstimating the size of a natural join, R S•When the set of A values are disjoint, then T(R S) = 0•When A is a key in S and a foreign key in R, then T(R S) = T(R)•When A has a unique value, the same in R and S, then T(R S) = T(R) T(S)AAAAEstimating SizesAssumptions:•Containment of values: if V(R,A) <= V(S,A), then the set of A values of R is included in the set of A values of S–Note: this indeed holds when A is a foreign key in R, and a key in S•Preservation of values: for any other attribute B, V(R S, B) = V(R, B) (or V(S, B))AEstimating SizesAssume V(R,A) <= V(S,A)•Then each tuple t in R joins some tuple(s) in S–How many ?–On average S/V(S,A)–t will contribute S/V(S,A) tuples in R S•Hence T(R S) = T(R) T(S) / V(S,A)In general: T(R S) = T(R) T(S) / max(V(R,A),V(S,A))AAAEstimating SizesExample:•T(R) = 10000, T(S) = 20000•V(R,A) = 100, V(S,A) = 200•How large is R S ?Answer: T(R S) = 10000 20000/200 = 1MAAEstimating SizesJoins on more than one attribute:•T(R S) = T(R) T(S)/max(V(R,A),V(S,A))max(V(R,B),V(S,B))A,BHistograms•Statistics on data maintained by the RDBMS•Makes size estimation much more accurate (hence, cost estimations are more accurate)HistogramsEmployee(ssn, name, salary, phone)•Maintain a histogram on salary:•T(Employee) = 25000, but now we know the distributionSalary: 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100kTuples 200 800 5000 12000 6500 500HistogramsRanks(rankName, salary)•Estimate the size of Employee RanksEmployee 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k200 800 5000 12000 6500 500Ranks 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k8 20 40 80 100 2SalaryHistograms•Assume:–V(Employee, Salary) = 200–V(Ranks, Salary) = 250•Then T(Employee Ranks) == i=1,6 Ti Ti’ / 250= (200x8 + 800x20 + 5000x40 + 12000x80 + 6500x100 + 500x2)/250= ….SalaryCompleting the Physical Query Plan•Choose algorithm to implement each operator–Need to account for more than cost:•How much memory do we have ?•Are the input operand(s) sorted ?•Decide for each intermediate result:–To materialize–To pipelineExample 7.38•Logical plan is:•Main memory M = 101 buffersR(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Naïve evaluation: •2 partitioned hash-joins•Cost 3B(R) + 3B(S) + 4k + 3B(U) = 75000 + 4kR(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Smarter:•Step 1: hash R on x into 100 buckets, each of 50 blocks; to disk•Step 2: hash S on x into 100 buckets; to disk•Step 3: read each Ri in memory (50 buffer) join with Si (1 buffer); hash result on y into 50 buckets (50 buffers) -- here we pipeline•Cost so far: 3B(R) + 3B(S)R(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Continuing:•How large are the 50 buckets on y ? Answer: k/50.•If k <= 50 then keep all 50 buckets in Step 3 in memory, then:•Step 4: read U from disk, hash on y and join with memory•Total cost: 3B(R) + 3B(S) + B(U) = 55,000R(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Continuing:•If 50 < k <= 5000 then send the 50 buckets in Step 3 to disk–Each bucket has size k/50 <= 100•Step 4: partition U into 50 buckets•Step 5: read each partition and join in memory•Total cost: 3B(R) + 3B(S) + 2k + 3B(U) = 75,000 + 2kR(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Continuing:•If k > 5000 then materialize instead of pipeline•2 partitioned hash-joins•Cost 3B(R) + 3B(S) + 4k + 3B(U) = 75000 + 4kR(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Summary:•If k <= 50, cost = 55,000•If 50 < k <=5000, cost = 75,000 + 2k•If k > 5000, cost = 75,000 +


View Full Document

UW CSE 444 - Lecture Notes

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?