UW CSE 444 - Lecture Notes - D2026139

Home> Schools> University of Washington> Computer Science and Engineering (CSE) > CSE 444> Lecture Notes

DOC PREVIEW

UW CSE 444 - Lecture Notes

School name University of Washington

Course Cse 444- Database Internals

Pages 23

This preview shows page 1-2-22-23 out of 23 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Lecture 27: OptimizationsOutlineDynamic ProgrammingEstimating SizesSlide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11HistogramsSlide 13Slide 14Slide 15Completing the Physical Query PlanExample 7.38Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23Lecture 27: OptimizationsWednesday, November 29, 2000Outline•Finish dynamic programming (7.6)•Estimating the Size of Operations (7.4)•Histograms (7.5.1)•Completing the physical-query-plan selection (7.7)Dynamic Programming•Recall: computes optimal plans for subqueries:–Step 1: {R1}, {R2}, …, {Rn}–Step 2: {R1, R2}, {R1, R3}, …, {Rn-1, Rn}–…–Step n: {R1, …, Rn}•We used naïve size/cost estimations•In practice:–more realistic size/cost estimations (next)–heuristics for Reducing the Search Space •Restrict to left linear trees•Restrict to trees “without cartesian product”–need more than just one plan for each subquery:•“interesting orders”Estimating Sizes•Need size in order to estimate cost•Example:–Cost of partitioned hash-join E1 E2 is 3B(E1) + 3B(E2)–B(E1) = T(E1)/ block size–B(E2) = T(E2)/ block size–So, we need to estimate T(E1), T(E2)Estimating SizesEstimating the size of a projection•Easy: T(L(R)) = T(R)•This is because a projection doesn’t eliminate duplicatesEstimating SizesEstimating the size of a selection•S = A=c(R)–T(S) san be anything from 0 to T(R) – V(R,A) + 1–Mean value: T(S) = T(R)/V(R,A)•S = A<c(R)–T(S) can be anything from 0 to T(R)–Heuristics: T(S) = T(R)/3Estimating SizesEstimating the size of a natural join, R S•When the set of A values are disjoint, then T(R S) = 0•When A is a key in S and a foreign key in R, then T(R S) = T(R)•When A has a unique value, the same in R and S, then T(R S) = T(R) T(S)AAAAEstimating SizesAssumptions:•Containment of values: if V(R,A) <= V(S,A), then the set of A values of R is included in the set of A values of S–Note: this indeed holds when A is a foreign key in R, and a key in S•Preservation of values: for any other attribute B, V(R S, B) = V(R, B) (or V(S, B))AEstimating SizesAssume V(R,A) <= V(S,A)•Then each tuple t in R joins some tuple(s) in S–How many ?–On average S/V(S,A)–t will contribute S/V(S,A) tuples in R S•Hence T(R S) = T(R) T(S) / V(S,A)In general: T(R S) = T(R) T(S) / max(V(R,A),V(S,A))AAAEstimating SizesExample:•T(R) = 10000, T(S) = 20000•V(R,A) = 100, V(S,A) = 200•How large is R S ?Answer: T(R S) = 10000 20000/200 = 1MAAEstimating SizesJoins on more than one attribute:•T(R S) = T(R) T(S)/max(V(R,A),V(S,A))max(V(R,B),V(S,B))A,BHistograms•Statistics on data maintained by the RDBMS•Makes size estimation much more accurate (hence, cost estimations are more accurate)HistogramsEmployee(ssn, name, salary, phone)•Maintain a histogram on salary:•T(Employee) = 25000, but now we know the distributionSalary: 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100kTuples 200 800 5000 12000 6500 500HistogramsRanks(rankName, salary)•Estimate the size of Employee RanksEmployee 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k200 800 5000 12000 6500 500Ranks 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k8 20 40 80 100 2SalaryHistograms•Assume:–V(Employee, Salary) = 200–V(Ranks, Salary) = 250•Then T(Employee Ranks) == i=1,6 Ti Ti’ / 250= (200x8 + 800x20 + 5000x40 + 12000x80 + 6500x100 + 500x2)/250= ….SalaryCompleting the Physical Query Plan•Choose algorithm to implement each operator–Need to account for more than cost:•How much memory do we have ?•Are the input operand(s) sorted ?•Decide for each intermediate result:–To materialize–To pipelineExample 7.38•Logical plan is:•Main memory M = 101 buffersR(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Naïve evaluation: •2 partitioned hash-joins•Cost 3B(R) + 3B(S) + 4k + 3B(U) = 75000 + 4kR(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Smarter:•Step 1: hash R on x into 100 buckets, each of 50 blocks; to disk•Step 2: hash S on x into 100 buckets; to disk•Step 3: read each Ri in memory (50 buffer) join with Si (1 buffer); hash result on y into 50 buckets (50 buffers) -- here we pipeline•Cost so far: 3B(R) + 3B(S)R(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Continuing:•How large are the 50 buckets on y ? Answer: k/50.•If k <= 50 then keep all 50 buckets in Step 3 in memory, then:•Step 4: read U from disk, hash on y and join with memory•Total cost: 3B(R) + 3B(S) + B(U) = 55,000R(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Continuing:•If 50 < k <= 5000 then send the 50 buckets in Step 3 to disk–Each bucket has size k/50 <= 100•Step 4: partition U into 50 buckets•Step 5: read each partition and join in memory•Total cost: 3B(R) + 3B(S) + 2k + 3B(U) = 75,000 + 2kR(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Continuing:•If k > 5000 then materialize instead of pipeline•2 partitioned hash-joins•Cost 3B(R) + 3B(S) + 4k + 3B(U) = 75000 + 4kR(w,x)50,00 blocksS(x,y)10,000 blocksU(y,z)10,000 blocksk blocksExample 7.38Summary:•If k <= 50, cost = 55,000•If 50 < k <=5000, cost = 75,000 + 2k•If k > 5000, cost = 75,000 +

View Full Document

UW CSE 444 - Lecture Notes

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-22-23 out of 23 pages.

UW CSE 444 - Lecture Notes

Sign up for free to view:

Please select your school