UW CSE 444 - Lecture Notes - D2307167

Home> Schools> University of Washington> Computer Science and Engineering (CSE) > CSE 444> Lecture Notes

DOC PREVIEW

UW CSE 444 - Lecture Notes

School name University of Washington

Course Cse 444- Database Internals

Pages 50

This preview shows page 1-2-3-24-25-26-27-48-49-50 out of 50 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 50 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Introduction to Database Systems CSE 444 Lecture 20: Operator Algorithms Magda Balazinska - CSE 444, Fall 2010 1Why Learn About Op Algos? • Implemented in commercial DBMSs – DBMSs implement different subsets of known algorithms • Good algorithms can greatly improve performance • Need to know about physical operators to understand query optimization Magda Balazinska - CSE 444, Fall 2010 2Magda Balazinska - CSE 444, Fall 2010 Cost Parameters • In database systems the data is on disk • Cost = total number of I/Os • Parameters: – B(R) = # of blocks (i.e., pages) for relation R – T(R) = # of tuples in relation R – V(R, a) = # of distinct values of attribute a • When a is a key, V(R,a) = T(R) • When a is not a key, V(R,a) can be anything < T(R) 3Magda Balazinska - CSE 444, Fall 2010 Cost • Cost of an operation = number of disk I/Os to – Read the operands – Compute the result • Cost of writing the result to disk is not included – Need to count it separately when applicable 4Magda Balazinska - CSE 444, Fall 2010 Cost of Scanning a Table • Result may be unsorted: B(R) • Result needs to be sorted: 3B(R) – We will discuss sorting later 5Magda Balazinska - CSE 444, Fall 2010 Outline for Today • Join operator algorithms – One-pass algorithms (Sec. 15.2 and 15.3) – Index-based algorithms (Sec 15.6) – Two-pass algorithms (Sec 15.4 and 15.5) – Note about readings: • In class, we will discuss only algorithms for join operator (because other operators are easier) • Read the book to get more details about these algos • Read the book to learn about algos for other operators 6Magda Balazinska - CSE 444, Fall 2010 Basic Join Algorithms • Logical operator: – Product(pname, cname) ⋈ Company(cname, city) • Propose three physical operators for the join, assuming the tables are in main memory: – Hash join – Nested loop join – Sort-merge join 7Magda Balazinska - CSE 444, Fall 2010 Hash Join Hash join: R ⋈ S • Scan R, build buckets in main memory • Then scan S and join • Cost: B(R) + B(S) • One-pass algorithm when B(R) <= M – By “one pass”, we mean that the operator reads its operands only once. It does not write intermediate results back to disk. 8Hash Join Example 9 Patient Insurance Patient(pid, name, address) Insurance(pid, provider, policy_nb) 1 ‘Bob’ ‘Seattle’ 2 ‘Ela’ ‘Everett’ 3 ‘Jill’ ‘Kent’ 4 ‘Joe’ ‘Seattle’ Patient 2 ‘Blue’ 123 4 ‘Prem’ 432 Insurance 4 ‘Prem’ 343 3 ‘GrpH’ 554 Two tuples per pageHash Join Example 10 Patient Insurance 1 2 3 4 Patient 2 4 Insurance 4 3 Showing pid only 8 5 9 6 2 8 8 9 6 6 1 3 Disk Memory M = 21 pagesHash Join Example 11 Step 1: Scan Patient and create hash table in memory 1 2 3 4 Patient 2 4 Insurance 4 3 8 5 9 6 2 8 8 9 6 6 1 3 Disk Memory M = 21 pages Hash h: pid % 5 Input buffer 1 2 4 3 9 6 8 5 1 2Hash Join Example 12 Step 2: Scan Insurance and probe into hash table 1 2 3 4 Patient 2 4 Insurance 4 3 8 5 9 6 2 8 8 9 6 6 1 3 Disk Memory M = 21 pages Hash h: pid % 5 Input buffer 1 2 4 3 9 6 8 5 1 2 2 4 Output buffer 2 2 Write to diskHash Join Example 13 Step 2: Scan Insurance and probe into hash table 1 2 3 4 Patient 2 4 Insurance 4 3 8 5 9 6 2 8 8 9 6 6 1 3 Disk Memory M = 21 pages Hash h: pid % 5 Input buffer 1 2 4 3 9 6 8 5 1 2 2 4 Output buffer 4 4Hash Join Example 14 Step 2: Scan Insurance and probe into hash table 1 2 3 4 Patient 2 4 Insurance 4 3 8 5 9 6 2 8 8 9 6 6 1 3 Disk Memory M = 21 pages Hash h: pid % 5 Input buffer 1 2 4 3 9 6 8 5 1 2 4 3 Output buffer 4 4 Keep going until read all of Insurance Cost: B(R) + B(S)Hash Join Details 15 Open( ) { H = newHashTable( ); S.Open( ); x = S.GetNext( ); while (x != null) { H.insert(x); x = S.GetNext( ); } S.Close( ); R.Open( ); buffer = [ ]; }Hash Join Details 16 GetNext( ) { while (buffer == [ ]) { x = R.GetNext( ); if (x==Null) return NULL; buffer = H.find(x); } z = buffer.first( ); buffer = buffer.rest( ); return z; }Hash Join Details 17 Close( ) { release memory (H, buffer, etc.); R.Close( ) } Magda Balazinska - CSE 444, Fall 2010Magda Balazinska - CSE 444, Fall 2010 Nested Loop Joins • Tuple-based nested loop R ⋈ S • R is the outer relation, S is the inner relation • Cost: B(R) + T(R) B(S) • Not quite one-pass since S is read many times for each tuple r in R do for each tuple s in S do if r and s join then output (r,s) 18Magda Balazinska - CSE 444, Fall 2010 Page-at-a-time Refinement • Cost: B(R) + B(R)B(S) for each page of tuples r in R do for each page of tuples s in S do for all pairs of tuples if r and s join then output (r,s) 191 2 Nested Loop Example 20 1 2 3 4 Patient 2 4 Insurance 4 3 8 5 9 6 2 8 8 9 6 6 1 3 Disk Input buffer for Patient Output buffer 2 2 Input buffer for Insurance 2 4Nested Loop Example 21 1 2 3 4 Patient 2 4 Insurance 4 3 8 5 9 6 2 8 8 9 6 6 1 3 Disk Input buffer for Patient 1 2 Output buffer Input buffer for Insurance 4 3 1 2Nested Loop Example 22 1 2 3 4 Patient 2 4 Insurance 4 3 8 5 9 6 2 8 8 9 6 6 1 3 Disk Input buffer for Patient 1 2 Output buffer Input buffer for Insurance 2 8 1 2 2 2 Cost: B(R) + B(R)B(S) Keep going until read all of Insurance Then repeat for next page of Patient… until end of PatientMagda Balazinska - CSE 444, Fall 2010 Sort-Merge Join Sort-merge join: R ⋈ S • Scan R and sort in main memory • Scan S and sort in main memory • Merge R and S • Cost: B(R) + B(S) • One pass algorithm when B(S) + B(R) <= M • Typically, this is NOT a one pass algorithm 23Sort-Merge Join Example 24 1 2 3 4 Patient 2 4 Insurance 4 3 8 5 9 6 2 8 8 9 6 6 1 3 Disk Memory M = 21 pages 1 2 4 3 9 6 8 5 Step 1: Scan Patient and sort in memorySort-Merge Join Example 25 …

View Full Document