CORNELL CS 432 - Evaluation of Relational Operations: Other Techniques

Unformatted text preview:

Simple Selections Evaluation of Relational Operations Other Techniques Chapter 14 Part B Database Management Systems R Ramakrishnan and Johannes Gehrke 1 Database Management Systems R Ramakrishnan and Johannes Gehrke Cost of finding qualifying data entries typically small plus cost of retrieving records could be large w o clustering In example assuming uniform distribution of names about 10 of tuples qualify 100 pages 10000 tuples With a clustered index cost is little more than 100 I Os if unclustered upto 10000 I Os Important refinement for unclustered indexes 1 Find qualifying data entries 2 Sort the rid s of the data records to be retrieved 3 Fetch rids in order This ensures that each data page is looked at just once though of such pages likely to be higher than with clustering Database Management Systems R Ramakrishnan and Johannes Gehrke 3 Most selective access path An index or file scan that we estimate will require the fewest page I Os Terms that match this index reduce the number of tuples retrieved other terms are used to discard some retrieved tuples but do not affect number of tuples pages fetched Consider day 8 9 94 AND bid 5 AND sid 3 A B tree index on day can be used then bid 5 and sid 3 must be checked for each retrieved tuple Similarly a hash index on bid sid could be used day 8 9 94 must then be checked Database Management Systems R Ramakrishnan and Johannes Gehrke Index on a b c matches a 5 AND b 3 but not b 3 Database Management Systems R Ramakrishnan and Johannes Gehrke 4 Intersection of Rids First approach Find the most selective access path retrieve tuples using it and apply any remaining terms that don t match the index day 8 9 94 AND rname Paul OR bid 5 OR sid 3 Such selection conditions are first converted to conjunctive normal form CNF day 8 9 94 OR bid 5 OR sid 3 AND rname Paul OR bid 5 OR sid 3 We only discuss the case with no ORs a conjunction of terms of the form attr op value An index matches a conjunction of terms that involve only attributes in a prefix of the search key Two Approaches to General Selections 2 General Selection Conditions Cost depends on qualifying tuples and clustering Reserves R R rname C Of the form R a ttr o p v a lu e R Size of result approximated as size of R reduction factor we will consider how to estimate reduction factors later With no index unsorted Must essentially scan the whole relation cost is M pages in R With an index on selection attribute Use index to find qualifying data entries then retrieve corresponding data records Hash index useful only for equality selections Using an Index for Selections SELECT FROM WHERE 5 Second approach if we have 2 or more matching indexes that use Alternatives 2 or 3 for data entries Get sets of rids of data records using each matching index Then intersect these sets of rids we ll discuss intersection soon Retrieve the records and apply any remaining terms Consider day 8 9 94 AND bid 5 AND sid 3 If we have a B tree index on day and an index on sid both using Alternative 2 we can retrieve rids of records satisfying day 8 9 94 using the first rids of recs satisfying sid 3 using the second intersect retrieve records and check bid 5 Database Management Systems R Ramakrishnan and Johannes Gehrke 6 The Projection Operation An approach based on sorting Projection Based on Hashing SELECT DISTINCT FROM R sid R bid Reserves R Modify Pass 0 of external sort to eliminate unwanted fields Thus runs of about 2B pages are produced but tuples in runs are smaller than input tuples Size ratio depends on and size of fields that are dropped Modify merging passes to eliminate duplicates Thus number of result tuples smaller than input Difference depends on of duplicates Cost In Pass 0 read original relation size M write out same number of smaller tuples In merging passes fewer tuples written out in each pass Using Reserves example 1000 input pages reduced to 250 in Pass 0 if size ratio is 0 25 Database Management Systems R Ramakrishnan and Johannes Gehrke 7 Partitioning phase Read R using one input buffer For each tuple discard unwanted fields apply hash function h1 to choose one of B 1 output buffers Duplicate elimination phase For each partition read it and build an in memory hash table using hash fn h2 h1 on all fields while discarding duplicates Result is B 1 partitions of tuples with no unwanted fields 2 tuples from different partitions guaranteed to be distinct If partition does not fit in memory can apply hash based projection algorithm recursively to this partition Cost For partitioning read R write out each tuple but with fewer fields This is read in next phase Database Management Systems R Ramakrishnan and Johannes Gehrke Discussion of Projection Set Operations Sort based approach is the standard better handling of skew and result is sorted If an index on the relation contains all wanted attributes in its search key can do index only scan Intersection and cross product special cases of join Union Distinct and Except similar we ll do union Sorting based approach to union Apply projection techniques to data entries much smaller If an ordered i e tree index contains all wanted attributes as prefix of search key can do even better 9 Sort both relations on combination of all attributes Scan sorted relations and merge them Alternative Merge runs from Pass 0 for both relations Hash based approach to union Retrieve data entries in order index only scan discard unwanted fields compare adjacent tuples to check for duplicates Database Management Systems R Ramakrishnan and Johannes Gehrke Partition R and S using hash function h For each S partition build in memory hash table using h2 scan corr R partition and add tuples to table while discarding duplicates Database Management Systems R Ramakrishnan and Johannes Gehrke Aggregate Operations AVG MIN etc In general requires scanning the relation Given index whose search key includes all attributes in the SELECT or WHERE clauses can do index only scan With grouping Sort on group by attributes then scan relation and compute aggregate for each group Can improve upon this by combining sorting and aggregate computation Similar approach based on hashing on group by attributes Given tree index whose search key includes all attributes in SELECT WHERE and GROUP BY clauses can do index only scan if group by attributes form prefix of search key can retrieve data entries tuples in group by order Database Management Systems R Ramakrishnan and Johannes Gehrke 10 Impact of


View Full Document

CORNELL CS 432 - Evaluation of Relational Operations: Other Techniques

Download Evaluation of Relational Operations: Other Techniques
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Evaluation of Relational Operations: Other Techniques and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Evaluation of Relational Operations: Other Techniques 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?