Unformatted text preview:

CMSC424 Database Design Instructor Amol Deshpande amol cs umd edu Databases Data Conceptual representation of the data Data Storage How where to store data how to access it Data Retrieval How to ask questions of the database How to answer those questions Data Models Integrity Manage crashes concurrency Manage semantic inconsistencies Query Processing Overview Selection operation Join operators Sorting Other operators Putting it all together Overview User select from R S where Resolve the references Syntax errors etc Converts the query to an internal format relational algebra like Query Parser Query Optimizer Query Processor R B Tree on R a S Hash Index on S a Results Find the best way to evaluate the query Which index to use What join method to use Read the data from the files Do the query processing joins selections aggregates Cost Complicated to compute We will focus on disk Number of I Os Not sufficient Number of seeks matters a lot why tT time to transfer one block tS time for one seek Cost for b block transfers plus S seeks b tT S tS Measured in seconds Query Processing Overview Selection operation Join operators Sorting Other operators Putting it all together Selection Operation select from person where SSN 123 Option 1 Sequential Scan Read the relation start to end and look for 123 Can always be used not true for the other options Cost Let br Number of relation blocks Then So 1 seek and br block transfers tS br tT sec Improvements If SSN is a key then can stop when found So on average br 2 blocks accessed Selection Operation select from person where SSN 123 Option 2 Binary Search Pre condition The relation is sorted on SSN Selection condition is an equality E g can t apply to Name like 424 Do binary search Cost of finding the first tuple that matches log2 br tT tS All I Os are random so need a seek for all The last few are closeby but we ignore such small effects Not quite What if 10000 tuples match the condition Incurs additional cost Selection Operation select from person where SSN 123 Option 3 Use Index Pre condition An appropriate index must exist Use the index Find the first leaf page that contains the search key Retrieve all the tuples that match by following the pointers If primary index the relation is sorted by the search key Go to the relation and read blocks sequentially If secondary index must follow all pointers using the index Selection w B Tree Indexes cost of finding the first leaf cost of retrieving the tuples primary index candidate key equality hi tT tS 1 tT tS primary index not a key equality hi tT tS 1 tT tS b 1 tT Note primary sorted b number of pages that contain the matches secondary index candidate key equality hi tT tS 1 tT tS secondary index not a key equality hi tT tS n tT tS n number of records that match This can be bad hi height of the index Selection Operation Selections involving ranges select from accounts where balance 100000 select from matches where matchdate between 10 20 06 and 10 30 06 Option 1 Sequential scan Option 2 Using an appropriate index Can t use hash indexes for this purpose Cost formulas Range queries equality on non key attributes So rows 3 and 5 in the preceding page Selection Operation Complex selections Conjunctive select from accounts where balance 100000 and SSN 123 Disjunctive select from accounts where balance 100000 or SSN 123 Option 1 Sequential scan Option 2 Conjunctive only Using an appropriate index on one of the conditions E g Use SSN index to evaluate SSN 123 Apply the second condtion to the tuples that match Or do the other way around if index on balance exists Which is better Option 3 Conjunctive only Choose a multi key index Not commonly available Selection Operation Complex selections Conjunctive select from accounts where balance 100000 and SSN 123 Disjunctive select from accounts where balance 100000 or SSN 123 Option 4 Conjunction or disjunction of record identifiers Use indexes to find all RIDs that match each of the conditions Do an intersection for conjunction or a union for disjunction Sort the records and fetch them in one shot Called Index ANDing or Index ORing Heavily used in commercial systems Query Processing Overview Selection operation Join operators Sorting Other operators Putting it all together Join select from R S where R a S a select from R S where R a S a 0 5 Called an equi join Not an equi join Option 1 Nested loops for each tuple r in R for each tuple s in S check if r a s a or whether r a s a 0 5 Can be used for any join condition As opposed to some algorithms we will see later R called outer relation S called inner relation Nested loops Join Cost Depends on the actual values of parameters especially memory br bs Number of blocks of R and S nr ns Number of tuples of R and S Case 1 Minimum memory required 3 blocks One to hold the current R block one for current S block one for the result being produced Blocks transferred Must scan R tuples once For each R tuple must scan S nr bs Seeks nr br br Nested loops Join Case 1 Minimum memory required 3 blocks Blocks transferred nr bs br Seeks nr br Example Number of records R nr 10 000 S ns 5000 Number of blocks R br 400 Then blocks transferred 10000 100 400 1 000 400 seeks 10400 What if we were to switch R and S S bs 100 2 000 100 block transfers 5100 seeks Matters Nested loops Join Case 2 S fits in memory Blocks transferred bs br Seeks 2 Example Number of records R nr 10 000 S ns 5000 Number of blocks R br 400 Then blocks transferred 400 100 500 seeks 2 This is orders of magnitude difference S bs 100 Block Nested loops Join Simple modification to nested loops join Block at a time for each block Br in R for each block Bs in S for each tuple r in Br for each tuple s in Bs check if r a s a or whether r a s a 0 5 Case 1 Minimum memory required 3 blocks Blocks transferred br bs br Seeks 2 br For the example blocks 40400 seeks 800 Block Nested loops Join Case 1 Minimum memory required 3 blocks Blocks transferred br bs br Seeks 2 br Case 2 S fits in memory Blocks transferred bs br Seeks 2 What about in between Say there are 50 blocks but S is 100 blocks Why not use all the memory that we can Block Nested loops Join Case 3 50 blocks S 100 blocks for each group of 48 blocks in R for each block Bs in S for each tuple r in the group of 48 blocks for each tuple s in Bs check if r a s a or whether r a s …


View Full Document

UMD CMSC 424 - Database Design

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Loading Unlocking...
Login

Join to view Database Design and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Database Design and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?