Quick Review of Apr 15 material Overflow definition why it happens solutions chaining double hashing Hash file performance loading factor search speed Hash Indices as separate from file organization Static vs Dynamic Hashing Extendable Hashing with a detailed example of how it works Today HW 3 due today HW 4 due Thursday April 24 next week Questions 12 11 12 12 12 13 12 16 Today indices on multiple attributes grid files bitmap indices Index Definition in SQL Create an index create index index name on relation name attribute list e g create index br index on branch branch name Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key not really required if SQL unique integrity constraint is supported To drop an index drop index index name Multiple Key Access With some queries we can use multiple indices Example select account number from account where branch name Perryridge and balance 1000 Possible strategies for processing this query using indices on single attributes use index on balance to find accounts with balances 1000 then test them individually to see if branch name Perryridge use index on branch name to find accounts with branchname Perryridge then test them individually to see if balances 1000 use branch name index to find pointers to all records of the Perryridge branch and use balance index similarly then take intersection of both sets of pointers Multiple Key Access 2 With some queries using a single attribute index is unnecessarily expensive with methods 1 and 2 from the earlier slide we might have the index we use return a very large set even though the final result is quite small Even with method 3 use both indices and then find the intersection we might have both indices return a large set which will make for a lot of unneeded work if the final result the intersection is small An alternative strategy is to create and use an index on more than one attribute in this example an index on branch name balance both attributes Indices on Multiple Attributes Suppose we have an ordered index on the combined search key branch name balance Examining the earlier query with the clause where branch name Perryridge and balance 1000 Our new index will fetch only records that satisfy both conditions much more efficient than trying to answer the query with separate single valued indices We can also handle range queries like where branch name Perryridge and balance 1000 But our index will not work for where branch name Perryridge and balance 1000 Multi Attribute Indexing Example EMP eno ename age sal lots of ways to handle this problem separate indices lots of false hits combined index based on composite key key sal 100 age search for 30 sal 40 translates into 3000 key 4000 easy search for 60 age 80 difficult Grid files up next R tree B tree generalization Quad trees K d trees etc Grid Files Structure used to speed processing of general multiple search key queries involving one or more comparison operators The grid file has a single grid array array has number of dimensions equal to the number of search key attributes one linear scale for each search key attribute Multiple cells of the grid array can point to the same bucket To find the bucket for a search key value locate the row and column of the cell using the linear scales to get the grid location then follow the pointer in that grid location to the bucket Example Grid File for account Queries on a Grid File A grid file on two attributes A and B can handle queries of all the following forms with reasonable efficiency a1 A a2 b1 B b2 a1 A a2 and b1 B b2 For example to answer a1 A a2 and b1 B b2 use the linear scales to find corresponding candidate grid array cells and look up all the buckets pointed to from those cells Grid Files cont During insertion if a bucket becomes full a new bucket can be created if more than one cell points to it idea similar to extendable hashing but in multiple dimensions if only one cell points to the bucket either an overflow bucket must be created or the grid array size must be increased Linear scales can be chosen to uniformly distribute records across buckets not necessary to have scale uniform across the domain if records are distributed in some other pattern the linear scale can mirror it as shown in the example earlier Grid Files end Periodic re organization to increase grid array size helps with good performance reorganization can be very expensive though Space overhead of the grid array can be high R trees chapter 23 are an alternative Bitmap Indices Bitmap indices are a special type of index designed for efficient queries on multiple keys Records in a relation are assumed to be numbered sequentially from 0 given a number n the objective is for it to be easy to retrieve record n very easy to achieve if we re looking at fixed length records Bitmap indices are applicable on attributes that take on a relatively small number of distinct values e g gender country state hockey team or an arbitrary mapping of a wider spectrum of values into a small number of categories e g income level divide income into a small number of levels such as 0 9999 10K 19999 20K 49999 50K and greater A bitmap is simply an array of bits Bitmap Indices cont In the simplest form a bitmap index on an attribute has a bitmap for each value of the attribute bitmap has as many bits as there are records in a bitmap for value v the bit for a record is 1 if the record has the value v for the attribute and 0 otherwise Bitmap Indices cont Bitmap indices are useful for queries on multiple attributes not particularly useful for single attribute queries Queries are answered using bitmap operations intersection and union or complementation not Each operation takes two bitmaps of the same size and applies the operation on corresponding bits to get the result bitmap e g 100110 and 110011 100010 100110 or 110011 110111 not 100110 011001 Bitmap Indices cont Every 1 bit in the result marks a desired tuple can compute its location since records are numbered sequentially and are all of the same size and retrieve the records counting number of tuples in the result SQL count aggregate is even faster Bitmap indices are generally very small compared to relation size e g if record is 100 bytes space for a single bitmap on all tuples of the relation takes 1 800 of the size of the relation itself if bitmap allows for 8 distinct values bitmap is only 1 of the size of the whole relation Bitmap
View Full Document
Unlocking...