1IndexingCPS 116Introduction to Database Systems2Announcements (November 8) Homework #3 sample solution available Project milestone #2 due this Thursday Platform, production dataset, and performance tuning3Basics Given a value, locate the record(s) with this valueSELECT * FROM R WHERE A = value;SELECT * FROM R, S WHERE R.A = S.B; Other search criteria, e.g. Range searchSELECT * FROM R WHERE A > value; Keyword searchdatabase indexing Search24Dense and sparse indexes Dense: one index entry for each search key value Sparse: one index entry for each block Records must be clustered according to the search key123 Milhouse 10 3.1142 Bart 10 2.3279 Jessica 10 4345 Martin 8 2.3456 Ralph 8 2.3512 Nelson 10 2.1679 Sherri 10 3.3697 Terri 10 3.3857 Lisa 8 4.3912 Windel 8 3.1123456875Sparse indexon SIDBartJessicaLisaMartinMilhouseNelsonRalphSherriTerriWindelDense indexon name5Dense versus sparse indexes Index size Sparse index is smaller Requirement on records Records must be clustered for sparse index Lookup Sparse index is smaller and may fit in memory Dense index can directly tell if a record exists Update Easier for sparse index6Primary and secondary indexes Primary index Created for the primary key of a table Records are usually clustered according to the primary key Can be sparse Secondary index Usually dense SQL PRIMARY KEY declaration automatically creates a primary index, UNIQUE key automatically creates a secondary index Additional secondary index can be created on non-key attribute(s)CREATE INDEX StudentGPAIndex ON Student(GPA);37ISAM What if an index is still too big? Put a another (sparse) index on top of that!)ISAM (Index Sequential Access Method), more or less100, 200, …, 901100, 123, …, 192 901, …, 996…Index blocks200, … 100, 108,119, 121123, 129,…901, 907,…996, 997,…………Data blocks192, 197,…200, 202,…Example: look up 1978Updates with ISAM Overflow chains and empty data blocks degrade performance Worst case: most records go into one long chainExample: insert 107107Overflow blockExample: delete 129100, 200, …, 901100, 123, …, 192 901, …, 996…Index blocks200, … 100, 108,119, 121123, 129,…901, 907,…996, 997,…………Data blocks192, 197,…200, 202,…9B+-tree3511303510010111012013015015617918020030100120150180Max fan-out: 4 A hierarchy of intervals Balanced (more or less): good performance guarantee Disk-based: one node per block; large fan-out410Sample B+-tree nodesMax fan-out: 4120150180to keys 100 · k < 120to keys120 · k < 150to keys150 · k < 180to keys180 · kNon-leaf120130to records with these k values;or, store records directly in leavesto next leaf node in sequenceLeafto keys100 · k11B+-tree balancing properties Height constraint: all leaves at the same lowest level Fan-out constraint: all nodes at least half full (except root)Max # Max # Min # Min #pointers keys active pointers keysNon-leaf ff–1 d f / 2 edf / 2 e –1Root ff–1 2 1Leaf ff–1 b f / 2 cbf / 2 c12LookupsSELECT * FROM R WHERE k = 179;3511303510010111012013015015617918020030100120150180Max fan-out: 4179Not foundSELECT * FROM R WHERE k = 32;513Range querySELECT * FROM R WHERE k > 32 AND k < 179;3511303510010111012013015015617918020030100120150180Max fan-out: 4100101110120130150156Look up 32…And follow next-leaf pointers3514Insertion Insert a record with search key value 323511303510010111012013015015617918020030100120150180Max fan-out: 4Look up where theinserted keyshould go…32And insert it right there15Another insertion example Insert a record with search key value 152100101110120130150156179180200100120150180Max fan-out: 4152Oops, node is already full!616Node splitting120130150152156179180200100120150180Max fan-out: 4Yikes, this node isalso already full!10010111015617More node splitting In the worst case, node splitting can “propagate” all the way up to the root of the tree (not illustrated here) Splitting the root introduces a new root of fan-out 2 and causes the tree to grow “up” by one level120130150152156179180200100180Max fan-out: 410010111012015015618Deletion Delete a record with search key value 130100101110120130150156179180200100120150180Max fan-out: 4Look up the keyto be deleted…And delete itOops, node is too empty!If a sibling has morethan enough keys,steal one!719Stealing from a sibling100101110120150156179180200100120150180Max fan-out: 4156Remember to fix the keyin the least common ancestor20Another deletion example Delete a record with search key value 179100101110120150156179180200100120156180Max fan-out: 4Cannot steal from siblingsThen coalesce (merge) with a sibling!21Coalescing100101110120150100120156180Max fan-out: 4Remember to delete theappropriate key from parent Deletion can “propagate” all the way up to the root of the tree (not illustrated here) When the root becomes empty, the tree “shrinks” by one level156180200822Performance analysis How many I/O’s are required for each operation? h, the height of the tree (more or less) Plus one or two to manipulate actual records Plus O(h) for reorganization (should be very rare if f is large) Minus one if we cache the root in memory How big is h? Roughly logfan-outN, where N is the number of records B+-tree properties guarantee that fan-out is least f / 2 for all non-root nodes Fan-out is typically large (in hundreds)—many keys and pointers can fit into one block A 4-level B+-tree is enough for typical tables23B+-tree in practice Complex reorganization for deletion often is not implemented (e.g., Oracle, Informix) Leave nodes less than half full and periodically reorganize Most commercial DBMS use B+-tree instead of hashing-based indexes because B+-tree handles range queries24The Halloween Problem Story from the early days of System R…UPDATE PayrollSET salary = salary * 1.1WHERE salary >= 100000; There is a B+-tree index on Payroll(salary) The update never stopped (why?) Solutions?925B+-tree versus ISAM26B+-tree versus B-tree B-tree: why not store records (or record pointers) in non-leaf nodes? These records can be accessed with fewer I/O’s Problems?27Beyond ISAM, B-, and B+-trees Other tree-based indexs: R-trees and variants, GiST, etc. Hashing-based indexes: extensible hashing, linear hashing, etc. Text indexes: inverted-list index, suffix arrays, etc. Other
View Full Document