DOC PREVIEW
Duke CPS 116 - Indexing

This preview shows page 1-2-3 out of 9 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1IndexingCPS 116Introduction to Database Systems2Announcements (November 8) Homework #3 sample solution available Project milestone #2 due this Thursday Platform, production dataset, and performance tuning3Basics Given a value, locate the record(s) with this valueSELECT * FROM R WHERE A = value;SELECT * FROM R, S WHERE R.A = S.B; Other search criteria, e.g. Range searchSELECT * FROM R WHERE A > value; Keyword searchdatabase indexing Search24Dense and sparse indexes Dense: one index entry for each search key value Sparse: one index entry for each block Records must be clustered according to the search key123 Milhouse 10 3.1142 Bart 10 2.3279 Jessica 10 4345 Martin 8 2.3456 Ralph 8 2.3512 Nelson 10 2.1679 Sherri 10 3.3697 Terri 10 3.3857 Lisa 8 4.3912 Windel 8 3.1123456875Sparse indexon SIDBartJessicaLisaMartinMilhouseNelsonRalphSherriTerriWindelDense indexon name5Dense versus sparse indexes Index size Sparse index is smaller Requirement on records Records must be clustered for sparse index Lookup Sparse index is smaller and may fit in memory Dense index can directly tell if a record exists Update Easier for sparse index6Primary and secondary indexes Primary index Created for the primary key of a table Records are usually clustered according to the primary key Can be sparse Secondary index Usually dense SQL PRIMARY KEY declaration automatically creates a primary index, UNIQUE key automatically creates a secondary index Additional secondary index can be created on non-key attribute(s)CREATE INDEX StudentGPAIndex ON Student(GPA);37ISAM What if an index is still too big? Put a another (sparse) index on top of that!)ISAM (Index Sequential Access Method), more or less100, 200, …, 901100, 123, …, 192 901, …, 996…Index blocks200, … 100, 108,119, 121123, 129,…901, 907,…996, 997,…………Data blocks192, 197,…200, 202,…Example: look up 1978Updates with ISAM Overflow chains and empty data blocks degrade performance Worst case: most records go into one long chainExample: insert 107107Overflow blockExample: delete 129100, 200, …, 901100, 123, …, 192 901, …, 996…Index blocks200, … 100, 108,119, 121123, 129,…901, 907,…996, 997,…………Data blocks192, 197,…200, 202,…9B+-tree3511303510010111012013015015617918020030100120150180Max fan-out: 4 A hierarchy of intervals Balanced (more or less): good performance guarantee Disk-based: one node per block; large fan-out410Sample B+-tree nodesMax fan-out: 4120150180to keys 100 · k < 120to keys120 · k < 150to keys150 · k < 180to keys180 · kNon-leaf120130to records with these k values;or, store records directly in leavesto next leaf node in sequenceLeafto keys100 · k11B+-tree balancing properties Height constraint: all leaves at the same lowest level Fan-out constraint: all nodes at least half full (except root)Max # Max # Min # Min #pointers keys active pointers keysNon-leaf ff–1 d f / 2 edf / 2 e –1Root ff–1 2 1Leaf ff–1 b f / 2 cbf / 2 c12LookupsSELECT * FROM R WHERE k = 179;3511303510010111012013015015617918020030100120150180Max fan-out: 4179Not foundSELECT * FROM R WHERE k = 32;513Range querySELECT * FROM R WHERE k > 32 AND k < 179;3511303510010111012013015015617918020030100120150180Max fan-out: 4100101110120130150156Look up 32…And follow next-leaf pointers3514Insertion Insert a record with search key value 323511303510010111012013015015617918020030100120150180Max fan-out: 4Look up where theinserted keyshould go…32And insert it right there15Another insertion example Insert a record with search key value 152100101110120130150156179180200100120150180Max fan-out: 4152Oops, node is already full!616Node splitting120130150152156179180200100120150180Max fan-out: 4Yikes, this node isalso already full!10010111015617More node splitting In the worst case, node splitting can “propagate” all the way up to the root of the tree (not illustrated here) Splitting the root introduces a new root of fan-out 2 and causes the tree to grow “up” by one level120130150152156179180200100180Max fan-out: 410010111012015015618Deletion Delete a record with search key value 130100101110120130150156179180200100120150180Max fan-out: 4Look up the keyto be deleted…And delete itOops, node is too empty!If a sibling has morethan enough keys,steal one!719Stealing from a sibling100101110120150156179180200100120150180Max fan-out: 4156Remember to fix the keyin the least common ancestor20Another deletion example Delete a record with search key value 179100101110120150156179180200100120156180Max fan-out: 4Cannot steal from siblingsThen coalesce (merge) with a sibling!21Coalescing100101110120150100120156180Max fan-out: 4Remember to delete theappropriate key from parent Deletion can “propagate” all the way up to the root of the tree (not illustrated here) When the root becomes empty, the tree “shrinks” by one level156180200822Performance analysis How many I/O’s are required for each operation? h, the height of the tree (more or less) Plus one or two to manipulate actual records Plus O(h) for reorganization (should be very rare if f is large) Minus one if we cache the root in memory How big is h? Roughly logfan-outN, where N is the number of records B+-tree properties guarantee that fan-out is least f / 2 for all non-root nodes  Fan-out is typically large (in hundreds)—many keys and pointers can fit into one block A 4-level B+-tree is enough for typical tables23B+-tree in practice Complex reorganization for deletion often is not implemented (e.g., Oracle, Informix) Leave nodes less than half full and periodically reorganize Most commercial DBMS use B+-tree instead of hashing-based indexes because B+-tree handles range queries24The Halloween Problem Story from the early days of System R…UPDATE PayrollSET salary = salary * 1.1WHERE salary >= 100000; There is a B+-tree index on Payroll(salary) The update never stopped (why?) Solutions?925B+-tree versus ISAM26B+-tree versus B-tree B-tree: why not store records (or record pointers) in non-leaf nodes? These records can be accessed with fewer I/O’s Problems?27Beyond ISAM, B-, and B+-trees Other tree-based indexs: R-trees and variants, GiST, etc.  Hashing-based indexes: extensible hashing, linear hashing, etc. Text indexes: inverted-list index, suffix arrays, etc. Other


View Full Document

Duke CPS 116 - Indexing

Documents in this Course
Part I

Part I

8 pages

XSLT

XSLT

4 pages

XSLT

XSLT

8 pages

Part I

Part I

8 pages

XSLT

XSLT

8 pages

Load more
Download Indexing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Indexing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Indexing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?