NCSU CSC 440 - Indexing - D2181074

Home> Schools> North Carolina State University> Computer Science (CSC) > CSC 440> Indexing

NCSU CSC 440 - Indexing

School name North Carolina State University

Course Csc 440- Database Management Systems

Pages 22

Download Save

Unformatted text preview:

1Chapter 13 1Chapter 13:Indexing(Slides by Hector Garcia-Molina,http://www-db.stanford.edu/~hector/cs245/notes.htm)Chapter 13 2Indexing & HashingvalueChapter 13?valuerecordChapter 13 3Topics• Conventional indexes• B-trees• Hashing schemes (self-study)2Chapter 13 4Sequential File201040306050807010090Chapter 13 5Sequential File201040306050807010090Dense Index102030405060708090100110120Chapter 13 6Sequential File201040306050807010090Sparse Index10305070901101301501701902102303Chapter 13 7Sequential File201040306050807010090Sparse 2nd level10305070901101301501701902102301090170250330410490570Chapter 13 8Question:• Can we build a dense, 2nd level indexfor a dense index?Chapter 13 9Notes on pointers:(1) Block pointer (sparse index) can be smaller than record pointerBPRP4Chapter 13 10(2) If file is contiguous, then we can omitpointers (i.e., compute them)Notes on pointers:Chapter 13 11K1K3K4K2R1R2R3R4say:1024 Bper block• if we want K3 block: get it at offset (3-1)1024 = 2048 bytesChapter 13 12Sparse vs. Dense Tradeoff• Sparse: Less index space per record can keep more of index in memory• Dense: Can tell if any record exists without accessing file(Also:– sparse better for insertions– dense needed for secondary indexes)5Chapter 13 13Terms• Index sequential file• Search key ( ≠ primary key)• Primary index (on Sequencing field)• Secondary index• Dense index (all Search Key values in)• Sparse index• Multi-level indexChapter 13 14Next:• Duplicate keys• Deletion/Insertion• Secondary indexesChapter 13 15Duplicate keys101020103020303045406Chapter 13 16101020103020303045401010102020303030101020103020303045401010102020303030Dense index, one way to implement?Duplicate keysChapter 13 171010201030203030454010203040Dense index, better way?Duplicate keysChapter 13 181010201030203030454010102030Sparse index, one way?Duplicate keyscareful if lookingfor 20 or 30!7Chapter 13 191010201030203030454010203030Sparse index, another way?Duplicate keys– place first new key from blockshouldthis be40?Chapter 13 20Duplicate values, primary index• Index may point to first instance ofeach value onlyFile IndexSummaryaaab..Chapter 13 21Deletion from sparse index201040306050807010305070901101301508Chapter 13 22Deletion from sparse index20104030605080701030507090110130150– delete record 40Chapter 13 23Deletion from sparse index20104030605080701030507090110130150– delete record 304040Chapter 13 24Deletion from sparse index20104030605080701030507090110130150– delete records 30 & 4050709Chapter 13 25Deletion from dense index20104030605080701020304050607080Chapter 13 26Deletion from dense index20104030605080701020304050607080– delete record 304040Chapter 13 27Insertion, sparse index case2010305040601030406010Chapter 13 28Insertion, sparse index case20103050406010304060– insert record 3434• our lucky day! we have free space where we need it!Chapter 13 29Insertion, sparse index case20103050406010304060– insert record 1515203020• Illustrated: Immediate reorganization• Variation:– insert new block (chained file)– update indexChapter 13 30Insertion, sparse index case20103050406010304060– insert record 2525overflow blocks(reorganize later...)11Chapter 13 31Insertion, dense index case• Similar• Often more expensive . . . Chapter 13 32Secondary indexesSequencefield503070204080101006090Chapter 13 33Secondary indexesSequencefield503070204080101006090• Sparse index30208010090...does not make sense!12Chapter 13 34Secondary indexesSequencefield503070204080101006090• Dense index10203040506070...105090...sparsehighlevelChapter 13 35With secondary indexes:• Lowest level is dense• Other levels are sparseAlso: Pointers are record pointers(not block pointers; not computed)Chapter 13 36Summary so far• Conventional index– Basic Ideas: sparse, dense, multi-level…– Duplicate Keys– Deletion/Insertion– Secondary indexes13Chapter 13 37Conventional indexesAdvantage:- Simple- Index is sequential filegood for scansDisadvantage:- Inserts expensive, and/or- Lose sequentiality & balanceChapter 13 38Outline:• Conventional indexes• B-Trees ⇒ NEXT• Hashing schemes (self-study)Chapter 13 39• NEXT: Another type of index– Give up on sequentiality of index– Try to get “balance”14Chapter 13 40RootB+Tree Example n=31001201501803035113035100101110120130150156179180200Chapter 13 41Sample non-leafto keys to keys to keys to keys< 57 57≤ k<81 81≤k<95 ≥95578195Chapter 13 42Sample leaf node:From non-leaf nodeto next leafin sequence578195To record with key 57To record with key 81To record with key 8515Chapter 13 43Size of nodes: n+1 pointersn keys(fixed)Chapter 13 44Don’t want nodes to be too empty• Use at leastNon-leaf: (n+1)/2 pointersLeaf: (n+1)/2 pointers to dataChapter 13 45Full node min. nodeNon-leafLeafn=31201501803035113035counts even if null16Chapter 13 46B+tree rules: tree of order n(1) All leaves are at the same lowest level(balanced tree)(2) Pointers in leaves point to records,except for “sequence pointer”Chapter 13 47(3) Number of pointers/keys for B+treeNon-leaf(non-root)n+1 n(n+1)/2 (n+1)/2- 1Leaf(non-root)n+1 nRoot n+1 n 1 1Max Max Min Min ptrs keys ptrs→data keys(n+1)/2 (n+1)/2Chapter 13 48Insert into B+tree(a) simple case– space available in leaf(b) leaf overflow(c) non-leaf overflow(d) new root17Chapter 13 49(a) Insert key = 32n=3351130313010032Chapter 13 50(a) Insert key = 7n=335113031301003577Chapter 13 51(c) Insert key = 160n=310012015018015015617918020016018016017918Chapter 13 52(d) New root, insert 45n=31020301231012202530324040454030new rootChapter 13 53(a) Simple case - no example(b) Coalesce with neighbor (sibling)(c) Re-distribute keys(d) Cases (b) or (c) at non-leafDeletion from B+treeChapter 13 54(b) Coalesce with sibling– Delete 5010401001020304050n=44019Chapter 13 55(c) Redistribute keys– Delete 501040100102030354050n=43535Chapter 13 56404530372526202210141310203040(d) Non-leaf coalese– Delete 37n=440302525new rootChapter 13 57B+tree deletions in practice– Often, coalescing is not implemented– Too hard and not worth it!20Chapter 13 58Variation on B+tree: B-tree (no +)• Idea:– Avoid duplicate keys– Have record pointers in non-leaf nodesChapter 13 59 to record to record to record with K1 with K2 with K3 to keys to keys to keys to keys < K1

View Full Document


School:
Email:
New Password:
Confirm Password:

NCSU CSC 440 - Indexing

Sign up for free to view:

Please select your school