1CMU SCS15-826: Multimedia Databasesand Data MiningPrimary key indexing – B-treesChristos Faloutsos - CMUwww.cs.cmu.edu/~christos15-826 Copyright: C. Faloutsos (2005) 2CMU SCSProblemGiven a large collection of (multimedia)records, find similar/interesting things, ie:• Allow fast, approximate queries, and• Find rules/patterns15-826 Copyright: C. Faloutsos (2005) 3CMU SCSOutlineGoal: ‘Find similar / interesting things’• Intro to DB• Indexing - similarity search• Data Mining15-826 Copyright: C. Faloutsos (2005) 4CMU SCSIndexing - Detailed outline• primary key indexing– B-trees and variants– (static) hashing– extendible hashing• secondary key indexing• spatial access methods• text• ...15-826 Copyright: C. Faloutsos (2005) 5CMU SCSPrimary key indexing• find employee with ssn=12315-826 Copyright: C. Faloutsos (2005) 6CMU SCSB-trees• the most successful family of indexschemes (B-trees, B+-trees, B*-trees)• Can be used for primary/secondary,clustering/non-clustering index.• balanced “n-way” search trees215-826 Copyright: C. Faloutsos (2005) 7CMU SCSB-treesEg., B-tree of order 3:1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 8CMU SCSB - tree properties:• each node, in a B-tree of order n:– Key order– at most n pointers– at least n/2 pointers (except root)– all leaves at the same level– if number of pointers is k, then node has exactly k-1keys– (leaves are empty)v1 v2… vn-1p1pn15-826 Copyright: C. Faloutsos (2005) 9CMU SCSProperties• “block aware” nodes: each node -> diskpage• O(log (N)) for everything! (ins/del/search)• typically, if m = 50 - 100, then 2 - 3 levels• utilization >= 50%, guaranteed; on average69%15-826 Copyright: C. Faloutsos (2005) 10CMU SCSQueries• Algo for exact match query? (eg., ssn=8?)1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 11CMU SCSQueries• Algo for exact match query? (eg., ssn=8?)1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 12CMU SCSQueries• Algo for exact match query? (eg., ssn=8?)1367913<6>6<9>9315-826 Copyright: C. Faloutsos (2005) 13CMU SCSQueries• Algo for exact match query? (eg., ssn=8?)1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 14CMU SCSQueries• Algo for exact match query? (eg., ssn=8?)1367913<6>6<9>9H steps (= diskaccesses)15-826 Copyright: C. Faloutsos (2005) 15CMU SCSQueries• what about range queries? (eg., 5<salary<8)• Proximity/ nearest neighbor searches? (eg.,salary ~ 8 )15-826 Copyright: C. Faloutsos (2005) 16CMU SCSQueries• what about range queries? (eg., 5<salary<8)• Proximity/ nearest neighbor searches? (eg.,salary ~ 8 )1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 17CMU SCSQueries• what about range queries? (eg., 5<salary<8)• Proximity/ nearest neighbor searches? (eg.,salary ~ 8 )1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 18CMU SCSB-trees: Insertion• Insert in leaf; on overflow, push middle up(recursively)• split: preserves B - tree properties415-826 Copyright: C. Faloutsos (2005) 19CMU SCSB-treesEasy case: Tree T0; insert ‘8’1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 20CMU SCSB-treesTree T0; insert ‘8’1367913<6>6<9>9815-826 Copyright: C. Faloutsos (2005) 21CMU SCSB-treesHardest case: Tree T0; insert ‘2’1367913<6>6<9>9215-826 Copyright: C. Faloutsos (2005) 22CMU SCSB-treesHardest case: Tree T0; insert ‘2’12679133push middle up15-826 Copyright: C. Faloutsos (2005) 23CMU SCSB-treesHardest case: Tree T0; insert ‘2’679131 322Ovf; push middle15-826 Copyright: C. Faloutsos (2005) 24CMU SCSB-treesHardest case: Tree T0; insert ‘2’79131 326Final state515-826 Copyright: C. Faloutsos (2005) 25CMU SCSB-trees: Insertion• Q: What if there are two middles? (eg, order4)• A: either one is fine15-826 Copyright: C. Faloutsos (2005) 26CMU SCSB-trees: Insertion• Insert in leaf; on overflow, push middle up(recursively – ‘propagate split’ )• split: preserves all B - tree properties (!!)• notice how it grows: height increases whenroot overflows & splits• Automatic, incremental re-organization15-826 Copyright: C. Faloutsos (2005) 27CMU SCSOverview• B – trees– Dfn, Search, insertion, deletion• B+ - trees• hashing15-826 Copyright: C. Faloutsos (2005) 28CMU SCSDeletionRough outline of algo:• Delete key;• on underflow, may need to mergeIn practice, some implementors just allow underflows tohappen…15-826 Copyright: C. Faloutsos (2005) 29CMU SCSB-trees – DeletionEasiest case: Tree T0; delete ‘3’1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 30CMU SCSB-trees – DeletionEasiest case: Tree T0; delete ‘3’167913<6>6<9>9615-826 Copyright: C. Faloutsos (2005) 31CMU SCSB-trees – Deletion• Case1: delete a key at a leaf – no underflow• Case2: delete non-leaf key – no underflow• Case3: delete leaf-key; underflow, and ‘richsibling’• Case4: delete leaf-key; underflow, and ‘poorsibling’15-826 Copyright: C. Faloutsos (2005) 32CMU SCSB-trees – Deletion• Case1: delete a key at a leaf – no underflow(delete 3 from T0)1367913<6>6<9>915-826 Copyright: C. Faloutsos (2005) 33CMU SCSB-trees – Deletion• Case2: delete a key at a non-leaf – nounderflow (eg., delete 6 from T0)1367913<6>6<9>9Delete &promote, ie:15-826 Copyright: C. Faloutsos (2005) 34CMU SCSB-trees – Deletion• Case2: delete a key at a non-leaf – nounderflow (eg., delete 6 from T0)137913<6>6<9>9Delete &promote, ie:15-826 Copyright: C. Faloutsos (2005) 35CMU SCSB-trees – Deletion• Case2: delete a key at a non-leaf – nounderflow (eg., delete 6 from T0)1 7913<6>6<9>9Delete &promote, ie:315-826 Copyright: C. Faloutsos (2005) 36CMU SCSB-trees – Deletion• Case2: delete a key at a non-leaf – nounderflow (eg., delete 6 from T0)1 7913<3>3<9>93FINAL TREE715-826 Copyright: C. Faloutsos (2005) 37CMU SCSB-trees – Deletion• Case2: delete a key at a non-leaf – nounderflow (eg., delete 6 from T0)• Q: How to promote?• A: pick the largest key from the left sub-tree(or the smallest from the right sub-tree)• Observation: every deletion eventuallybecomes a deletion of a leaf key15-826 Copyright: C. Faloutsos (2005) 38CMU SCSB-trees – Deletion• Case1: delete a key at a leaf – no underflow• Case2: delete non-leaf key – no underflow• Case3: delete leaf-key; underflow, and ‘richsibling’• Case4: delete leaf-key; underflow, and ‘poorsibling’15-826 Copyright: C. Faloutsos (2005) 39CMU SCSB-trees – Deletion• Case3: underflow & ‘rich sibling’
View Full Document