1CMU SCS15-826: Multimedia Databasesand Data MiningMulti-key and Spatial Access Methods - IC. Faloutsos15-826 Copyright: C. Faloutsos (2005) 2CMU SCSOutlineGoal: ‘Find similar / interesting things’• Intro to DB• Indexing - similarity search• Data Mining15-826 Copyright: C. Faloutsos (2005) 3CMU SCSIndexing - Detailed outline• primary key indexing• secondary key / multi-key indexing• spatial access methods• text• ...15-826 Copyright: C. Faloutsos (2005) 4CMU SCSSec. key indexing• attributes w/ duplicates (eg., EMPLOYEES,with ‘job-code’)• Query types:– exact match– partial match• ‘job-code’ = ‘PGM’ and ‘dept’ =‘R&D’– range queries• ‘job-code’ =‘ADMIN’ and salary < 50K15-826 Copyright: C. Faloutsos (2005) 5CMU SCSSec. key indexing• Query types - cont’ d– boolean• ‘job-code’ =‘ADMIN’ or salary>20K– nn• salary ~ 30K15-826 Copyright: C. Faloutsos (2005) 6CMU SCSSolution?215-826 Copyright: C. Faloutsos (2005) 7CMU SCSSolution?• Inverted indices (usually, w/ B-trees)• Q: how to handle duplicates?Name Job-code Salary DeptSmith PGM 70 R&DJones ADMIN 50 R&D….Tomson ENG 50 SALESsalary-index507015-826 Copyright: C. Faloutsos (2005) 8CMU SCSSolution• A#1: eg., with postings listsName Job-code Salary DeptSmith PGM 70 R&DJones ADMIN 50 R&D….Tomson ENG 50 SALESsalary-index5070postings lists15-826 Copyright: C. Faloutsos (2005) 9CMU SCSSolution• A#2: modify B-tree code, to handle dup’ sName Job-code Salary DeptSmith PGM 70 R&DJones ADMIN 50 R&D….Tomson ENG 50 SALESsalary-index50705015-826 Copyright: C. Faloutsos (2005) 10CMU SCSHow to handle BooleanQueries?Name Job-code Salary DeptSmith PGM 70 R&DJones ADMIN 50 R&D….Tomson ENG 50 SALESsalary-index507050• eg., ‘sal=50 AND job-code=PGM’ ?15-826 Copyright: C. Faloutsos (2005) 11CMU SCSHow to handle BooleanQueries?Name Job-code Salary DeptSmith PGM 70 R&DJones ADMIN 50 R&D… .Tomson ENG 50 SALESsalary-index507050– from indices, find lists of qual. record-ids– merge lists (or check real records)15-826 Copyright: C. Faloutsos (2005) 12CMU SCSSec. key indexing• easily solved in commercial DBMS:create index s al-index on E MPL OYE E(salary);select * from E MPL OYE Ewhere salary > 50 andjob-code = ‘ADMIN’315-826 Copyright: C. Faloutsos (2005) 13CMU SCSSec. key indexing• can create combined indices:create index s j on E MP L OYE E ( s alary,job-code);15-826 Copyright: C. Faloutsos (2005) 14CMU SCSIndexing - Detailed outline• primary key indexing• secondary key / multi-key indexing– main memory: quad-trees– main memory: k-d-trees• spatial access methods• text• ...15-826 Copyright: C. Faloutsos (2005) 15CMU SCSQuad-trees• problem: find cities within 100mi fromPittsburgh• assumption: all fit in main memory• Q: how to answer such queries quickly?15-826 Copyright: C. Faloutsos (2005) 16CMU SCSQuad-trees• A: recursive decomposition of space, e.g.:PGHATLPHL15-826 Copyright: C. Faloutsos (2005) 17CMU SCSQuad-trees• A: recursive decomposition of space, e.g.:PGHATLPHL(30,10)3010SW15-826 Copyright: C. Faloutsos (2005) 18CMU SCSQuad-trees• A: recursive decomposition of space, e.g.:PGHATLPHL(30,10)3010SW204040,20NE415-826 Copyright: C. Faloutsos (2005) 19CMU SCSQuad-trees - search?• find cities with (35<x<45, 15<y<25):PGHATLPHL(30,10)3010SW204040,20NE15-826 Copyright: C. Faloutsos (2005) 20CMU SCSQuad-trees - search?• find cities with (35<x<45, 15<y<25):PGHATLPHL(30,10)3010SW204040,20NE15-826 Copyright: C. Faloutsos (2005) 21CMU SCSQuad-trees - search?• pseudocode:range-query( tree-ptr, range)if (tree-ptr == NULL) exit;if (tree-ptr->point within range){print tree-ptr->point}for each quadrant {if ( range intersects quadrant ) { range-query( tree-ptr->quadrant-ptr, range); }15-826 Copyright: C. Faloutsos (2005) 22CMU SCSQuad-trees - k-nn search?• k-nearest neighbor algo - more complicated:– find ‘good’ neighbors and put them in a stack– go to the most promising quadrant, and update thestack of neighbors– until we hit the leaves15-826 Copyright: C. Faloutsos (2005) 23CMU SCSQuad-trees - discussion• great for 2- and 3-d spaces• several variations, like fixed decomposition:PGHATLPHLPGHATLPHL‘adaptive’‘fixed’middle15-826 Copyright: C. Faloutsos (2005) 24CMU SCSQuad-trees - discussion• but: unsuitable for higher-d spaces (why?)515-826 Copyright: C. Faloutsos (2005) 25CMU SCSQuad-trees - discussion• but: unsuitable for higher-d spaces (why?)• A: 2^d pointers, per node!• Q: how to solve this problem?• A: k-d-trees!15-826 Copyright: C. Faloutsos (2005) 26CMU SCSIndexing - Detailed outline• primary key indexing• secondary key / multi-key indexing– main memory: quad-trees– main memory: k-d-trees• spatial access methods• text• ...15-826 Copyright: C. Faloutsos (2005) 27CMU SCSk-d-trees• Binary trees, with alternating ‘discriminators’PGHATLPHL(30,10)3010SWquad-tree15-826 Copyright: C. Faloutsos (2005) 28CMU SCSk-d-trees• Binary trees, with alternating ‘discriminators’PGHATLPHL(30,10)3010Wk-d-treeE15-826 Copyright: C. Faloutsos (2005) 29CMU SCSk-d-trees• Binary trees, with alternating ‘discriminators’PGHATLPHL(30,10)3010x<=30x>30ATL15-826 Copyright: C. Faloutsos (2005) 30CMU SCSk-d-trees• Binary trees, with alternating ‘discriminators’PGHATLPHL3010(30,10)x<=30x>30ATL2040(40,20)y<=20y>20PHLxy615-826 Copyright: C. Faloutsos (2005) 31CMU SCSIndexing - Detailed outline• primary key indexing• secondary key / multi-key indexing– main memory: quad-trees– main memory: k-d-trees• insertion; deletion• range query; k-nn query• spatial access methods• text• ...15-826 Copyright: C. Faloutsos (2005) 32CMU SCSk-d-trees - insertion• Binary trees, with alternating ‘discriminators’PGHATLPHL3010(30,10)x<=30x>30ATL2040(40,20)y<=20y>20PHLxy15-826 Copyright: C. Faloutsos (2005) 33CMU SCSk-d-trees - insertion• discriminators: may cycle, or ....• Q: which should we put first?PGHATLPHL3010(30,10)x<=30x>30ATL2040(40,20)y<=20y>20PHLxy15-826 Copyright: C. Faloutsos (2005) 34CMU SCSk-d-trees - deletion• Tricky! ‘delete-and-promote’ (or ‘mark asdeleted’ )PGHATLPHL3010(30,10)x<=30x>30ATL2040(40,20)y<=20y>20PHLxy15-826 Copyright: C. Faloutsos (2005) 35CMU SCSk-d-trees - range queryPGHATLPHL3010(30,10)x<=30x>30ATL2040(40,20)y<=20y>20PHL15-826 Copyright: C. Faloutsos (2005) 36CMU SCSk-d-trees - range query• similar to quad-trees: check the root;
View Full Document