1CMU SCS15-826: Multimedia Databases and Data MiningSpatial Access Methods - IVC. Faloutsos15-826 Copyright: C. Faloutsos (2005) #2CMU SCSOutlineGoal: ‘Find similar / interesting things’• Intro to DB• Indexing - similarity search• Data Mining15-826 Copyright: C. Faloutsos (2005) #3CMU SCSIndexing - Detailed outline• primary key indexing• secondary key / multi-key indexing• spatial access methods– problem dfn– z-ordering– R-trees– misc• text• ...15-826 Copyright: C. Faloutsos (2005) #4CMU SCSSAMs - Detailed outline• spatial access methods– problem dfn– z-ordering– R-trees– misc topics• grid files• dimensionality curse; dim. reduction• metric trees• other nn methods• text, ...15-826 Copyright: C. Faloutsos (2005) #5CMU SCSGrid files• problem: spatial queries in k-d point-sets• Main idea: try to generalize hashing to k-d• (how?)15-826 Copyright: C. Faloutsos (2005) #6CMU SCSGrid files• A: put a grid• specs: [Nievergelt +, 84]– symmetric to all attributes– 2 disk accesses for exactmatch queries– adaptive to non-uniform distr.• Q: details?215-826 Copyright: C. Faloutsos (2005) #7CMU SCSGrid files• cuts: all the way through• cuts: at ½, ¾, ¼ etc; but ondemand• each cell -> disk page15-826 Copyright: C. Faloutsos (2005) #8CMU SCSGrid filesThus, we only need:- cut-points for each axis- k-d directory½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #9CMU SCSGrid filesSearch ( for exact match) – eg., (0.3; 0.3)½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #10CMU SCSGrid filesSearch ( for exact match) – eg., (0.3; 0.3)½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #11CMU SCSGrid files• specs: [Nievergelt +, 84]– symmetric to all attributes– 2 disk accesses for exactmatch queries– adaptive to non-uniform distr.X15-826 Copyright: C. Faloutsos (2005) #12CMU SCSGrid filespartial match – eg., 0<x<0.3½¼½¼½½x-cutsy-cuts315-826 Copyright: C. Faloutsos (2005) #13CMU SCSGrid filespartial match – eg., 0<x<0.3½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #14CMU SCSGrid filesexactly the symmetric algo for eg., 0<y<0.315-826 Copyright: C. Faloutsos (2005) #15CMU SCSGrid files• specs: [Nievergelt +, 84]– symmetric to all attributes– 2 disk accesses for exactmatch queries– adaptive to non-uniform distr.XX15-826 Copyright: C. Faloutsos (2005) #16CMU SCSGrid filesQ: How to split an overflowingpage?½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #17CMU SCSGrid filesA: pick the ‘best’ axis, and cutall the way through½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #18CMU SCSGrid filesA: pick the ‘best’ axis, and cutall the way through...½¼½¼3/8½x-cutsy-cuts3/8½415-826 Copyright: C. Faloutsos (2005) #19CMU SCSGrid files... updating the directoryappropriately (ouch!)½¼½¼3/8½x-cutsy-cuts3/8½15-826 Copyright: C. Faloutsos (2005) #20CMU SCSGrid files• specs: [Nievergelt +, 84]– symmetric to all attributes– 2 disk accesses for exactmatch queries– adaptive to non-uniform distr.XXX15-826 Copyright: C. Faloutsos (2005) #21CMU SCSGrid files• it meets the three goals• had follow-up work [twin grid files, multi-level; etc]• BUT: has some disadvantages (whichones?)15-826 Copyright: C. Faloutsos (2005) #22CMU SCSGrid files - disadvantages• #1: problems in high-d: directory splits canbe expensive15-826 Copyright: C. Faloutsos (2005) #23CMU SCSGrid files - disadvantages• #2: even in low-d, suffers on correlatedattributes:15-826 Copyright: C. Faloutsos (2005) #24CMU SCSGrid files - disadvantages• (Q: how to fix, for 2-d, linearly correlatedpoints?)515-826 Copyright: C. Faloutsos (2005) #25CMU SCSGrid files - disadvantages• (A1: rotate [Hinrichs+]; A2: triangular cells[Rego+])15-826 Copyright: C. Faloutsos (2005) #26CMU SCSGrid files - disadvantages• #3: how about region data?15-826 Copyright: C. Faloutsos (2005) #27CMU SCSGrid files - disadvantages• #3: how about region data?• if we ‘cut’ them, then we have O(volume)pieces (while z-ordering: O(surface))• what to do?15-826 Copyright: C. Faloutsos (2005) #28CMU SCSGrid files - disadvantages• what to do?• Translation to 2k – d points! (clever, BUT,still has subtle problems) E.g., 1-d ‘regions’A BCx-startx-end01½¼¾01½¼¾ABC15-826 Copyright: C. Faloutsos (2005) #29CMU SCSGrid files - disadvantages• what to do?• Translation to 2k – d points! (clever, BUT,still has subtle problems) E.g., 1-d ‘regions’A BCx-startx-end01½¼¾01½¼¾ABC15-826 Copyright: C. Faloutsos (2005) #30CMU SCSGrid files - disadvantages• what to do?• Translation to 2k – d points! (clever, BUT,still has subtle problems) E.g., 1-d ‘regions’A BCx-startx-end01½¼¾01½¼¾ABC615-826 Copyright: C. Faloutsos (2005) #31CMU SCSGrid files - disadvantages• what is the problem, then?15-826 Copyright: C. Faloutsos (2005) #32CMU SCSGrid files - disadvantages• what is the problem, then?• A: dimensionality curse; large query regions15-826 Copyright: C. Faloutsos (2005) #33CMU SCSGrid files – conclusions• works OK in low-d un-correlated points• but z-ordering/R-trees seem to work betterfor higher-d• smart idea to translate k-d rectangles into2*k - points (but: dim. curse)15-826 Copyright: C. Faloutsos (2005) #34CMU SCSSAMs - Detailed outline• spatial access methods– problem dfn– z-ordering– R-trees– misc topics• grid files• dimensionality curse; dim. reduction• metric trees• other nn methods• text, ...15-826 Copyright: C. Faloutsos (2005) #35CMU SCSDimensionality ‘curse’• Q: What is the problem in high-d?15-826 Copyright: C. Faloutsos (2005) #36CMU SCSDimensionality ‘curse’• Q: What is the problem in high-d?• A: indices do not seem to help, for manyqueries (eg., k-nn)– in high-d (& uniform distributions), most pointsare equidistant -> k-nn retrieves too many near-neighbors– [Yao & Yao, ’85]: search effort ~ O( N (1-1/d) )715-826 Copyright: C. Faloutsos (2005) #37CMU SCSDimensionality ‘curse’• (counter-intuitive, for db mentality)• Q: What to do, then?15-826 Copyright: C. Faloutsos (2005) #38CMU SCSDimensionality ‘curse’• A1: switch to seq. scanning• A2: dim. reduction• A3: consider the ‘intrinsic’ /fractaldimensionality• A4: find approximate nn15-826 Copyright: C. Faloutsos (2005) #39CMU SCSDimensionality ‘curse’• A1: switch to seq. scanning– X-trees [Kriegel+,
View Full Document