DOC PREVIEW
CMU CS 15826 - Spatial Access Methods - IV

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CMU SCS15-826: Multimedia Databases and Data MiningSpatial Access Methods - IVC. Faloutsos15-826 Copyright: C. Faloutsos (2005) #2CMU SCSOutlineGoal: ‘Find similar / interesting things’• Intro to DB• Indexing - similarity search• Data Mining15-826 Copyright: C. Faloutsos (2005) #3CMU SCSIndexing - Detailed outline• primary key indexing• secondary key / multi-key indexing• spatial access methods– problem dfn– z-ordering– R-trees– misc• text• ...15-826 Copyright: C. Faloutsos (2005) #4CMU SCSSAMs - Detailed outline• spatial access methods– problem dfn– z-ordering– R-trees– misc topics• grid files• dimensionality curse; dim. reduction• metric trees• other nn methods• text, ...15-826 Copyright: C. Faloutsos (2005) #5CMU SCSGrid files• problem: spatial queries in k-d point-sets• Main idea: try to generalize hashing to k-d• (how?)15-826 Copyright: C. Faloutsos (2005) #6CMU SCSGrid files• A: put a grid• specs: [Nievergelt +, 84]– symmetric to all attributes– 2 disk accesses for exactmatch queries– adaptive to non-uniform distr.• Q: details?215-826 Copyright: C. Faloutsos (2005) #7CMU SCSGrid files• cuts: all the way through• cuts: at ½, ¾, ¼ etc; but ondemand• each cell -> disk page15-826 Copyright: C. Faloutsos (2005) #8CMU SCSGrid filesThus, we only need:- cut-points for each axis- k-d directory½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #9CMU SCSGrid filesSearch ( for exact match) – eg., (0.3; 0.3)½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #10CMU SCSGrid filesSearch ( for exact match) – eg., (0.3; 0.3)½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #11CMU SCSGrid files• specs: [Nievergelt +, 84]– symmetric to all attributes– 2 disk accesses for exactmatch queries– adaptive to non-uniform distr.X15-826 Copyright: C. Faloutsos (2005) #12CMU SCSGrid filespartial match – eg., 0<x<0.3½¼½¼½½x-cutsy-cuts315-826 Copyright: C. Faloutsos (2005) #13CMU SCSGrid filespartial match – eg., 0<x<0.3½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #14CMU SCSGrid filesexactly the symmetric algo for eg., 0<y<0.315-826 Copyright: C. Faloutsos (2005) #15CMU SCSGrid files• specs: [Nievergelt +, 84]– symmetric to all attributes– 2 disk accesses for exactmatch queries– adaptive to non-uniform distr.XX15-826 Copyright: C. Faloutsos (2005) #16CMU SCSGrid filesQ: How to split an overflowingpage?½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #17CMU SCSGrid filesA: pick the ‘best’ axis, and cutall the way through½¼½¼½½x-cutsy-cuts15-826 Copyright: C. Faloutsos (2005) #18CMU SCSGrid filesA: pick the ‘best’ axis, and cutall the way through...½¼½¼3/8½x-cutsy-cuts3/8½415-826 Copyright: C. Faloutsos (2005) #19CMU SCSGrid files... updating the directoryappropriately (ouch!)½¼½¼3/8½x-cutsy-cuts3/8½15-826 Copyright: C. Faloutsos (2005) #20CMU SCSGrid files• specs: [Nievergelt +, 84]– symmetric to all attributes– 2 disk accesses for exactmatch queries– adaptive to non-uniform distr.XXX15-826 Copyright: C. Faloutsos (2005) #21CMU SCSGrid files• it meets the three goals• had follow-up work [twin grid files, multi-level; etc]• BUT: has some disadvantages (whichones?)15-826 Copyright: C. Faloutsos (2005) #22CMU SCSGrid files - disadvantages• #1: problems in high-d: directory splits canbe expensive15-826 Copyright: C. Faloutsos (2005) #23CMU SCSGrid files - disadvantages• #2: even in low-d, suffers on correlatedattributes:15-826 Copyright: C. Faloutsos (2005) #24CMU SCSGrid files - disadvantages• (Q: how to fix, for 2-d, linearly correlatedpoints?)515-826 Copyright: C. Faloutsos (2005) #25CMU SCSGrid files - disadvantages• (A1: rotate [Hinrichs+]; A2: triangular cells[Rego+])15-826 Copyright: C. Faloutsos (2005) #26CMU SCSGrid files - disadvantages• #3: how about region data?15-826 Copyright: C. Faloutsos (2005) #27CMU SCSGrid files - disadvantages• #3: how about region data?• if we ‘cut’ them, then we have O(volume)pieces (while z-ordering: O(surface))• what to do?15-826 Copyright: C. Faloutsos (2005) #28CMU SCSGrid files - disadvantages• what to do?• Translation to 2k – d points! (clever, BUT,still has subtle problems) E.g., 1-d ‘regions’A BCx-startx-end01½¼¾01½¼¾ABC15-826 Copyright: C. Faloutsos (2005) #29CMU SCSGrid files - disadvantages• what to do?• Translation to 2k – d points! (clever, BUT,still has subtle problems) E.g., 1-d ‘regions’A BCx-startx-end01½¼¾01½¼¾ABC15-826 Copyright: C. Faloutsos (2005) #30CMU SCSGrid files - disadvantages• what to do?• Translation to 2k – d points! (clever, BUT,still has subtle problems) E.g., 1-d ‘regions’A BCx-startx-end01½¼¾01½¼¾ABC615-826 Copyright: C. Faloutsos (2005) #31CMU SCSGrid files - disadvantages• what is the problem, then?15-826 Copyright: C. Faloutsos (2005) #32CMU SCSGrid files - disadvantages• what is the problem, then?• A: dimensionality curse; large query regions15-826 Copyright: C. Faloutsos (2005) #33CMU SCSGrid files – conclusions• works OK in low-d un-correlated points• but z-ordering/R-trees seem to work betterfor higher-d• smart idea to translate k-d rectangles into2*k - points (but: dim. curse)15-826 Copyright: C. Faloutsos (2005) #34CMU SCSSAMs - Detailed outline• spatial access methods– problem dfn– z-ordering– R-trees– misc topics• grid files• dimensionality curse; dim. reduction• metric trees• other nn methods• text, ...15-826 Copyright: C. Faloutsos (2005) #35CMU SCSDimensionality ‘curse’• Q: What is the problem in high-d?15-826 Copyright: C. Faloutsos (2005) #36CMU SCSDimensionality ‘curse’• Q: What is the problem in high-d?• A: indices do not seem to help, for manyqueries (eg., k-nn)– in high-d (& uniform distributions), most pointsare equidistant -> k-nn retrieves too many near-neighbors– [Yao & Yao, ’85]: search effort ~ O( N (1-1/d) )715-826 Copyright: C. Faloutsos (2005) #37CMU SCSDimensionality ‘curse’• (counter-intuitive, for db mentality)• Q: What to do, then?15-826 Copyright: C. Faloutsos (2005) #38CMU SCSDimensionality ‘curse’• A1: switch to seq. scanning• A2: dim. reduction• A3: consider the ‘intrinsic’ /fractaldimensionality• A4: find approximate nn15-826 Copyright: C. Faloutsos (2005) #39CMU SCSDimensionality ‘curse’• A1: switch to seq. scanning– X-trees [Kriegel+,


View Full Document

CMU CS 15826 - Spatial Access Methods - IV

Download Spatial Access Methods - IV
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Spatial Access Methods - IV and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Spatial Access Methods - IV 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?