DOC PREVIEW
CMU CS 15826 - Fractals - case studies - I

This preview shows page 1-2-3 out of 9 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

C. FaloutsosCMU SCS15-826: Multimedia Databasesand Data MiningFractals - case studies - IC. Faloutsos15-826 Copyright: C. Faloutsos (2005) 2CMU SCSOutlineGoal: ‘Find similar / interesting things’• Intro to DB• Indexing - similarity search• Data Mining15-826 Copyright: C. Faloutsos (2005) 3CMU SCSIndexing - Detailed outline• primary key indexing• secondary key / multi-key indexing• spatial access methods– z-ordering– R-trees– misc• fractals– intro– applications• text• ...15-826 Copyright: C. Faloutsos (2005) 4CMU SCSIndexing - Detailed outline• fractals– intro– applications• disk accesses for R-trees (range queries)• dimensionality reduction• selectivity in M-trees• dim. curse revisited• “fat fractals”• quad-tree analysis [Gaede+]15-826 Copyright: C. Faloutsos (2005) 5CMU SCS(Fractals mentioned before:)• for performance analysis of R-trees• fractals for dim. reduction15-826 Copyright: C. Faloutsos (2005) 6CMU SCSCase study#1: R-tree performanceProblem• Given– N points in E-dim space• Estimate # disk accesses for a range query(q1 x ... x qE)(assume: ‘good’ R-tree, with tight, cube-like MBRs)C. Faloutsos15-826 Copyright: C. Faloutsos (2005) 7CMU SCSCase study#1: R-tree performanceProblem• Given– N points in E-dim space– with fractal dimension D• Estimate # disk accesses for a range query(q1 x ... x qE)(assume: ‘good’ R-tree, with tight, cube-like MBRs)Typically, in DB Q-opt: uniformity + independence15-826 Copyright: C. Faloutsos (2005) 8CMU SCSExamples:World’s countries• BUT: area vs population for ~200 countries (1991 CIA fact-book). areapoplog(area)log(pop)15-826 Copyright: C. Faloutsos (2005) 9CMU SCSExamples:World’s countries• neither uniform, nor independent!areapoplog(area)log(pop)15-826 Copyright: C. Faloutsos (2005) 10CMU SCSExamples: TIGER files• neither uniform, nor independent!MG county LB county15-826 Copyright: C. Faloutsos (2005) 11CMU SCSHow to proceed?• recall the [Pagel+] formula, for range queries of size q1 x q2#DiskAccesses(q1,q2) =sum ( xi,1+ q1) * (xi,2+ q2)But: formula needs to know the xi,jsizes of MBRs!15-826 Copyright: C. Faloutsos (2005) 12CMU SCSR-trees - performance analysisI.e: for range queries - how many disk accesses, if we just now that we have- N points in E-d space?A: can not tell! need to know distributionC. Faloutsos15-826 Copyright: C. Faloutsos (2005) 13CMU SCSR-trees - performance analysisQ: OK - so we are told that the Hausdorff fractal dim. = D0 - Next step?(also know that there are at most C points per page)D0=1D0=215-826 Copyright: C. Faloutsos (2005) 14CMU SCSR-trees - performance analysisHint: dfn of Hausdorff f.d.:15-826 Copyright: C. Faloutsos (2005) 15CMU SCSReminder:Hausdorff or box-counting fd:• Box counting plot: Log( N ( r ) ) vs Log ( r)• r: grid side• N (r ): count of non-empty cells• (Hausdorff) fractal dimension D0:)log())(log(0rrND∂∂−=15-826 Copyright: C. Faloutsos (2005) 16CMU SCSReminder• Hausdorff fd:rlog(r)log(#non-empty cells)D015-826 Copyright: C. Faloutsos (2005) 17CMU SCSReminder• dfn of Hausdorff fd implies thatN(r) ~ r(-D0)# non-empty cells of side r15-826 Copyright: C. Faloutsos (2005) 18CMU SCSR-trees - performance analysisQ (rephrased): what is the side s1, s2, ... of parent nodes, given N data points, packed by C, with f.d. = D0D0=1D0=2C. Faloutsos15-826 Copyright: C. Faloutsos (2005) 19CMU SCSR-trees - performance analysisQ (rephrased): what is the side s1, s2, ... of parent nodes, given N data points, packed by C, with f.d. = D0D0=1D0=2s1s215-826 Copyright: C. Faloutsos (2005) 20CMU SCSR-trees - performance analysisQ (rephrased): what is the side s1, s2, ... of parent nodes, given N data points, packed by C, with f.d. = D0D0=1D0=2s1=s2=s15-826 Copyright: C. Faloutsos (2005) 21CMU SCSR-trees - performance analysisA: (educated guess)• s=s1=s2 (= ... ) - square-like MBRs• N/C non-empty cells = K * s(-D0)D0=1D0=2s1s2log(s)log(#cells)15-826 Copyright: C. Faloutsos (2005) 22CMU SCSR-trees - performance analysisDetails of derivations: in [PODS 94].Finally, expected side s of parent MBRs:s = (C/N)1/D0Q: sanity check: how does s change with D0?A:15-826 Copyright: C. Faloutsos (2005) 23CMU SCSR-trees - performance analysisDetails of derivations: in [Kamel+, PODS 94].Finally, expected side s of parent MBRs:s = (C/N)1/D0Q: sanity check: how does s change with D0?A: s grows with D0Q: does it make sense?Q: does it suffer from (intrinsic) dim. curse?15-826 Copyright: C. Faloutsos (2005) 24CMU SCSR-trees - performance analysisQ: Final-final formula (# disk accesses for range queries q1 x q2 x ... ):A:C. Faloutsos15-826 Copyright: C. Faloutsos (2005) 25CMU SCSR-trees - performance analysisQ: Final-final formula (# disk accesses for range queries q1 x q2 x ... ):A: # of parent-node accesses:N/C * (s + q1) * (s + q2 ) * ... ( s + qE)A: # of grand-parent node accesses15-826 Copyright: C. Faloutsos (2005) 26CMU SCSR-trees - performance analysisQ: Final-final formula (# disk accesses for range queries q1 x q2 x ... ):A: # of parent-node accesses:N/C * (s + q1) * (s + q2 ) * ... ( s + qE)A: # of grand-parent node accessesN/(C^2) * (s’ + q1) * (s’ + q2 ) * ... ( s’ + qE)s’ = (C^2/N)1/D015-826 Copyright: C. Faloutsos (2005) 27CMU SCSR-trees - performance analysisResults:IUE (x-y star coordinates)# leaf accessesquery side15-826 Copyright: C. Faloutsos (2005) 28CMU SCSR-trees - performance analysisResults:LB County# leaf accessesquery side15-826 Copyright: C. Faloutsos (2005) 29CMU SCSR-trees - performance analysisResults:MG-county# leaf accessesquery side15-826 Copyright: C. Faloutsos (2005) 30CMU SCSR-trees - performance analysisResults:2D- uniform# leaf accessesquery sideC. Faloutsos15-826 Copyright: C. Faloutsos (2005) 31CMU SCSR-trees - performance analysisConclusions: usually, <5% relative error, for range queries15-826 Copyright: C. Faloutsos (2005) 32CMU SCSIndexing - Detailed outline• fractals– intro– applications• disk accesses for R-trees (range queries)• dimensionality reduction• selectivity in M-trees• dim. curse revisited• “fat fractals”• quad-tree analysis [Gaede+]• ....15-826 Copyright: C. Faloutsos (2005) 33CMU SCSCase study #2: Dim. reductionProblem definition: ‘Feature selection’• given N points, with E dimensions• keep the k most ‘informative’ dimensions[Traina+,SBBD’00]CaetanoTrainaAgmaTrainaLeejayWu15-826 Copyright: C. Faloutsos (2005) 34CMU SCSDim.


View Full Document

CMU CS 15826 - Fractals - case studies - I

Download Fractals - case studies - I
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Fractals - case studies - I and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Fractals - case studies - I 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?