U of M CSCI 8715 - Comparing path-based and vertically-partitioned RDF databases - D2966896


School name University of Minnesota- Twin Cities

Course Csci 8715- Spatial Databases and Applications

Pages 22



Comparing path-based and vertically-partitioned RDF databases
Preetha Lakshmi & Chris Mueller
12/10/2007
CSCI 8715
Shashi Shekhar

Outline
- Motivation
- Background and related work
- Problem statement
- Our contributions
- Assumptions
- Experimental process
- Results
- Conclusions

Motivation
- Semantic Web
- libraries
- scientific databases
- industry
- social networks
- Computer-to-computer communication

RDF Schema
- Schema
- Instance

RDF Schema
- RDF triples: <subject, property, object>
- <"www.picasso.net", first, "Pablo">

Related Work
- Triple store
- Property tables
- Class property tables
- Dynamic table model
- Vertically partitioned tables (Abadi et al. 2007)
- Path-based approach (Matono et al. 2005)
- Require more self joins, normal joins, NULL value storage

Vertical Partitioning
A table is created for each property.

First
Subject   Object
'r1'      'Picasso'
'r4'      'August'

Last
Subject   Object
'r1'      'Picasso'
'r4'      'Rodin'

Paints
Subject   Object
'r1'      'r2'
'r1'      'r3'

... etc.

Path-based Model
Path signatures relate to instance data.

Path
pathid   pathexp
1        ''
2        '#first'
3        '#last'
4        '#paints'
5        '#title<#paints'
6        '#sculpts'
7        '#title<#sculpts'

Resource
name        pathid   root
'r1'        1        'r1'
'r2'        4        'r1'
'r3'        4        'r1'
'r4'        1        'r4'
'Picasso'   2        'r1'
'Pablo'     3        'r1'
'August'    2        'r4'
'Rodin'     3        'r4'
...

Our enhancement

Problem Statement
Given:
- A set of RDF triples
- Vertical partitioning storage model
- Path-based storage model
Find: Query plans for the various categories of queries under these two storage schemes.
Objective: To determine which query types perform comparatively better or worse in the two storage models.

Why is this challenging?
- Need for efficient storage of structured data
- Different application domains use RDF, so generic storage schemes should support a diverse workload.

Contributions
- Identification of benchmark queries: schema, instance, path, and aggregate queries
- Enhancement to the path-based schema that addresses different types of workloads
- Comparison of path-based model and vertical partitioning
- Analysis of cyclic queries

Query Types
Schema queries
- find all types of artists
- list all property names
- list nodes with 2 or more descendants
- find the transitive sub-classes of a class 'sculpture'
- list properties with 2 or more descendants
Instance queries
- find the titles of all paintings by Picasso
- select all nodes within one edge-length of r4
- list all the properties of node r4

[Figure: query taxonomy. Labels: Schema vs Instance; Path; Non-path; Aggregate; Cycle; Relationship; Diameter; Constraints (intermediate node, terminal node); Connection; List]

Query Types
Path queries
- find the title of any painting painted by anyone
- display all the titles of work done by artists
- find the names of all the sculptors
...with constraint on intermediate node
- find an artist's name where the artifact is a painting
...with terminal node constraints
- display all the titles of work done by Picasso

Query Types
Path queries
connection queries
- list all the properties of node r4
- is there a connection between 'Picasso' and 'Guernica'?
diameter queries
- select all nodes in the graph within one edge-length of r4
non-simple path queries
- detect loops in the dataset starting at 'Picasso'
- detect loops in the whole dataset

Query Types
Aggregate queries
- find all nodes with 2 or more properties
- list all subjects that have two instances of a single property
Relationship queries
- find any relationship between r1 and r4

Assumptions
- Using a small dataset, with the assumption that the number of joins and the efficiency of the queries will not change significantly with larger datasets
- No explicit storage of the RDF schema in the vertically-partitioned scheme (application independent)
- INSERT, UPDATE, & DELETE are insignificant compared to SELECT
- Key nodes in the path-based model are well-defined
- In practice, key nodes would be generated dynamically after user load analysis

Experimental Process
- Validation parameters: nodes, edges, number of joins, number of tables, CPU cost, storage bytes
- Set up both schemes in Oracle 10g for the RDF graph shown earlier
- Materialized path lengths in path-based scheme
- Generated query plans
- Analyzed queries based on the validation parameters
- Cycle queries: joins are not supported

Dataset used for experiment

* For CPU cost and bytes (storage), the entry in the table indicates which scheme used fewer CPU cycles or occupied less space. In cases where both required an identical or similar amount of computation or storage, we indicate this with “same”.
Queries which cannot be answered are indicated by '--'.

Experimental Results

Conclusions & Observations
- Vertical partitioning performs well for short path lengths and terminal node constraints.
- It offers storage benefits for instance queries without path expressions.
- The enhanced path-based model performs well for schema queries, path queries, and cycle queries.
- Queries which the original path-based model could not address and the enhanced model could answer:
- Connection queries and diameter queries
- Path queries with intermediate node constraints

Conclusion (Cont'd)
- Both schemes show the same performance on instance queries without path expressions.
- Neither scheme addresses relationship queries.
- Interesting results for cycle queries:
- specifying the start node gives worse performance than leaving the start node unspecified
- specifying the start node uses Oracle Filter

Future Work
- Test large and diverse datasets
- Test vertical partitioning with a column-oriented database like MonetDB
- Pruning strategies for cycle queries
- Impose join indexes
- Find approaches to answer relationship queries
- Storage classification based on the application domain

Thank You
Questions?
Please see http://www.cs.umn.edu/~cmueller/cs8715 for a copy of the report that accompanies this presentation, including a full
