UTD CS 7301 - Managing Large RDF Graphs (Infinite Graph)

Managing Large RDF Graphs (Infinite Graph)
Vaibhav Khadilkar
Department of Computer Science, The University of Texas at Dallas
FEARLESS engineering

Agenda
- Motivation behind the project
- Semantic web technologies overview
- Proposed architecture
- Performance metrics

Motivation - Current Problems
- Jena’s in-memory model does not scale
- Jena’s RDB and SDB models cannot handle large result sets
- Hinders ability to do reasoning and large graph processing
- Current work focuses on load balancing and fault tolerance
- Current systems can be broken with even 100,000 triples
- We work on load balancing and polynomial reasoning but memory management breaks systems before any other problems can be addressed

Motivation - Relevance of the problem
- This is an unsolved problem
- Critical in handling terabytes of data relevant in today’s times
- Move the problem from memory space to disk space

[Diagram labels: Jena, In-memory, RDB, SDB, ARQ, Extension, Reasoning]

Semantic web technologies overview - Jena
- Jena is a Java based framework that allows building Semantic web applications
- Jena provides a programmatic environment for RDF, RDFS, OWL, SPARQL and includes a rule based inference engine
- Jena allows the creation and manipulation of in-memory or relational database backed (RDB and SDB) RDF graphs

Semantic web technologies overview - Lucene
- Lucene is a Java based text indexing and searching tool
- The smallest unit of text that Lucene indexes and searches is a Document
- A Document contains different fields and a corresponding value for each field
- The different fields are the indexes that can be used as keywords during a search

Problems with In-memory Jena Model
- Ability to handle medium sized graphs
- As nodes are added memory fills up
- As more nodes are added, the program crashes with an out of memory exception
- We want to solve this out of memory problem

Buffer Management Strategy
[Diagram: In-memory triple store + buffer, Lucene triple store]
1. Add triples
2. Added triples = Threshold
3. Buffer sorted based on memory management algorithm
4. Write triples based on sorted buffer while triples left > x of Threshold
5. Continue adding triples

[Diagram: In-memory triple store, Lucene triple store]
1. Query model
2. If result not in memory query Lucene triple store
3. Return result
4. Return result

Choice of Algorithm
- Memory management algorithms such as LRU, MRU, FIFO, and LIFO
- Social network analysis measures such as degree centrality and individual clustering coefficient
- Combination of memory management algorithm with degree centrality and individual clustering coefficient

Choice of buffer and persistence strategy
- Buffer can be created based on the subject, predicate, object or a combination of them
- Map Jena’s subject, predicate and object indexes to Lucene indexes directly
- Create Lucene indexes as needed taking into account the nature of SPARQL queries and Jena’s implementation

Conclusions from the in-memory model
- Degree centrality is the best algorithm to choose a node to be persisted to disk
- Creating Lucene indexes as needed is a better choice for the persistence strategy than creating all indexes at the same time

Problems with RDB Jena model
- The RDB Jena model can add any number of triples to the relational database
- When a query asking for a large number of triples is executed, the result set returned fills up memory causing the program to crash with an out of memory exception
- We want to solve this out of memory problem
- We leverage the previous in-memory extension to solve this problem

Memory management algorithm
Algorithm
- We use the LIMIT and OFFSET clauses in SQL to get only a part of the results at a time
- The retrieved triples are added to the extended in-memory Jena model
- Thus we use the memory management algorithm from the in-memory model
- Since the revised in-memory model never runs out of memory this RDB solution never runs out of memory

Conclusions
Conclusions from the extended RDB model
- Model creation times are similar to the original RDB Jena model
- Query times vary based on the threshold value in the in-memory solution
General conclusions
- Implemented an in-memory cache based memory management algorithm
- Solves the memory problem for the in-memory and RDB Jena models by creating an impression of infinite memory for the user
- Moves the memory problem to disk space

Problems with SDB Jena Model
- The SDB Jena model can add any number of triples to the relational database
- When a query asking for a large number of triples is executed, the result set returned fills up memory causing the program to crash with an out of memory exception
- We want to solve this out of memory problem
- The SDB solution does not depend on the in-memory or RDB extensions

Memory management algorithm
Algorithm
- We use the LIMIT and OFFSET clauses in SQL to get only a part of the results at a time
- The retrieved triples are returned as a separate iterator to the executing program

Inferencing in
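
The Lucene overview slide describes a Document as a set of fields with values, where each field acts as an index that can be searched by keyword. The sketch below illustrates that idea in plain Python; it does not use the Lucene API. The class `FieldIndex`, its methods, and the sample triples are all hypothetical names chosen for illustration, with one document per RDF triple and one field each for subject, predicate and object.

```python
from collections import defaultdict


class FieldIndex:
    """Toy stand-in for a field-based index (not the Lucene API)."""

    def __init__(self, fields):
        self.fields = tuple(fields)
        self.docs = []  # document id -> {field: value}
        # One inverted index per field: value -> set of document ids.
        self.index = {f: defaultdict(set) for f in self.fields}

    def add_document(self, doc):
        """Store a document and register each field value in its index."""
        doc_id = len(self.docs)
        self.docs.append(dict(doc))
        for field in self.fields:
            self.index[field][doc[field]].add(doc_id)
        return doc_id

    def search(self, **terms):
        """Return documents whose fields equal every given keyword."""
        ids = None
        for field, value in terms.items():
            hits = self.index[field].get(value, set())
            ids = set(hits) if ids is None else ids & hits
        if ids is None:  # no terms given: return everything
            ids = set(range(len(self.docs)))
        return [self.docs[i] for i in sorted(ids)]


# Hypothetical sample data: one document per triple.
triple_index = FieldIndex(["subject", "predicate", "object"])
triple_index.add_document(
    {"subject": "ex:alice", "predicate": "foaf:knows", "object": "ex:bob"})
triple_index.add_document(
    {"subject": "ex:alice", "predicate": "foaf:name", "object": "Alice"})
triple_index.add_document(
    {"subject": "ex:bob", "predicate": "foaf:knows", "object": "ex:carol"})

alice_triples = triple_index.search(subject="ex:alice")
```

Searching on one field corresponds to using a single index; combining fields intersects the per-field hits, which is one simple way to picture the "combination" of subject, predicate and object indexes mentioned in the persistence strategy slide.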
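
The buffer management strategy and the query flow from the earlier slides can be sketched as follows. This is a minimal illustration under several assumptions, not the author's implementation: a plain dictionary stands in for the Lucene triple store, the buffer is keyed by subject, nodes with the lowest degree are persisted first (the slides name degree centrality but do not state the direction), and the `keep` parameter is a hypothetical replacement for the "x of Threshold" value, which the slides leave unspecified.

```python
from collections import Counter, defaultdict


class BufferedTripleStore:
    """In-memory triple store that spills whole subject nodes to 'disk'."""

    def __init__(self, threshold, keep):
        self.threshold = threshold        # flush once this many triples are held
        self.keep = keep                  # assumed: triples left in memory after a flush
        self.memory = defaultdict(set)    # subject -> {(predicate, object)}
        self.disk = defaultdict(set)      # stand-in for the Lucene triple store
        self.in_memory = 0

    def add(self, s, p, o):
        # Step 1: add the triple to the in-memory store.
        if (p, o) not in self.memory[s]:
            self.memory[s].add((p, o))
            self.in_memory += 1
        # Step 2: the number of added triples has reached the threshold.
        if self.in_memory >= self.threshold:
            self._flush()
        # Step 5: the caller simply continues adding triples.

    def _flush(self):
        # Degree of a node = number of in-memory triples it appears in.
        degree = Counter()
        for s, edges in self.memory.items():
            for _, o in edges:
                degree[s] += 1
                degree[o] += 1
        # Step 3: sort the buffer (assumed order: lowest degree first).
        ordered = sorted(self.memory, key=lambda n: (degree[n], n))
        # Step 4: write nodes out until few enough triples remain.
        for s in ordered:
            if self.in_memory <= self.keep:
                break
            edges = self.memory.pop(s)
            self.disk[s] |= edges
            self.in_memory -= len(edges)

    def find(self, s):
        """Query memory first; fall back to the disk store on a miss."""
        edges = self.memory.get(s)
        if not edges:
            edges = self.disk.get(s, set())
        return sorted((s, p, o) for p, o in edges)


store = BufferedTripleStore(threshold=5, keep=3)
for triple in [("a", "knows", "b"), ("a", "knows", "c"), ("a", "knows", "d"),
               ("e", "knows", "f"), ("g", "knows", "h")]:
    store.add(*triple)
```

In this run the fifth triple triggers a flush: subjects `e` and `g` (degree 1) are written out, and the three triples of `a` (degree 3) stay in memory. A lookup for `e` misses in memory and is answered from the disk stand-in. Because the sketch follows the query diagram literally, a subject whose triples end up split across both stores would need both consulted, which this example does not handle.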
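
The LIMIT and OFFSET approach described for the RDB and SDB models can be illustrated with a small paging iterator. This is a sketch under stated assumptions: Python's built-in `sqlite3` module stands in for the relational database, and the `triples` table, its columns, and the `iter_triples` function are hypothetical; Jena's RDB and SDB layouts use their own schemas. The point is only that each SQL query returns a bounded page, so the caller never holds the full result set at once.

```python
import sqlite3


def iter_triples(conn, page_size):
    """Yield every triple, fetching at most page_size rows per SQL query."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT subject, predicate, object FROM triples "
            "ORDER BY rowid LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            return  # no more pages
        yield from rows
        offset += len(rows)


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [(f"ex:s{i}", "ex:p", f"ex:o{i}") for i in range(10)],
)

results = list(iter_triples(conn, page_size=3))
```

The `ORDER BY` clause matters: without a stable ordering, consecutive pages are not guaranteed to partition the result set. In the RDB variant from the slides, each page would be added to the extended in-memory model; in the SDB variant, the pages are handed to the caller as an iterator, which is the shape shown here.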

