
Literature Review
Introduction
RDF Storage Schemes
References

Literature Review
G1: Preetha Lakshmi, Chris Mueller
CS 8715
October 1, 2007

Introduction

The Semantic Web is an initiative that attempts to describe information on the web in unambiguous, machine-comprehensible formats. The web in its current incarnation provides information in human-readable formats, but the meaning of this information and its relation to other pieces of information elsewhere on the web are not well defined. Semantic Web data uses common schemas to describe data from disparate sources. Machines capable of reading this data could comprehend it; for example, they could draw inferences about the data based on information from other datasets (Berners-Lee, 2001).

Semantic Web information is often stored in RDF in the form of triples (subject, property, object). A combination of many RDF triples forms an RDF graph. A basic example is a set of RDF triples that describe the author of a book:

person1 isNamed “Mark Twain”
book1 isTitled “The Adventures of Tom Sawyer”
book1 hasAuthor person1

Posing queries to the Semantic Web is not trivial. RDF stores from different sources must be aggregated and cached locally or on a search provider. A number of storage implementations and schemes have been proposed that use databases to cache RDF triples. Some implementations maintain RDF-specific information in the application layer, and some store the RDF schema at the database level. When RDF-specific information is kept at the application level, the application is database-independent, but there are trade-offs in terms of performance and scalability. When the RDF schema is implemented at the database level, the RDF structure can be exploited to obtain efficiency using existing database models.

We will discuss the current state of the art of RDF database storage schemes. The simplest way to store RDF data is in a triple store, essentially one large table with three columns for subject, predicate, and object.
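
As a minimal sketch of such a triple store, the three example triples given earlier can be loaded into a single three-column table. The sketch assumes SQLite through Python's standard sqlite3 module, which is not one of the systems reviewed here; the table name, column names, and the helper titles_by_author are hypothetical and chosen only for illustration. Note that finding a book title from an author's name requires joining the table to itself twice.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("person1", "isNamed", "Mark Twain"),
        ("book1", "isTitled", "The Adventures of Tom Sawyer"),
        ("book1", "hasAuthor", "person1"),
    ],
)


def titles_by_author(name):
    """Return the titles of books whose author has the given name.

    Each triple pattern in the question becomes one alias of the same
    table, so three patterns mean two self-joins.
    """
    rows = conn.execute(
        """
        SELECT title.object
        FROM triples AS named
        JOIN triples AS wrote ON wrote.object = named.subject
        JOIN triples AS title ON title.subject = wrote.subject
        WHERE named.predicate = 'isNamed' AND named.object = ?
          AND wrote.predicate = 'hasAuthor'
          AND title.predicate = 'isTitled'
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(titles_by_author("Mark Twain"))
```

Every additional triple pattern in a query adds another alias of the same table, which is why self-joins are the usual cost to watch in this layout.
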
Variations on the triple store have shown improvements in efficiency and reduced the number of self-joins needed when issuing complex queries. Recently, a proposal for vertically partitioning RDF triples into property tables, in which unique tables are created for each property in an RDF graph, has shown dramatic performance improvements over the triple store.

RDF Storage Schemes

A normalized triple store attempts to improve the efficiency of the triple store. A Statements table, which stores RDF triples in three columns, together with a Literals table and a Resources table, makes up the basic table schema. In RDF, literals refer to literal values, such as strings or integers, and resources refer to URIs. The Statements table contains references to items in the Literals and Resources tables, reducing disk space usage. A variation on this approach, the denormalized triple store, attempts to limit the number of joins that would occur across the Statements, Resources, and Literals tables. Instead of always storing a reference to the Literals and Resources tables, the Statements table holds the literal or resource itself so long as it is smaller than a certain limit, e.g. < 255 characters. Jena1 made use of normalized triple stores, and Jena2 makes use of denormalized triple stores (Wilkinson 2003). Oracle also uses normalized tables (Alexander n.d.).

Another method for storing RDF data is to recreate the RDF schema in a dynamic table schema. With this approach, classes and properties in RDF are mirrored in tables. In the book/author example given above, separate tables would be created for books, authors, and titles. Relationships among the tables express the RDF triples. One benefit of this approach is that queries can be made against the RDF schema itself (Matono 2005). The dynamic table schema is used by Sesame.
Research suggests this method can perform with reasonable efficiency, especially when properties of the underlying DBMS are exploited effectively, e.g. using the object-relational features of Postgres to support native subclassing (Broekstra 2002). There are several problems with the dynamic table schema. It can be inefficient, especially when many self-joins are required during query execution. RDF data cannot be stored without knowing the RDF schema needed to create the table schema. Also, if the RDF schema changes, the table schema must be recreated (Beckett 2003, Matono 2003).

In order to reduce the number of self-joins needed for queries, some applications have implemented property tables. In a property table, subjects that share similar properties are stored together in a denormalized, flat table (Abadi 2007). All the data are stored in the same table, which eliminates the need for joins. There can be significant overhead when using property tables if certain properties do not apply to certain subjects, as the number of NULL values in the table will increase. This wastes storage space. Postgres can fare better with NULL values because it represents each NULL with just a bit. Another drawback to using property tables is that if queries require joins across many different property tables, or if the property tables were poorly configured, performance can actually be worse than with a traditional triple store. Jena2 implements property tables (Wilkinson 2003).

The property class table is a variation on the property table. Instead of storing all the values in a single table, triples are split up based on the availability of data so as to avoid storing NULL values. This saves space. The selection of column names must be made very carefully. This makes the implementation more dependent on the kind of data being stored, and hence it might require changes based on the input data.
But a drawback here is, again, that if a query refers to properties from more than one table, a join and a union might be required, which can degrade performance. Property class tables are implemented in Jena2 (Wilkinson 2003).

Oracle stores RDF triples in graphs. Each triple is represented as a directed or undirected graph, as part of the Oracle Spatial network data model (Wang 2003). Each triple is stored as a separate object. There are three main components to the table schema. RDF_VALUES holds the values of all parts of the triples. RDF_NODES stores subjects and objects. RDF_LINKS stores all the link information and properties as well as reification information. Oracle supports bags, which are unordered groups


U of M CSCI 8715 - Literature Review
