A Practical Large Scale Demonstration

Hawkeye: A Practical Large Scale Demonstration of Semantic Web Integration

Zhengxiang Pan, Abir Qasem, Sudhan Kanitkar, Fabiana Prabhakar, Jeff Heflin

Department of Computer Science and Engineering, Lehigh University
19 Memorial Dr. West, Bethlehem, PA 18015, U.S.A.
{zhp2, abq2, sgk205, ffp206, heflin}@cse.lehigh.edu

Abstract. We discuss our DLDB knowledge base system and evaluate its capability in processing a very large set of real-world Semantic Web data. Using DLDB, we have constructed the Hawkeye knowledge base, in which we have loaded more than 166 million facts from a diverse set of real-world data sources. We use this knowledge base to demonstrate realistic integration queries in e-government and academic scenarios. In order to support Hawkeye, we extended DLDB with additional reasoning capabilities. At present, the Semantic Web consists of numerous independent ontologies. We demonstrate that OWL can be used to integrate these ontologies and thereby integrate the data sources that commit to them. In terms of performance, we show that the load time of our system is linear in the number of triples loaded. Furthermore, we show that many complex queries have response times under one minute, and that simple queries can be answered in seconds.

1 Introduction

The 2005 index of Swoogle [2] contained 850,000 Semantic Web documents; in 2006 this index had 1.5 million SW documents, and at the time of writing this paper it boasts a staggering 2.1 million SW documents. The Semantic Web is growing, and clearly scalability is an important requirement for Semantic Web systems. Furthermore, the Semantic Web is an open and decentralized system where different parties can and will, in general, adopt different ontologies. Thus, merely using ontologies does not reduce heterogeneity: it just raises heterogeneity problems to a different level. Without some form of alignment, the data that is described in terms of one ontology will be inaccessible to users that ask questions in terms of another ontology.
In this paper we present a scalable system (166 million triples) and a knowledge base that has been integrated using only OWL axioms (as opposed to special purpose mapping languages). We put forward that, in addition to providing semantics to the data, OWL can also be used to establish alignments between these heterogeneous web sources. Using map ontologies, ones that contain OWL axioms that align the concepts of two ontologies, we have integrated many autonomous data sources and successfully demonstrated useful queries. For example, consider a researcher looking for colleagues to collaborate with her on a paper. One heuristic she may apply in her search is to look for people who have cited a paper that she has cited in her other publications. Obviously, this can be done using Google, but it will require several intermediate steps to meet her specific information need. Our system can get her the answer from two different sources (Citeseer and DBLP) in just a few seconds. We discuss this query and others that we have tested in Section 3.2.

Before we present our work, we would like to briefly review the state of the public Semantic Web. We note that there are several traits that we have observed in the existing Semantic Web data (indexed by Swoogle) that influenced our design choices.

First, we observe that if we account for minor syntactic errors (e.g. missing a type declaration), most of the ontologies in the current Semantic Web have an expressivity equivalent to or less than OWL DL. As these syntactic issues can be programmatically resolved, most of the OWL Full ontologies can be easily converted to OWL DL, which is most likely what the developer had intended [1]. In a recent survey of ontologies, Wang et al. [13] report similar syntactic errors leading to OWL Full ontologies.
Therefore, our system’s overall focus is to support OWL DL as opposed to OWL Full.

Second, we have observed that the ontologies and data from the social network domain are currently dominating the Semantic Web landscape. The most frequently used ontology in the Semantic Web is the Friend of a Friend (FOAF) ontology. It is interesting to note that although FOAF was originally designed for individuals to make their profiles available to the public, the prevalence of FOAF data is due to blog sites and social network sites (LiveJournal, etc.) which generate FOAF data from users’ public profiles. Each site generates its own URI for an individual and therefore we have several different URIs pointing to the same object. This is essentially an entity resolution problem. In order for us to have a plausible integration of the Semantic Web, we needed to resolve these duplicate entities, establish alignments and add instance equality reasoning to the DLDB system. The owl:InverseFunctionalProperty has helped us in this task. Basically, if a property p is annotated as InverseFunctionalProperty, then ∀ x, y, z: p(y, x) ∧ p(z, x) → y = z. With the FOAF data we have used InverseFunctionalProperty to state, for example, that if two individuals (two distinct URIs) have the same email address then they are essentially the same individual.

Third, we have observed that it is important to support the TransitiveProperty attribute of OWL properties. There are several ontologies in the Semantic Web that describe properties in terms of this characteristic. For example, many ontologies have made use of transitive properties such as hasPart and subLocationOf. SKOS, the World Wide Web Consortium’s recent effort in describing a controlled vocabulary for thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web, makes extensive use of transitive properties.

In what follows we first describe our enhanced DLDB system.
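
The two OWL features discussed above, instance equality through owl:InverseFunctionalProperty and inference over owl:TransitiveProperty, can be illustrated with a small in-memory sketch. It is not how DLDB implements them (DLDB sits on a relational database); it only shows the two rules on toy data. The function names, the example URIs, and the choice of `foaf:mbox` as the inverse functional property are assumptions made for this illustration.

```python
from collections import defaultdict


def merge_by_inverse_functional(triples, ifp):
    """Apply p(y, x) and p(z, x) -> y = z for the property `ifp`.

    Returns a dict mapping each subject of `ifp` to a canonical
    representative of its equivalence class (union-find).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller URI as representative.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    first_subject_for_value = {}
    for subject, predicate, value in triples:
        if predicate != ifp:
            continue
        find(subject)  # register the subject even if it never merges
        if value in first_subject_for_value:
            union(subject, first_subject_for_value[value])
        else:
            first_subject_for_value[value] = subject
    return {node: find(node) for node in parent}


def transitive_closure(pairs):
    """Return every (a, c) implied by p(a, b) and p(b, c) -> p(a, c)."""
    successors = defaultdict(set)
    for a, b in pairs:
        successors[a].add(b)
    closure = set()
    for start in list(successors):
        stack, seen = list(successors[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


# Hypothetical sample data: two sites mint different URIs for one person.
FOAF_MBOX = "foaf:mbox"
triples = [
    ("http://siteA.example/alice", FOAF_MBOX, "mailto:alice@example.org"),
    ("http://siteB.example/user42", FOAF_MBOX, "mailto:alice@example.org"),
    ("http://siteA.example/bob", FOAF_MBOX, "mailto:bob@example.org"),
]
same_as = merge_by_inverse_functional(triples, FOAF_MBOX)

# Hypothetical subLocationOf facts.
locations = transitive_closure(
    [("Bethlehem", "Pennsylvania"), ("Pennsylvania", "USA")]
)
```

Here the two URIs that share a mailbox collapse to one representative, while the third stays separate, and the closure adds the implied fact that Bethlehem is a sublocation of USA. A system at the scale described in this paper would need to compute such equivalences and closures inside the database rather than in memory.
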
We present its architecture, design and implementation with a focus on the additional reasoning and optimizations that we have added to the system based upon the characteristics of the Semantic Web. After presenting the system we then describe our Hawkeye knowledge base and at the end present related work and conclude. Note: in this paper we build on our initial work [11] in this area and now present a more comprehensive demonstration on a larger set of Semantic Web data.

2 DLDB: A Semantic Web Query Answering System

The initial architecture of DLDB is presented in [10]. It is a knowledge base system that extends a relational database management system with additional capabilities for partial OWL reasoning. The DLDB core consists of a Load API and a

