An Extensive Math Query Language

Abdou S. Youssef
Department of Computer Science
The George Washington University
Washington, DC, 20052, USA

Moody E. Altamimi
Department of Computer Science
The George Washington University
Washington, DC, 20052, USA

Abstract

Math search is a new area of research with many enabling technologies but also many challenges. Some of the enabling technologies include XML, XPath, XQuery, and MathML. Some of the challenges involve enabling search systems to recognize mathematical symbols and structures. Several math search projects have made considerable progress in meeting those challenges. One of the remaining challenges is the creation and implementation of a math query language that enables general users to express their information needs intuitively yet precisely. This paper presents such a language and details its features. The new math query language offers an alternative way to describe mathematical expressions that is more consistent and less ambiguous than conventional mathematical notation. In addition, the language goes beyond the Boolean and proximity query syntax found in standard text search systems. It defines a powerful set of wildcards that are deemed important for math search. These wildcards provide for more precise structural search and for multiple levels of abstraction.

1 INTRODUCTION

The need to facilitate scientific information exchange between researchers has resulted in the creation of a growing number of specialized mathematical library projects around the world. These projects aim to make scientific literature available on the Web. With the increasing online availability of electronic documents that contain mathematical expressions, the ability to find relevant information has become increasingly important. Yet support for searching for mathematical expressions is only in its infancy.
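As an illustration of the ambiguity the abstract attributes to conventional notation (our example, not one taken from the paper): the written form f(x+1) can mean the function f applied to x + 1, or the variable f multiplied by x + 1. The minimal sketch below encodes both readings as Content MathML and reads off the operator of the outermost `<apply>`, showing that the structured encoding separates what the notation conflates. The MathML namespace is omitted for brevity, and the helper name `head_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Two Content MathML readings of the conventional notation f(x+1).
# Reading 1: the function f applied to x + 1.
APPLICATION = """<apply><ci type="function">f</ci>
  <apply><plus/><ci>x</ci><cn>1</cn></apply>
</apply>"""

# Reading 2: the variable f multiplied by (x + 1).
PRODUCT = """<apply><times/><ci>f</ci>
  <apply><plus/><ci>x</ci><cn>1</cn></apply>
</apply>"""


def head_of(apply_xml):
    """Return (tag, text) of the first child of the outermost <apply>,
    which Content MathML treats as the operator being applied."""
    first = ET.fromstring(apply_xml)[0]
    return first.tag, (first.text or "").strip()


if __name__ == "__main__":
    print(head_of(APPLICATION))  # ('ci', 'f')
    print(head_of(PRODUCT))      # ('times', '')
```

A query language that is "less ambiguous" than conventional notation has to let the user say which of these two trees is meant; the exact syntax the paper uses for this is not shown in this section.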
The development of math search capabilities is a new area of research with many technical challenges. Some of the challenges involve enabling search systems to recognize mathematical symbols and structures. Several research projects on math search have made considerable progress in meeting those challenges and have resolved many of the issues involved. Notable among them are the math search of the DLMF project at NIST [1] and the math search system of Design Science [2]. The query languages assumed or implemented in those systems follow primarily the same syntax as standard text search. That syntax consists mainly of Boolean query operators (i.e., “and”, “or”, and “not”) and phrase operators. Phrase queries are important in math search because math expressions and fragments of expressions are, in effect, sequences of consecutive terms, that is, phrases. The standard syntax provides for a limited use of wildcards, namely “?” and “*”. The first stands for one arbitrary character inside a keyword, and the second stands for zero or more arbitrary characters inside a keyword. Such wildcard syntax is severely limiting in math search. For example, it cannot express an ellipsis. Also, if a user does not care what certain terms are (such as variable names) but does care that two or more of those terms are identical, the standard query syntax cannot express that need.

This paper proposes a new query language that extends the current standard query syntax. The language lets users describe their information needs by authoring different types of queries and by using an expanded set of wildcards. The proposed math query language will enable science and math users to specify their information needs more precisely, so that the returned matches are more relevant to those needs.
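The two limitations named above (no ellipsis, and no way to require that unspecified terms be identical) can be made concrete with a small matcher over pre-tokenized formulas. The sketch below is our own illustration under stated assumptions: formulas are already split into term lists, the standard per-keyword wildcards are emulated with Python's `fnmatch.fnmatchcase`, and the extra tokens `...` (ellipsis) and `$name` (a named wildcard whose every occurrence must match the same term) are hypothetical spellings invented here, not the paper's actual syntax.

```python
from fnmatch import fnmatchcase


def keyword_wildcard(pattern, keyword):
    """Standard text-search wildcards, confined to a single keyword:
    '?' matches exactly one character, '*' zero or more characters.
    (fnmatchcase also accepts [...] character sets; unused here.)"""
    return fnmatchcase(keyword, pattern)


def match_terms(query, terms, bindings=None):
    """Match a whole query against a whole sequence of terms.

    Hypothetical query tokens, for illustration only:
      "..."    ellipsis: zero or more arbitrary terms
      "$name"  exactly one arbitrary term; every occurrence of the
               same name must match an identical term
      other    must equal the term literally
    """
    if bindings is None:
        bindings = {}
    if not query:
        return not terms  # succeed only if every term was consumed
    head, rest = query[0], query[1:]
    if head == "...":
        # Backtrack over every possible length of the ellipsis.
        return any(match_terms(rest, terms[k:], bindings)
                   for k in range(len(terms) + 1))
    if not terms:
        return False
    term = terms[0]
    if head.startswith("$"):
        if head in bindings and bindings[head] != term:
            return False  # same wildcard name bound to a different term
        return match_terms(rest, terms[1:], {**bindings, head: term})
    return head == term and match_terms(rest, terms[1:], bindings)


if __name__ == "__main__":
    print(keyword_wildcard("sin?", "sinh"))                # True
    print(match_terms(["$x", "*", "$x"], ["a", "*", "a"]))  # True
    print(match_terms(["$x", "*", "$x"], ["a", "*", "b"]))  # False
    print(match_terms(["a_1", "+", "...", "+", "a_n"],
                      ["a_1", "+", "a_2", "+", "a_3", "+", "a_n"]))  # True
```

The per-keyword wildcards can only vary characters inside one term, so neither "some term, repeated" nor "any number of terms in between" is expressible with them; the named-wildcard bindings and the backtracking ellipsis supply exactly those two capabilities.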
The implementation of the language maps queries written in that language into XPath/XQuery queries [3, 4]. It is assumed that the math content is encoded in Content MathML [5]. This assumption is justified by current technological advances and expected future practices. For example, many tools already exist for converting LaTeX to MathML, such as the Rice University tool for conversion to Content MathML [6] and Bruce Miller’s LaTeXML and associated software [7], which convert LaTeX to a special XML syntax that includes Presentation MathML and some content markup. Furthermore, as the math authoring community becomes more comfortable with MathML and, more importantly, more convinced of the need for and benefits of Content MathML, more conversion and authoring tools that yield Content MathML will become available and more widely used.

2 BACKGROUND AND RELATED WORK

This section surveys work related to equation-based math search systems and the user query languages they offer. It mainly describes the query languages developed for the DLMF project, the Design Science search system, and Mathematica.

2.1 DLMF and Mathdex

Youssef et al. [8] developed the first generation of an equation-based math search system as part of the Digital Library of Mathematical Functions (DLMF) project [1] at NIST. The DLMF project provides an online source of mathematical content such as formulas and graphs, and allows for the search and retrieval of that content [8]. The mathematical content of the DLMF, originally in LaTeX, is converted to HTML and XHTML using LaTeXML, the markup language and software tool developed at NIST. Youssef, who is developing the search system for the DLMF, opted for an evolutionary approach that builds on existing text search technology. As a result, the query language syntax is almost identical to text search syntax, with the added power of recognizing mathematical symbols and structures to a great extent.
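The introduction states that the proposed language is implemented by mapping queries to XPath/XQuery over Content MathML, while the DLMF search builds on text search. The sketch below contrasts a plain phrase query over linearized tokens with a structural XPath query over Content MathML. It is our illustration under stated assumptions: the token streams and the query are hypothetical, it does not reproduce the DLMF system (which recognizes symbols and structures beyond plain phrase matching) or the paper's query-to-XPath translation, and Python's `xml.etree.ElementTree` supports only a limited subset of XPath 1.0, not XQuery.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
NS = {"m": MATHML_NS}

# (a + b)^2 in Content MathML: the sum a + b is an explicit subtree.
SQUARE_OF_SUM = f"""<math xmlns="{MATHML_NS}">
  <apply><power/>
    <apply><plus/><ci>a</ci><ci>b</ci></apply>
    <cn>2</cn>
  </apply>
</math>"""

# c * a + b: the tokens "a + b" are adjacent, but a + b is not a subexpression.
PRODUCT_PLUS = f"""<math xmlns="{MATHML_NS}">
  <apply><plus/>
    <apply><times/><ci>c</ci><ci>a</ci></apply>
    <ci>b</ci>
  </apply>
</math>"""

# Linearized token streams, as a purely text-based index might store them.
SQUARE_TOKENS = ["(", "a", "+", "b", ")", "^", "2"]
PRODUCT_TOKENS = ["c", "*", "a", "+", "b"]


def phrase_match(tokens, phrase):
    """Text-search style phrase query: is `phrase` a contiguous run of `tokens`?"""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


# Structural query: an <apply> whose operator is <plus/> and whose direct
# children include <ci>a</ci> and <ci>b</ci>.
SUM_OF_A_AND_B = ".//m:apply[m:plus][m:ci='a'][m:ci='b']"


def structural_match(mathml_text, xpath):
    """Evaluate an ElementTree-subset XPath query against a Content MathML document."""
    root = ET.fromstring(mathml_text)
    return bool(root.findall(xpath, NS))


if __name__ == "__main__":
    phrase = ["a", "+", "b"]
    print(phrase_match(SQUARE_TOKENS, phrase),
          phrase_match(PRODUCT_TOKENS, phrase))  # True True
    print(structural_match(SQUARE_OF_SUM, SUM_OF_A_AND_B),
          structural_match(PRODUCT_PLUS, SUM_OF_A_AND_B))  # True False
```

The phrase a + b occurs in both token streams, but only (a + b)^2 contains a + b as a subexpression. The structural query tells the two apart because Content MathML makes operator scope explicit, which is the kind of distinction a purely token-based index has difficulty capturing.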
Mathdex [2] is a web-based search engine developed by Design Science [9] as part of an NSF grant to facilitate equation-based search. Mathdex indexes not only LaTeX but also Presentation MathML, and it crawls the Web looking for mathematical content to index. Like the DLMF search, Mathdex follows an evolutionary approach by utilizing text search technology. Even though text search technology has reached a high level of maturity, it cannot fully capture all of the characteristics inherent in mathematical content. As a result, the query language developed for the DLMF project has limited

