No Schema SQL NS SQL Querying Relational Databases Independent of Schema Adam Boot Maryam Karimzadehgan Keith Kelly Michael Maur University of Illinois at Urbana Champaign CS511 Advanced Database Management Systems Jing Yu Abstract SQL is a declarative language that is dependent on the underlying schema of a database This means that a causal user must know the underlying schema in order to retrieve relevant data from a database However dependency on the schema is undesirable for several reasons It places a burden on the user to understand the schema and it requires that the input queries must be updated whenever the schema changes In this paper we provide a more relaxed form of formal SQL syntax No Schema SQL NS SQL NS SQL frees user from knowing the underlying schema but retains the expressive power of a structured query language The user need only know the basic syntax of SQL Since we are relaxing formal SQL away from schema specific content certain kinds of ambiguity become possible specifically column name and join path ambiguity Providing an intuitive interface allows this new system to be effectively used by the casual user without knowledge of the target database s underlying schema 1 Introduction Motivation One of Codd s main stated goals of proposing the Relational Model as a new paradigm was to decouple data access from physical representation This allows a declarative language like SQL to be independent of ordering indexing or access path dependencies Unfortunately SQL is still dependent on the underlying database schema SQL currently places the burden of navigating the database schema on the user and many casual database users find SQL onerous for this reason alone There is a need for a less restrictive and more intuitive syntax for specifying a query Specifically this relaxed language should involve two things Automatically resolve column and table names The entire FROM block may be omitted Further oftentimes column names are unique in a database so explicitly specifying their table is unnecessary If the database uses intention revealing column names the user need never be aware of which columns belong to which tables they need only know what type of data the database holds Automatically infer foreign to primary key mappings The application should be able to deduce when two tables involved in a query should adhere to a key mapping without the user explicitly saying so If a relationship exists between two tables then this is nearly always what the user intends The proposed query language looks very much like SQL but its lack of formal rigor makes its use more pleasant to the casual user The challenge of our project is to discover a target database s schema for use in query inferences and to disambiguate and translate between the NS SQL input query and a formal SQL statement Because the proposed query language allows implied table names and join paths it may be impossible for a given query to be entirely disambiguated In such a case the application should identify the problem and suggest valid alternatives to the user Example A brief note on our color convention throughout this paper we use font color and typeface in order to help highlight the purpose for or relationship among various entities Of particularly importance is the use of the color blue to indicate NS SQL input or referenced column names and the color green to indicate formal SQL output or physical columns A referenced column is simply a partially qualified column name which is referenced somewhere in an input NS SQL query A physical column is an entity representing a fully qualified column extracted from the database schema One of the main challenges of our system is in determining the proper mapping between referenced columns and physical columns 1 No Schema SQL Consider a database with the tables Employee Job and Facility Assume the user is looking for the name city and state of all employees whose salary is greater than 70 000 and whose job title is Database Designer In order to specify this query using formal SQL syntax the following statement would be required Employee id name salary job id facility id Job id title description Facility id city state SELECT Empoyee name Facility city Facility state FROM Employee JOIN Job ON Employee job id Job id JOIN Facility ON Employee facility id Facility id WHERE Employee salary 70000 AND Job title Database Designer Much of this statement is concerned with the proper reassembly of the relational database s normalized tables We believe the syntax could be much more intuitive and pleasant to use Consider the same query using NS SQL SELECT name city state WHERE salary 70000 AND title Database Designer Related Work Related research has focused largely on the removal of structure from the input query either in the form of a natural language query language or a keyword search NS SQL shares the motivation of a desire to provide a more intuitive interface to the structured data in a relational database However NS SQL differs in that it does not attempt to achieve this by removing the structure of the query language itself but rather it removes the necessity of knowledge about the structure of the underlying database This is useful in that it retains the expressive powers of a structured query language without the burden associated with understanding the database s schema For a more thorough survey of related research papers see Appendix A Survey of Related Work Organization This paper is organized as follows In Section 2 System Architecture we provide a high level overview of our project In Section 3 Subsystems we describe the project subsystems in detail In Section 4 Future Work we discuss direction for expansion of NS SQL and our supporting application In Section 5 Summary we analyze the contribution of our project and conclude the paper 2 System Architecture Our project is an ambitious one as it involved the invention and vetting of a modified query language along with all of the implementation involved in constructing a reference system The system architecture is depicted in Figure 1 url schema Connection sql DBMS via JDBC console GUI Hub schema Schemafier ambiguity error disambig nssql command line script file GUI ddl ambig NS SQL Statement SQL parse error Parser Figure 1 2 No Schema SQL Our choice of programming language was Java and we decomposed the project into three major subsections which each addresses a logically separate concern of running the source to
View Full Document