Upgrade to remove ads
Upgrade to remove ads
Unformatted text preview:

What Goes Around Comes AroundByMichael StonebrakerJoey HellersteinAbstractThis paper provides a summary of 35 years of data model proposals, grouped into 9different eras. We discuss the proposals of each era, and show that there are only a fewbasic data modeling ideas, and most have been around a long time. Later proposalsinevitably bear a strong resemblance to certain earlier proposals. Hence, it is aworthwhile exercise to study previous proposals.In addition, we present the lessons learned from the exploration of the proposals in eachera. Most current researchers were not around for many of the previous eras, and havelimited (if any) understanding of what was previously learned. There is an old adage thathe who does not understand history is condemned to repeat it. By presenting “ancienthistory”, we hope to allow future researchers to avoid replaying history.Unfortunately, the main proposal in the current XML era bears a striking resemblance tothe CODASYL proposal from the early 1970’s, which failed because of its complexity.Hence, the current era is replaying history, and “what goes around comes around”.Hopefully the next era will be smarter.I IntroductionData model proposals have been around since the late 1960’s, when the first author“came on the scene”. Proposals have continued with surprising regularity for theintervening 35 years. Moreover, many of the current day proposals have come fromresearchers too young to have learned from the discussion of earlier ones. Hence, thepurpose of this paper is to summarize 35 years worth of “progress” and point out whatshould be learned from this lengthy exercise.We present data model proposals in nine historical epochs:Hierarchical (IMS): late 1960’s and 1970’sDirected graph (CODASYL): 1970’sRelational: 1970’s and early 1980’sEntity-Relationship: 1970’sExtended Relational: 1980’sSemantic: late 1970’s and 1980’sObject-oriented: late 1980’s and early 1990’sObject-relational: late 1980’s and early 1990’sSemi-structured (XML): late 1990’s to the presentIn each case, we discuss the data model and associated query language, using a neutralnotation. Hence, we will spare the reader the idiosyncratic details of the variousproposals. We will also attempt to use a uniform collection of terms, again in an attemptto limit the confusion that might otherwise occur.Throughout much of the paper, we will use the standard example of suppliers and parts,from [CODD70], which we write for now in relational form in Figure 1.Supplier (sno, sname, scity, sstate)Part (pno, pname, psize, pcolor)Supply (sno, pno, qty, price)A Relational SchemaFigure 1Here we have Supplier information, Part information and the Supply relationship toindicate the terms under which a supplier can supply a part.Figure 2 shows a few instances of sample data.Suppplier Part16 General Supply Boston Ma 27 Power saw 7 silver24 Special Supply Detroit Mi 42 bolts 12 graySupply16 27 100 $20.0016 42 1000 $.1024 42 5000 $.08Some Sample DataFigure 2II IMS EraIMS was released around 1968, and initially had a hierarchical data model. It understoodthe notion of a record type, which is a collection of named fields with their associateddata types. Each instance of a record type is forced to obey the data descriptionindicated in the definition of the record type. Furthermore, some subset of the namedfields must uniquely specify a record instance, i.e. they are required to be a key. Lastly,the record types must be arranged in a tree, such that each record type (other than theroot) has a unique parent record type. An IMS data base is a collection of instances ofrecord types, such that each instance, other than root instances, has a single parent of thecorrect record type.This requirement of tree-structured data presents a challenge for our sample data, becausewe are forced to structure it in one of the two ways indicated in Figure 3. For the first ofthe two schemas, we also indicate our sample data in Figure 4.Two Hierarchical OrganizationsFigure 3Some Example DataFigure 4Supplier (sno,sname, scity,sstate)Part (pno, pname,psize, pcolor, qty,price)Part (pno,pname, psize,pcolor)Supplier (sno,sname, scity,sstate, qty, price)16General SupplyBoston, Ma24Special SupplyDetroit, Mi27, Power saw7, silver,100, $20.0042, Bolts12, gray1000, $.1042, Bolts12, gray5000, $.08These representations share two common undesirable properties:1) Information is repeated. In the first schema, Part information is repeated foreach Supplier who supplies the part. In the second schema, Supplier informationis repeated for each part he supplies. Repeated information is undesirable,because it offers the possibility for inconsistent data. For example, a repeateddata element could be changed in some, but not all, of the places it appears,leading to an inconsistent data base.2) Existence depends on parents. In the first schema it is impossible for there to bea part that is not currently supplied by anybody. In the second schema, it isimpossible to have a supplier which does not currently supply anything. There isno support for these “corner cases” in a strict hierarchy.IMS chose a hierarchical data base because it facilitates a simple data manipulationlanguage, DL/1. Every record in an IMS data base has a hierarchical sequence key(HSK). Basically, an HSK is derived by concatenating the keys of ancestor records, andthen adding the key of the current record. HSK defines a natural order of all records inan IMS data base, basically depth-first, left-to-right. DL/1 intimately used HSK order forthe semantics of commands. For example, the “get next” command returns the nextrecord in HSK order. Another use of HSK order is the “get next within parent”command, which explores the subtree underneath a given record in HSK order.Using the first schema, one can find all the red parts supplied by Supplier 16 as:Get unique Supplier (sno = 16)Until no-more {Get next within parent (color = red)}The first command finds Supplier 16. Then we iterate through the subtree underneaththis record in HSK order, looking for red parts. When the subtree is exhausted, an erroris returned.Notice that DL/1 is a “record-at-a-time” language, whereby the programmer constructs analgorithm for solving his query, and then IMS executes this algorithm. Often there aremultiple ways to solve a query. Here is another way to solve the


View Full Document
Download What Goes Around Comes Around by Stonebraker, Hellerstein
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view What Goes Around Comes Around by Stonebraker, Hellerstein and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view What Goes Around Comes Around by Stonebraker, Hellerstein 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?