
Introduction to XML Schema

Extensible Markup Language (XML) is now widely used for interchanging documents and data. Much of this is sent via the Internet, but a large amount is also exchanged over private networks. There are now many XML vocabularies designed for particular areas, ranging from the rapid distribution of breaking news (using RSS, Rich Site Summary or Really Simple Syndication) to markup languages tailored specifically for internal company use.

Since XML does not have tags with fixed meanings, senders and receivers must agree on the tags and know how to interpret them. For example, a form tag could refer either to an HTML form or a medical form. One way to handle this is to put each into a separate namespace. Along with this, both Document Type Definitions (DTDs) and XML Schema are used to verify that a document adheres to agreed-upon standards.

Namespaces

Tags with the same name are distinguished by adding a prefix to the beginning of the tag. For example, we could have h:form for HTML forms and m:form for medical forms. The entire name is called the qualified name. It consists of the prefix and the local part. Since a qualified name may contain only a single colon, the local part must be colon free.

A namespace is identified by a Uniform Resource Identifier (URI). The identifier doesn’t actually have to point to a real web page, but it is preferable that it do so. The page only needs to have some explanation of the uses for the prefix. The main one that we will use is

    xmlns:xs="http://www.w3.org/2001/XMLSchema"

This is the namespace for XML Schema. The attribute name begins with "xmlns", which stands for XML namespace, and the prefix being declared is "xs". For the form example above, we could have the namespaces

    xmlns:m="http://csis.pace.edu/~wolf/medical/"

and

    xmlns:h="http://www.w3.org/TR/html4/"

The latter is a real website, the W3C HTML 4.01 Specification. However, there is no medical folder on my web site.
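
To see how a parser treats qualified names, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which reports each tag as the namespace URI in braces followed by the local part. The record wrapper element and the text inside the two form elements are made up for illustration, and split_qualified is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing the two "form" vocabularies from the text.
DOC = """<record xmlns:h="http://www.w3.org/TR/html4/"
        xmlns:m="http://csis.pace.edu/~wolf/medical/">
  <h:form>login</h:form>
  <m:form>intake</m:form>
</record>"""


def split_qualified(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local part)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # the element is in no namespace


root = ET.fromstring(DOC)
names = [split_qualified(child.tag) for child in root]
for uri, local in names:
    print(local, "->", uri)
```

Both children have the local part form, but the parser keeps them apart by their namespace URIs. The prefixes h and m are only shorthand for those URIs.
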
If you try to follow the medical URI, you will get a Not Found page.

If you put a default namespace attribute (xmlns with no prefix) in a tag, all its children will inherit it. This way you do not have to add a prefix to every tag. This provides a default namespace for the tag and its children.

    <form xmlns="http://csis.pace.edu/~wolf/medical/">
        <doctor>Dr. Stein</doctor>
        <patient>Alice Lee</patient>
    </form>

Schema and DTDs

Schema and Document Type Definitions (DTDs) are used to make sure that those who use the documents agree on their contents and form. DTDs were developed first. They define the tree structure of a document, but they provide essentially only two data types, CDATA and PCDATA, both of which are character data. This was fine when XML was primarily used for marking up documents, such as books and articles. Most of that content consists of character data.

However, XML is now widely used to interchange data from files and databases. This data can have a number of other data types, including integers, decimals, dates and booleans. Also, the W3C Recommendation for DTDs came before that for namespaces. A colon is allowed in an XML name, so a qualified name will be accepted by a DTD, but a DTD cannot separate the prefix from the local part. Schema solve both of these problems. In addition, Schema are themselves XML documents, so a new format does not have to be learned.

The W3C Recommendation for Schema only dates to May 2001. However, they are now probably more widely used than DTDs, and some are suggesting that DTDs be retired in favor of Schema. Schema are more complicated than DTDs, but they also do more. We will look at some simple examples and then a few more complicated ones.

First Address Example

We considered the following XML document earlier.

    <?xml version="1.0" ?>
    <address>
        <name>Alice Lee</name>
        <email>[email protected]</email>
        <phone>123-45-6789</phone>
        <birthday>1983-07-15</birthday>
    </address>

It might represent a row in a database. We had a DTD that described it. The following Schema does also.

Note the first two lines of the schema.
They are standard, and the namespace declaration in particular must be copied exactly into the document.

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="address">
            <xs:complexType>
                <xs:sequence>
                    <xs:element name="name" type="xs:string"/>
                    <xs:element name="email" type="xs:string"/>
                    <xs:element name="phone" type="xs:string"/>
                    <xs:element name="birthday" type="xs:date"/>
                </xs:sequence>
            </xs:complexType>
        </xs:element>
    </xs:schema>

This looks more complicated than the DTD we used for it previously, but it also contains more information. It says that address is an element, that its type is complex, and that the elements called name, email, phone, and birthday must occur in the order shown. If <xs:all> had been used in place of <xs:sequence>, the four elements could appear in any order, but they would all have to be there. Also, while three of the elements are strings, the fourth is a date. Values of type xs:date are of the form yyyy-mm-dd. If they are not in this form, they are not valid.

Also, since schema are XML documents themselves, they mirror the form of the documents they are describing. The one above shows that address is the root node and that name, email, phone, and birthday are its children. This schema also says that each element must occur once and only once, since the default is exactly once. This can be changed by adding a constraint to an element.

    <xs:element name="phone" type="xs:string" maxOccurs="unbounded"/>

This says that there may be one or more phone numbers listed. There must be at least one, however. To change that, we would have to add another constraint, minOccurs="0".

An XML document that conforms to a schema is known as an instance of the schema. To use the schema, the document must contain a link to it. This is put into the root tag.

    <address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="address.xsd">

Notice that it not only identifies the W3C site for schema, but it also indicates that this is an instance of that schema.
Since namespaces are not used inside this document, the location of the schema is given in xsi:noNamespaceSchemaLocation. If a namespace had been used, this would change to xsi:schemaLocation.

Schemas are not unique. Another one for address.xml uses a reference in one element to another element. Thus the address element has references to the name, email, phone, and birthday elements. This can be used to divide the schema into manageable parts. Note that comments follow the usual rules for HTML and XML.

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <xs:schema
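
Going back to the first address schema and the root tag that links an instance to it, we can sketch the same rules as a small Python check. This is only an illustration using the standard library, not a real schema validator: check_address is a hypothetical helper, the email value is a made-up placeholder, and the date test covers only the plain yyyy-mm-dd form (xs:date also allows an optional time zone, which is ignored here).

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

XSI = "http://www.w3.org/2001/XMLSchema-instance"
EXPECTED_ORDER = ["name", "email", "phone", "birthday"]  # order given by xs:sequence

# Instance from the text; the email value is a made-up placeholder.
INSTANCE = """<address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="address.xsd">
  <name>Alice Lee</name>
  <email>alice@example.com</email>
  <phone>123-45-6789</phone>
  <birthday>1983-07-15</birthday>
</address>"""


def check_address(xml_text):
    """Return (schema location, list of problems) for an address instance."""
    root = ET.fromstring(xml_text)
    # Attributes in a namespace are looked up as '{uri}local'.
    location = root.get("{" + XSI + "}noNamespaceSchemaLocation")
    problems = []
    if [child.tag for child in root] != EXPECTED_ORDER:
        problems.append("children must be name, email, phone, birthday in that order")
    birthday = root.findtext("birthday", default="")
    try:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", birthday):
            raise ValueError(birthday)
        date.fromisoformat(birthday)  # also rejects impossible dates such as 1983-02-30
    except ValueError:
        problems.append("birthday is not a valid yyyy-mm-dd date")
    return location, problems


location, problems = check_address(INSTANCE)
print(location, problems)
```

A real validator, working from address.xsd itself, would also enforce the string types and the occurrence constraints. The point here is only to show what the sequence order and the xs:date type require of an instance.
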

