Document Type Definitions

XML and DTDs

- A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:
  - Elements
  - Attributes
  - Entities
  We will discuss each of these in turn.
- An XML document is well-formed if it follows certain simple syntactic rules.
- An XML document is valid if it also specifies and conforms to a DTD.

Why DTDs?

- XML documents are designed to be processed by computer programs.
- If you can put just any tags in an XML document, it's very hard to write a program that knows how to process the tags.
- A DTD specifies what tags may occur, when they may occur, and what attributes they may (or must) have.
- A DTD allows the XML document to be verified (shown to be legal).
- A DTD that is shared across groups allows the groups to produce consistent XML documents.

Parsers

- An XML parser is an API that reads the content of an XML document.
- Currently popular APIs are DOM (Document Object Model) and SAX (Simple API for XML).
- A validating parser is an XML parser that compares the XML document to a DTD and reports any errors.
- Most browsers don't use validating parsers.

An XML example

    <novel>
      <foreword>
        <paragraph>This is the great American novel.</paragraph>
      </foreword>
      <chapter number="1">
        <paragraph>It was a dark and stormy night.</paragraph>
        <paragraph>Suddenly, a shot rang out.</paragraph>
      </chapter>
    </novel>

An XML document contains (and the DTD describes):
- Elements, such as novel and paragraph, consisting of tags and content
- Attributes, such as number="1", consisting of a name and a value
- Entities (not used in this example)

A DTD example

    <!DOCTYPE novel [
      <!ELEMENT novel (foreword, chapter+)>
      <!ELEMENT foreword (paragraph+)>
      <!ELEMENT chapter (paragraph+)>
      <!ELEMENT paragraph (#PCDATA)>
      <!ATTLIST chapter number CDATA #REQUIRED>
    ]>

- A novel consists of a foreword and one or more chapters, in that order.
- Each chapter must have a number attribute.
- A foreword consists of one or more paragraphs.
- A chapter also consists of one or more paragraphs.
- A paragraph consists of parsed character data, that is, text that cannot contain any
  other elements.

ELEMENT descriptions

Suffixes:
    ?      optional          foreword?
    +      one or more       chapter+
    *      zero or more      appendix*

Separators:
    ,      both, in order    foreword?, chapter+
    |      or                section | chapter

Grouping:
    ( )    grouping          (section | chapter)+

Elements without children

- The syntax is <!ELEMENT name category>
- The name is the element name used in start and end tags.
- The category may be EMPTY:
  - In the DTD: <!ELEMENT br EMPTY>
  - In the XML: <br></br> or just <br />
- In the XML, an empty element may not have any content between the start tag and the end tag.
- An empty element may (and usually does) have attributes.

Elements with unstructured children

- The syntax is <!ELEMENT name category>
- The category may be ANY.
  - This indicates that any content (character data, elements, even undeclared elements) may be used.
  - Since the whole point of using a DTD is to define the structure of a document, ANY should be avoided wherever possible.
- The category may be (#PCDATA), indicating that only character data may be used.
  - In the DTD: <!ELEMENT paragraph (#PCDATA)>
  - In the XML: <paragraph>A shot rang out.</paragraph>
  - The parentheses are required.
- Note: In #PCDATA, whitespace is kept exactly as entered.
- Elements may not be used within parsed character data.
- Entities are character data, and may be used.

Elements with children

- A category may describe one or more children:
      <!ELEMENT novel (foreword, chapter+)>
- Parentheses are required, even if there is only one child.
- A space must precede the opening parenthesis.
- Commas (,) between elements mean that all children must appear, and must appear in the order specified.
- "|" separators mean that any one child may be used.
- All child elements must themselves be declared.
- Children may have children.
- Parentheses can be used for grouping:
      <!ELEMENT novel (foreword, (chapter | section)+)>

Elements with mixed content

- #PCDATA describes elements with only character data.
- #PCDATA can be used in an "or" grouping:
      <!ELEMENT note (#PCDATA | message)*>
- This is called mixed content.
- Certain (rather severe) restrictions apply:
  - #PCDATA must be first.
  - The separators must be "|".
  - The group must be starred (meaning zero or more).

Names and
namespaces

- All names of elements, attributes, and entities, in both the DTD and the XML, are formed as follows:
  - The name must begin with a letter or underscore.
  - The name may contain only letters, digits, dots, hyphens, underscores, and colons (and, for foreign languages, combining characters and extenders).
- The DTD doesn't know about namespaces; as far as it knows, a colon is just part of a name.
- The following are different (and both legal):
      <!ELEMENT chapter (paragraph+)>
      <!ELEMENT myBook:chapter (myBook:paragraph+)>
- Avoid colons in names, except to indicate namespaces.

An expanded DTD example

    <!DOCTYPE novel [
      <!ELEMENT novel (foreword, chapter+, biography?, criticalEssay*)>
      <!ELEMENT foreword (paragraph+)>
      <!ELEMENT chapter (section+ | paragraph+)>
      <!ELEMENT section (paragraph+)>
      <!ELEMENT biography (paragraph+)>
      <!ELEMENT criticalEssay (section+)>
      <!ELEMENT paragraph (#PCDATA)>
    ]>

Attributes and entities

- In addition to elements, a DTD may declare attributes and entities.
  - This slide shows examples; we will discuss each in detail.
- An attribute describes information that can be put within the start tag of an element.
  - In XML: <dog name="Spot" age="3"></dog>
  - In DTD: <!ATTLIST dog name CDATA #REQUIRED age CDATA #IMPLIED>
- An entity describes text to be substituted.
  - In XML: &copyright;
  - In the DTD: <!ENTITY copyright "Copyright Dr. Dave">

Attributes

- The format of an attribute is:
      <!ATTLIST element-name name type requirement name type requirement>
  where the name-type-requirement triple may be repeated as many times as desired.
  - Note that only spaces separate the parts, so careful counting is essential.
  - The element-name tells which element may have these attributes.
  - The name is the name of the attribute.
  - Each attribute has a type, such as CDATA (character data).
  - Each attribute may be required, optional, or "fixed".
- In the XML, attributes may occur in any order.

Important attribute types

- There are ten attribute types. These are the most important ones:
- CDATA: The value is character data.
- (man|woman|child): The value is one from this list.
- ID: The value is a unique identifier. ID values must be legal XML names and must be unique within
  the document.
- NMTOKEN: The value is a name token, that is, one or more name characters (letters, digits, dots, hyphens, underscores, colons) with no whitespace. This is sometimes used to disallow whitespace in the value. Unlike an XML name, a name token may begin with a digit, so NMTOKEN does not rule out numbers.

Less important attribute types

    IDREF       The ID of another element
    IDREFS      A list of other IDs
    NMTOKENS    A list of name tokens
    ENTITY      An entity
    ENTITIES    A list of entities
    NOTATION    A notation
    xml:        A predefined XML value

Requirements

- Recall that an attribute has the form
      <!ATTLIST element-name name type requirement>
- The requirement is
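
The attribute types listed above (CDATA, an enumerated list, ID, NMTOKEN) can be checked mechanically. The following is a minimal Python sketch of such a check, not a validating parser. The DECLS table, the check_attribute_types function, and the person element with its attributes are all hypothetical names made up for illustration. The two name patterns are ASCII-only simplifications of the real XML rules, and the sketch does not read an actual DTD or handle the requirement part of a declaration.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximations (an assumption of this sketch). Real XML names
# also allow many non-ASCII letters, combining characters, and extenders.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")   # must not start with a digit
NMTOKEN_RE = re.compile(r"[A-Za-z0-9._:-]+")          # may start with a digit

# Hypothetical declarations, standing in for something like
#   <!ATTLIST person id ID ... kind (man|woman|child) ... nick NMTOKEN ... note CDATA ...>
# A tuple represents an enumerated type.
DECLS = {
    "person": {
        "id": "ID",
        "kind": ("man", "woman", "child"),
        "nick": "NMTOKEN",
        "note": "CDATA",
    }
}


def check_attribute_types(xml_text, decls):
    """Return a list of messages for attribute values that break their declared type."""
    errors = []
    seen_ids = set()
    root = ET.fromstring(xml_text)  # raises ParseError if the text is not well-formed
    for elem in root.iter():        # the root element and all of its descendants
        for attr, value in elem.attrib.items():
            where = f"{elem.tag}/@{attr}"
            attr_type = decls.get(elem.tag, {}).get(attr)
            if attr_type is None:
                errors.append(f"{where}: attribute not declared")
            elif isinstance(attr_type, tuple):
                # Enumerated type: the value must be one from the list.
                if value not in attr_type:
                    errors.append(f"{where}: {value!r} is not one of {attr_type}")
            elif attr_type == "ID":
                # ID values must be legal names and unique within the document.
                if not NAME_RE.fullmatch(value):
                    errors.append(f"{where}: {value!r} is not a legal XML name")
                elif value in seen_ids:
                    errors.append(f"{where}: duplicate ID {value!r}")
                else:
                    seen_ids.add(value)
            elif attr_type == "NMTOKEN":
                if not NMTOKEN_RE.fullmatch(value):
                    errors.append(f"{where}: {value!r} is not a name token")
            # CDATA accepts any character data, so there is nothing to check.
    return errors
```

Calling check_attribute_types(text, DECLS) returns an empty list when every attribute value fits its declared type. Keeping the declarations in a plain dictionary is only a convenience for the sketch; a real validating parser reads them from the DTD itself.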