Princeton COS 333 - XML and friends

XML and friends

• history/background
  – GML (1969)
  – SGML (1986)
  – HTML (1992)
  – World Wide Web Consortium (W3C) (1994)
• XML (1998)
  – core language
  – vocabularies, namespaces: XHTML, RSS, Atom, SVG, MathML, Schema, …
  – validation: Schema, DTD
  – parsers: SAX, DOM
  – processing XML documents: XPath, XSLT, XQuery
  – web services based on XML: SOAP, WSDL, UDDI, …
• alternatives (subset of a huge number)
  – JSON, YAML, HDF5, ASN.1, ...
• sources (subset of a huge number)
  – www.w3.org (official)
  – www.xml.com (O'Reilly)

Markup languages

• "mark up" documents with human-readable tags
  – content is separate from description of content
  – not limited to describing visual appearance
• SGML and XML are meta-languages for markup
  – languages for describing grammar and vocabularies of other languages
  – element: data surrounded by markup that describes it
      <person>George Washington</person>
  – attribute: named value within an element
      <body bgcolor="green">
  – extensible: tags & attributes can be defined as necessary
  – strict rules of syntax
      where tags appear, what names are legal,
      what attributes are associated with elements
  – instances are specialized to particular applications
      HTML: tags for document presentation
      XHTML: HTML with precise syntax rules
• XML is compatible with SGML
  – a simplified, inter-operable form

XML: eXtensible Markup Language

• an extensible way to describe any kind of data
• a notation for describing trees (only)
• each internal node in the tree is an element
• leaf nodes are either attributes or text
• "well formed": the instance is a tree
  – everything balanced, terminated, quoted, etc.
• "valid": satisfies syntactic rules given in a DTD or schema
  – valid tags & attribs, proper order, right number, …
• human-readable text only (Unicode), not binary
  – can process with standard tools
  – independent of proprietary tools and representations
• not a programming language
  – XML doesn't do anything, just describes
  – programs read, process, and write it
• not a database
  – programs convert between XML and databases

XML in use

• two common kinds of use
  – document-centric: ordinary text documents with markup
  – data-centric: representation and exchange of data with applications
• XHTML
  – an example of document-centric view
  – XHTML is HTML with more stringent rules
      everything balanced and terminated and quoted; names are case sensitive

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title> This is a title </title></head>
    <body bgcolor="white">
    <h1> A heading </h1>
    <p> A paragraph of free-form
    <b><i>bold italic</i></b> text. </p>
    <p> Another paragraph. </p>
    </body>
    </html>

XML as seen by browsers

Why XML?

• increasing use of web services
  – too hard to extract semantics from HTML
  – closed and/or binary systems are too hard to work with, too inflexible
• XML is open, non-proprietary
• text-based
  – can see what it does
  – standard tools work on it
  – there are standard parsers, transformers, generators, etc.
• simple, extensible
  – existing vocabularies for important areas
  – can define new vocabularies for specific areas
• most XML use is data-centric
  – standard exchange format for web services
  – configuration info inside systems

XML vocabularies and namespaces

• a vocabulary is an XML description for a specific domain
  – Schema
  – XHTML
  – RSS (really simple syndication)
  – SVG (scalable vector graphics)
  – MathML (mathematics)
  – SMIL (markup for multi-media presentations)
  – ...
• namespaces
  – mechanism for handling name collisions between vocabularies
      <ns:tag> ... </ns:tag>
      <ns2:tag> ... </ns2:tag>

RSS: Really Simple Syndication

XML describes trees

• "well formed": it is a valid tree structure
  – properly nested
  – syntactically correct
  – everything properly quoted
  – nothing about semantics or relationships among elements
• "valid": well formed AND satisfies rules about what is legal
• DTD: document type definition
  – (comparatively) simple pattern specification
  – not very powerful (no data types)
  – not written in XML syntax (needs separate tools)
• Schema
  – (comparatively) complicated specification
  – much stronger language for expressing structure
      sequencing and counting of complex types
  – built-in basic types like integer, double, string
  – can attach validation constraints to basic types
      ranges of integers, patterns of strings, etc.
  – written in XML, can apply all XML tools to it

Example schema (a small part)

    <?xml version="1.0" encoding="UTF-8"?>
    <!--W3C Schema generated by XMLSPY -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="amazon">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="book" maxOccurs="unbounded"/>
            <xs:element ref="customer" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute name="amazon" type="xs:string" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="book">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="title"/>
            <xs:element ref="author" maxOccurs="unbounded"/>
            <xs:element ref="list"/>
            <xs:element ref="sale" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute name="isbn" use="required">
            <xs:simpleType></xs:simpleType>
          </xs:attribute>
        </xs:complexType>
      </xs:element>

XML tools / XMLSpy

XML processing by program

• two basic kinds of parsers
• DOM (Document Object Model)
  – read entire XML document into memory
  – create a tree
  – provide methods for walking/processing the tree
• SAX (Simple API for XML)
  – read through XML document
      nothing stored implicitly
  – call user-defined method for each document element
      callbacks
• other processing tools
  – XSLT (extensible stylesheet language for XML transformations)
  – XPath (query/filter language for XML)
  – XQuery (query language for XML)

DOM: document object model

• standard "language-independent" interface for manipulating structured documents
• allows dynamic access and modification
• methods for traversing tree and accessing nodes
  – does not define any semantics other than
      walking the tree
      accessing elements
      adding or deleting elements
• implementations in Java, C++, VB, etc.
• not as language-independent as might appear
  – have to change a fair amount to change languages

DOM reader in Java

    import java.io.*;
    import org.w3c.dom.*;
    import javax.xml.parsers.*;

    public class domreader {
        public static void main(String[] args) {
            domreader r = new domreader(args[0]);
        }
        public domreader(String f) {
            try {
                DocumentBuilderFactory dbf =
                    DocumentBuilderFactory.newInstance();
                // dbf.setValidating(true);
                DocumentBuilder b = dbf.newDocumentBuilder();
                Document doc =
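
A minimal sketch of the two parser styles described under "XML processing by program", written independently of the domreader listing: it counts the elements of a small document once by walking a DOM tree held in memory and once through SAX callbacks. The class name ParserSketch, its helper methods, and the sample book document are assumptions made for illustration; only the standard javax.xml.parsers, org.w3c.dom, and org.xml.sax APIs are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the original slides.
class ParserSketch {

    // DOM style: read the entire document into memory, then walk the tree.
    static int countElementsDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return countElements(doc.getDocumentElement());
    }

    // Recursive tree walk: count this node if it is an element, then visit its children.
    private static int countElements(Node n) {
        int count = (n.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        NodeList kids = n.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            count += countElements(kids.item(i));
        }
        return count;
    }

    // SAX style: no tree is built; the parser calls startElement once per element.
    static int countElementsSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data, loosely shaped like the "book" element in the schema excerpt.
        String xml = "<book isbn=\"123\"><title>XML</title>"
                   + "<author>A</author><author>B</author></book>";
        System.out.println("DOM elements: " + countElementsDom(xml)); // prints "DOM elements: 4"
        System.out.println("SAX elements: " + countElementsSax(xml)); // prints "SAX elements: 4"
    }
}
```

Both methods report the same count. The difference is where the state lives: the DOM version keeps the whole tree in memory, while the SAX version stores only what the handler chooses to keep (here a single counter).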