DOM
SAX and DOM
Difference between SAX and DOM
Simple DOM program, I
Simple DOM program, II
Simple DOM program, III
Simple DOM program, IV
Reading in the tree
Structure of the DOM tree
Operations on Nodes, I
Distinguishing Node types
Operations on Nodes, II
Operations for Elements
NamedNodeMap
Operations on Texts
Operations on Attrs
Preorder traversal
Preorder traversal in Java
Trying out the program
Additional DOM operations
The End

DOM

SAX and DOM
• SAX and DOM are standards for XML parsers: program APIs to read and interpret XML files
    – DOM is a W3C standard
    – SAX is an ad-hoc (but very popular) standard
• There are various implementations available
• Java implementations are provided in JAXP (Java API for XML Processing)
• JAXP is included as a package in Java 1.4
    – JAXP is available separately for Java 1.3
• Unlike many XML technologies, SAX and DOM are relatively easy

Difference between SAX and DOM
• DOM reads the entire XML document into memory and stores it as a tree data structure
• SAX reads the XML document and sends an event for each element that it encounters
• Consequences:
    – DOM provides “random access” into the XML document
    – SAX provides only sequential access to the XML document
    – DOM is comparatively slow and requires large amounts of memory, so it is impractical for very large XML documents
    – SAX is fast and requires very little memory, so it can be used for huge documents (or large numbers of documents)
        • This makes SAX much more popular for web sites
    – Some DOM implementations have methods for changing the XML document in memory; SAX implementations do not

Simple DOM program, I
• This program is adapted from CodeNotes® for XML by Gregory Brill, page 128

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class SecondDom {
        public static void main(String[] args) {
            try {
                ...Main part of program goes here...
            } catch (Exception e) {
                e.printStackTrace(System.out);
            }
        }
    }

Simple DOM program, II
• First we need to create a DOM parser, called a “DocumentBuilder”
• The parser is created, not by a constructor, but by calling a static factory method, which returns a factory object that in turn creates the parser
    – This is a common technique in advanced Java programming
    – The use of a factory method makes it easier if you later switch to a different parser

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();

Simple DOM program, III
• The next step is to load in the XML file
• Here is the XML file, named hello.xml:

    <?xml version="1.0"?>
    <display>Hello World!</display>

• To read this file in, we add the following line to our program:

    Document document = builder.parse("hello.xml");

• Notes:
    – document contains the entire XML file (as a tree); it is the Document Object Model
    – If you run this from the command line, your XML file should be in the same directory as your program
    – An IDE may look in a different directory for your file; if you get a java.io.FileNotFoundException, this is probably why

Simple DOM program, IV
• The following code finds the content of the root element and prints it:

    Element root = document.getDocumentElement();
    Node textNode = root.getFirstChild();
    System.out.println(textNode.getNodeValue());

• This code should be mostly self-explanatory; we’ll get into the details shortly
• The output of the program is: Hello World!

Reading in the tree
• The parse method reads in the entire XML document and represents it as a tree in memory
    – For a large document, parsing could take a while
    – If you want to interact with your program while it is parsing, you need to parse in a separate thread
        • Once parsing starts, you cannot interrupt or stop it
        • Do not try to access the parse tree until parsing is done
• An XML parse tree may require up to ten times as much memory as the original XML document
    – If you have a lot of tree manipulation to do, DOM is much more convenient than SAX
    – If you don’t have a lot of
      tree manipulation to do, consider using SAX instead

Structure of the DOM tree
• The DOM tree is composed of Node objects
• Node is an interface
    – Some of the more important subinterfaces are Element, Attr, and Text
        • An Element node may have children
        • Attr and Text nodes are leaves
    – Additional types are Document, ProcessingInstruction, Comment, Entity, CDATASection and several others
• Hence, the DOM tree is composed entirely of Node objects, but the Node objects can be downcast into more specific types as needed

Operations on Nodes, I
• The results returned by getNodeName(), getNodeValue(), getNodeType() and getAttributes() depend on the subtype of the node, as follows:

                       Element         Text             Attr
    getNodeName()      tag name        "#text"          name of attribute
    getNodeValue()     null            text contents    value of attribute
    getNodeType()      ELEMENT_NODE    TEXT_NODE        ATTRIBUTE_NODE
    getAttributes()    NamedNodeMap    null             null

Distinguishing Node types
• Here’s an easy way to tell what kind of a node you are dealing with:

    switch (node.getNodeType()) {
        case Node.ELEMENT_NODE:
            Element element = (Element) node;
            ...
            break;
        case Node.TEXT_NODE:
            Text text = (Text) node;
            ...
            break;
        case Node.ATTRIBUTE_NODE:
            Attr attr = (Attr) node;
            ...
            break;
        default:
            ...
    }

Operations on Nodes, II
• Tree-walking operations that return a Node:
    – getParentNode()
    – getFirstChild()
    – getNextSibling()
    – getPreviousSibling()
    – getLastChild()
• Tests that return a boolean:
    – hasAttributes()
    – hasChildNodes()

Operations for Elements
• String getTagName()
    – Returns the name of this Element’s tag
• boolean hasAttribute(String name)
    – Returns true if this Element has the named attribute
• String getAttribute(String name)
    – Returns the (String) value of the named attribute
• boolean hasAttributes()
    – Returns true if this Element has any attributes
    – This method is actually inherited from Node
        • Returns false if it is applied to a Node that isn’t an Element
• NamedNodeMap getAttributes()
    – Returns a NamedNodeMap of all the Element’s attributes
    – This method is actually inherited from Node
        • Returns null if it is applied to a Node that isn’t an Element

NamedNodeMap
• The node.getAttributes() operation returns a NamedNodeMap
    – Because NamedNodeMaps are used for other kinds of nodes (elsewhere in the DOM API), the contents are treated as general Nodes, not specifically as Attrs
• Some operations on a NamedNodeMap are:
    – getNamedItem(String name) returns (as a Node) the attribute with the given name
    – getLength() returns (as an int) the number of Nodes in this NamedNodeMap
    – item(int index) returns (as a Node) the indexth item
• This
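
The operations covered so far (creating a DocumentBuilder, parsing, checking node types, and reading attributes through a NamedNodeMap) can be combined into one small program. The following is only a sketch under stated assumptions, not the program from the slides: the class name DomSketch and its helper methods are hypothetical, and the XML is parsed from a String instead of hello.xml so that the example needs no file on disk. Attribute names are sorted before printing because this sketch does not rely on any particular ordering inside a NamedNodeMap.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical class; the name and helpers are illustrative assumptions.
class DomSketch {

    // Builds a DOM tree from XML text (assumption: a String, not a file).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Distinguishes node types with a switch on getNodeType().
    static String kind(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:   return "element";
            case Node.TEXT_NODE:      return "text";
            case Node.ATTRIBUTE_NODE: return "attribute";
            default:                  return "other";
        }
    }

    // Returns the value of the root element's first child, or null if none.
    static String rootText(String xml) {
        Node first = parse(xml).getDocumentElement().getFirstChild();
        return first == null ? null : first.getNodeValue();
    }

    // Walks the root's NamedNodeMap with getLength() and item(int),
    // collecting name/value pairs sorted by attribute name.
    static String rootAttributes(String xml) {
        NamedNodeMap attrs = parse(xml).getDocumentElement().getAttributes();
        Map<String, String> sorted = new TreeMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            sorted.put(attr.getNodeName(), attr.getNodeValue());
        }
        return sorted.toString();
    }

    public static void main(String[] args) {
        String xml = "<display color=\"red\" size=\"12\">Hello World!</display>";
        System.out.println(rootText(xml));                         // Hello World!
        System.out.println(rootAttributes(xml));                   // {color=red, size=12}
        System.out.println(kind(parse(xml).getDocumentElement())); // element
    }
}
```

Each item in the NamedNodeMap is handled as a plain Node, so getNodeName() and getNodeValue() give the attribute's name and value without a downcast to Attr.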