SAX & DOM
CPS 116 Introduction to Database Systems

Announcements (October 27)
  - Homework #3 due next Tuesday
  - Project milestone #2 due Nov. 10

SAX & DOM
  - Both are APIs for XML processing
  - SAX (Simple API for XML)
      - Started out as a Java API, but now exists for other languages too
  - DOM (Document Object Model)
      - Language-neutral API with implementations in Java, C++, etc.
  - JAXP (Java API for XML Processing)
      - Bundled with the standard JDK
      - Includes SAX and DOM parsers and XSLT transformers

SAX processing model
  - Serial access
      - XML document is processed as a stream
      - Only one look at the data
      - Cannot go back to an earlier portion of the document
  - Event-driven
      - A parser generates events as it goes through the document (e.g., start of the document, end of an element, etc.)
      - Application defines event handlers that get invoked when events are generated

SAX events
  Most frequently used events:
  - startDocument
  - endDocument
  - startElement
  - endElement
  - characters
      - Generated whenever the parser has processed a chunk of character data (without generating other kinds of events)
      - Warning: The parser may generate multiple characters events for one piece of text

  Example (events generated are shown on the right):

      <?xml version="1.0"?>                      startDocument
      <bibliography>                             startElement
      <book ISBN="ISBN-10" price="80.00">        startElement
      <title>Foundations of Databases</title>    startElement, characters, endElement
      …
      </book>                                    endElement
      …
      </bibliography>                            endElement
                                                 endDocument

  Whitespace may come up as characters or ignorableWhitespace, depending on whether a DTD is present

A simple SAX example
  Print out text contents of title elements

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import javax.xml.parsers.*;

    public class SaxExample extends DefaultHandler {
        public static void main(String[] argv) throws Exception {
            String fileName = argv[0];
            // Create a SAX parser:
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            // Parse the document with this event handler:
            DefaultHandler handler = new SaxExample();
            saxParser.parse(new File(fileName),
                            handler);
            return;
        }
        … …

A simple SAX example (cont'd)

    private StringBuffer titleStringBuffer = null;

    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        // Start buffering text when a title element opens.
        if (qName.equals("title"))
            titleStringBuffer = new StringBuffer();
    }

    public void endElement(String uri, String localName,
                           String qName) {
        // Print the buffered text when the title element closes.
        if (qName.equals("title")) {
            System.out.println(titleStringBuffer.toString());
            titleStringBuffer = null;
        }
    }

    public void characters(char[] ch, int start, int length) {
        // Append every chunk; one piece of text may arrive in several calls.
        if (titleStringBuffer != null)
            titleStringBuffer.append(ch, start, length);
    }

  Notes:
  - uri and localName are only relevant when namespaces are involved
  - Assuming no namespace processing, qName is the tag name
  - Warning: This code does not handle data with the //title[//title] pattern (i.e., a title nested inside another title)

A common mistake
  What is wrong with the following?

    private String titleString = null;

    public void endElement(String uri, String localName,
                           String qName) {
        // Print the last chunk of characters seen before </title>:
        if (qName.equals("title"))
            System.out.println(titleString);
    }

    public void characters(char[] ch, int start, int length) {
        titleString = new String(ch, start, length);
    }

A more complex SAX example
  - Print out the text contents of top-level section titles in books, i.e., //book/section/title
      - Old code would print out all titles, e.g., //book/title, //book//section/title
      - For simplicity, assume that if we have the pattern //book/section/title//book/section/title, we print the higher-level title element
  - Idea: maintain as state the path from the root

    // Requires: import java.util.ArrayList;
    private ArrayList<String> path = new ArrayList<String>();
    private int pathLengthWhenOutputIsActivated;

A more complex SAX example (cont'd)

    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        path.add(qName); // Maintain the path.
        if (path.size() >= 3 &&
            path.get(path.size()-1).equals("title") &&
            path.get(path.size()-2).equals("section") &&
            path.get(path.size()-3).equals("book")) {
            // path matches //book/section/title:
            if (titleStringBuffer == null)
            {
                pathLengthWhenOutputIsActivated = path.size();
                titleStringBuffer = new StringBuffer();
            }
        }
    }

A more complex SAX example (cont'd)

    public void endElement(String uri, String localName,
                           String qName) {
        if (titleStringBuffer != null &&
            path.size() == pathLengthWhenOutputIsActivated) {
            // Closing the element that activated output buffering:
            System.out.println(titleStringBuffer.toString());
            titleStringBuffer = null;
        }
        path.remove(path.size()-1); // Maintain the path.
    }

    public void characters(char[] ch, int start, int length) {
        if (titleStringBuffer != null)
            titleStringBuffer.append(ch, start, length);
    }

  Notes on the check path.size() == pathLengthWhenOutputIsActivated:
  - This check prevents premature output in case the title has subelements
  - Would it work if we change this check to qName.equals("title")?

DOM processing model
  - XML is parsed by a parser and converted into an in-memory DOM tree
  - DOM API allows an application to
      - Construct a DOM tree from an XML document
      - Traverse and read a DOM tree
      - Construct a new, empty DOM tree from scratch
      - Modify an existing DOM tree
      - Copy subtrees from one DOM tree to another
      - etc.

DOM Nodes
  - A DOM tree is made up of Nodes
  - Most frequently used types of Nodes:
      - Document: root of the DOM tree
          - Not the same as the root element of the XML document
      - DocumentType: corresponds to the DOCTYPE declaration in an XML document
      - Element: corresponds to an XML element
      - Attr: corresponds to an attribute of an XML element
      - Text: corresponds to a chunk of text

DOM example

    <?xml version="1.0"?>
    <!DOCTYPE …>
    <bibliography>
      <book ISBN="ISBN-10" price="80.00">
        <title>Foundations of Databases</title>
        <author>Abiteboul</author>
        <author>Hull</author>
        <author>Vianu</author>
        …
      </book>
      <book ISBN="ISBN-20" price="40.00">…</book>
      …
    </bibliography>

  Corresponding DOM tree:

    Document
      DocumentType
      Element (bibliography)
        Text
        Element (book)  Attr (ISBN)  Attr (price)
          Text
          Element (title)
            Text
          Text
          Element (author)
            Text
          Text
          Element (author)
            Text
          Text
          Element (author)
            Text
          Text
        Element (book)  Attr (ISBN)  Attr (price)

  Whitespace between tags is also parsed as Text

Node interface
  - n.getNodeType() returns the type of Node n
  - n.getChildNodes() returns a NodeList containing Node n's children
      - For example, subelements are children of an
        Element; DocumentType is a child of the Document
  - d.getDocumentElement() returns the root Element of Document d
  - e.getNodeName() returns the tag name of Element e
  - e.getAttributes() returns a NamedNodeMap (hash table) containing the attributes of Element e
      - Attributes are not considered children!
  - a.getNodeName()
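
The Node interface calls listed above can be combined into a small recursive tree walk. The following is a minimal sketch, not code from the lecture: the class name DomExample and the helper methods describe and walk are hypothetical names chosen for illustration. It parses an XML string with the JAXP DocumentBuilder, starts at getDocumentElement(), and uses getNodeType(), getNodeName(), getAttributes(), and getChildNodes() to list each Element (with its attributes) and each Text node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration; not part of the lecture code.
class DomExample {
    // Parse an XML string into a DOM tree and describe it, one line per node.
    static List<String> describe(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document d = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<String>();
            walk(d.getDocumentElement(), 0, out); // start at the root Element
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Depth-first traversal: record Node n, then recurse into its children.
    static void walk(Node n, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) line.append("  ");
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            line.append("Element ").append(n.getNodeName());
            // Attributes are not children; they come from getAttributes().
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                line.append(" @").append(a.getNodeName())
                    .append("=").append(a.getNodeValue());
            }
            out.add(line.toString());
        } else if (n.getNodeType() == Node.TEXT_NODE) {
            // Whitespace-only Text nodes show up here as empty strings.
            line.append("Text \"").append(n.getNodeValue().trim()).append("\"");
            out.add(line.toString());
        }
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            walk(children.item(i), depth + 1, out);
    }

    public static void main(String[] argv) {
        String xml = "<bibliography>\n"
                   + "  <book ISBN=\"ISBN-10\">\n"
                   + "    <title>Foundations of Databases</title>\n"
                   + "  </book>\n"
                   + "</bibliography>";
        for (String line : describe(xml)) System.out.println(line);
    }
}
```

Running main on the indented sample should also list Text nodes for the whitespace between tags, consistent with the note in the DOM example. Unlike the SAX handlers earlier, this approach keeps the whole document in memory, so the walk can revisit any part of the tree.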