Duke CPS 116 - SAX & DOM - D2809533

School name Duke University

Course Cps 116- Introduction to Database Systems

Pages 8

This preview shows page 1-2-3 out of 8 pages.

SAX & DOM
CPS 116: Introduction to Database Systems


Announcements (October 27)

- Homework #3 due next Tuesday
- Project milestone #2 due Nov. 10


SAX & DOM

- Both are APIs for XML processing
- SAX (Simple API for XML)
  - Started out as a Java API, but now exists for other languages too
- DOM (Document Object Model)
  - Language-neutral API with implementations in Java, C++, etc.
- JAXP (Java API for XML Processing)
  - Bundled with standard JDK
  - Includes SAX, DOM parsers and XSLT transformers


SAX processing model

- Serial access
  - XML document is processed as a stream
  - Only one look at the data
  - Cannot go back to an earlier portion of the document
- Event-driven
  - A parser generates events as it goes through the document (e.g., start of the document, end of an element, etc.)
  - Application defines event handlers that get invoked when events are generated


SAX events

Most frequently used events:

- startDocument
- endDocument
- startElement
- endElement
- characters
  - Generated whenever the parser has processed a chunk of character data (without generating other kinds of events)
  - Warning: The parser may generate multiple characters events for one piece of text

Example document:

    <?xml version="1.0"?>
    <bibliography>
      <book ISBN="ISBN-10" price="80.00">
        <title>Foundations of Databases</title>
        …
      </book>
      …
    </bibliography>

Events generated, in order (elided parts omitted):

    startDocument
    startElement    (bibliography)
    startElement    (book)
    startElement    (title)
    characters      (Foundations of Databases)
    endElement      (title)
    endElement      (book)
    endElement      (bibliography)
    endDocument

Whitespace may come up as characters or ignorableWhitespace, depending on whether a DTD is present.


A simple SAX example

Print out the text contents of title elements.

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import javax.xml.parsers.*;

    public class SaxExample extends DefaultHandler {

        public static void main(String[] argv) throws Exception {
            String fileName = argv[0];
            // Create a SAX parser:
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            // Parse the document with this event handler:
            DefaultHandler handler = new SaxExample();
            saxParser.parse(new File(fileName), handler);
        }
        … …


A simple SAX example (cont'd)

        // Non-null only while we are inside a title element.
        private StringBuffer titleStringBuffer = null;

        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if (qName.equals("title"))
                titleStringBuffer = new StringBuffer();
        }

        public void endElement(String uri, String localName,
                               String qName) {
            if (qName.equals("title")) {
                System.out.println(titleStringBuffer.toString());
                titleStringBuffer = null;
            }
        }

        public void characters(char[] ch, int start, int length) {
            if (titleStringBuffer != null)
                titleStringBuffer.append(ch, start, length);
        }

Notes:

- uri and localName are only relevant when namespaces are involved
- Assuming no namespace processing, qName is the tag name
- Warning: This code does not handle data with the //title[//title] pattern (a title nested inside another title)


A common mistake

What is wrong with the following?

        private String titleString = null;

        public void endElement(String uri, String localName,
                               String qName) {
            // Print the last chunk of characters seen before </title>:
            if (qName.equals("title"))
                System.out.println(titleString);
        }

        public void characters(char[] ch, int start, int length) {
            titleString = new String(ch, start, length);
        }


A more complex SAX example

- Print out the text contents of top-level section titles in books, i.e., //book/section/title
  - Old code would print out all titles, e.g., //book/title, //book//section/title
- For simplicity, assume that if we have the pattern //book/section/title//book/section/title, we print the higher-level title element
- Idea: maintain as state the path from the root

        // Requires: import java.util.ArrayList;
        private ArrayList<String> path = new ArrayList<String>();
        private int pathLengthWhenOutputIsActivated;


A more complex SAX example (cont'd)

        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            path.add(qName); // Maintain the path.
            if (path.size() >= 3 &&
                path.get(path.size()-1).equals("title") &&
                path.get(path.size()-2).equals("section") &&
                path.get(path.size()-3).equals("book")) {
                // path matches //book/section/title:
                if (titleStringBuffer == null) {
                    pathLengthWhenOutputIsActivated = path.size();
                    titleStringBuffer = new StringBuffer();
                }
            }
        }


A more complex SAX example (cont'd)

        public void endElement(String uri, String localName,
                               String qName) {
            if (titleStringBuffer != null &&
                path.size() == pathLengthWhenOutputIsActivated) {
                // Closing the element that activated output buffering:
                System.out.println(titleStringBuffer.toString());
                titleStringBuffer = null;
            }
            path.remove(path.size()-1); // Maintain the path.
        }

        public void characters(char[] ch, int start, int length) {
            if (titleStringBuffer != null)
                titleStringBuffer.append(ch, start, length);
        }

Notes:

- The check path.size() == pathLengthWhenOutputIsActivated prevents premature output in case the title has subelements
- Would it work if we change this check to qName.equals("title")?


DOM processing model

- XML is parsed by a parser and converted into an in-memory DOM tree
- DOM API allows an application to
  - Construct a DOM tree from an XML document
  - Traverse and read a DOM tree
  - Construct a new, empty DOM tree from scratch
  - Modify an existing DOM tree
  - Copy subtrees from one DOM tree to another
  - etc.


DOM Nodes

- A DOM tree is made up of Nodes
- Most frequently used types of Nodes:
  - Document: root of the DOM tree
    - Not the same as the root element of the XML document
  - DocumentType: corresponds to the DOCTYPE declaration in an XML document
  - Element: corresponds to an XML element
  - Attr: corresponds to an attribute of an XML element
  - Text: corresponds to a chunk of text


DOM example

    <?xml version="1.0"?>
    <!DOCTYPE …>
    <bibliography>
      <book ISBN="ISBN-10" price="80.00">
        <title>Foundations of Databases</title>
        <author>Abiteboul</author>
        <author>Hull</author>
        <author>Vianu</author>
        …
      </book>
      <book ISBN="ISBN-20" price="40.00">…</book>
      …
    </bibliography>

Corresponding DOM tree:

    Document
      DocumentType
      Element (bibliography)
        Text (whitespace)
        Element (book)    Attr (ISBN), Attr (price)
          Text (whitespace)
          Element (title)
            Text
          Text (whitespace)
          Element (author)
            Text
          Text (whitespace)
          Element (author)
            Text
          Text (whitespace)
          Element (author)
            Text
          Text (whitespace)
          …
        Element (book)    Attr (ISBN), Attr (price)
          …

Whitespace between tags is also parsed as Text.


Node interface

- n.getNodeType() returns the type of Node n
- n.getChildNodes() returns a NodeList containing Node n's children
  - For example, subelements are children of an Element; DocumentType is a child of the Document
- d.getDocumentElement() returns the root Element of Document d
- e.getNodeName() returns the tag name of Element e
- e.getAttributes() returns a NamedNodeMap (hash table) containing the attributes of Element e
  - Attributes are not considered children!
- a.getNodeName()
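
As a rough illustration of the Node interface calls listed above, the following sketch parses a small bibliography document with the JAXP DOM parser and walks the tree recursively. The class name DomExample, the helper methods parse and walk, the output format, and the inline SAMPLE document are all assumptions made for this example; they are not from the original slides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomExample {
    // Hypothetical sample input, modeled on the bibliography document in the slides.
    static final String SAMPLE =
        "<bibliography>"
        + "<book ISBN=\"ISBN-10\" price=\"80.00\">"
        + "<title>Foundations of Databases</title>"
        + "<author>Abiteboul</author>"
        + "</book>"
        + "</bibliography>";

    // Parse an XML string into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Visit n and all of its descendants. For each Element, record its
    // attributes; for each title Element, record its text content.
    static void walk(Node n, List<String> out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            // Attributes are reached through getAttributes(), not getChildNodes().
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.add(n.getNodeName() + "@" + a.getNodeName()
                        + "=" + a.getNodeValue());
            }
            if (n.getNodeName().equals("title")) {
                // getTextContent() concatenates the text of all descendants.
                out.add("title: " + n.getTextContent());
            }
        }
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), out);
        }
    }

    public static void main(String[] args) throws Exception {
        Document d = parse(SAMPLE);
        Element root = d.getDocumentElement();
        System.out.println("root element: " + root.getNodeName());
        List<String> out = new ArrayList<>();
        walk(d, out); // start from the Document node, not the root element
        for (String line : out) {
            System.out.println(line);
        }
    }
}
```

Unlike the SAX examples, nothing here needs to buffer characters events or track the path by hand, because the whole tree is already in memory by the time walk runs. The cost is that the entire document must fit in memory.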