SAX
Jan 14, 2019

SAX and DOM

- SAX and DOM are standards for XML parsers: program APIs to read and interpret XML files.
  - DOM is a W3C standard.
  - SAX is an ad hoc, but very popular, standard.
  - SAX was developed by David Megginson and is open source.
- There are various implementations available.
- Java implementations are provided as part of JAXP (Java API for XML Processing).
- JAXP is included as a package in Java 1.4.
  - JAXP is available separately for Java 1.3.
- Unlike many XML technologies, SAX and DOM are relatively easy.

Difference between SAX and DOM

- DOM reads the entire XML document into memory and stores it as a tree data structure.
- SAX reads the XML document and calls one of your methods for each element or block of text that it encounters.
- Consequences:
  - DOM provides random access into the XML document.
  - SAX provides only sequential access to the XML document.
  - DOM is comparatively slow and needs memory for the whole document tree, so it is a poor fit for very large XML documents.
  - SAX is fast and requires very little memory, so it can be used for huge documents (or large numbers of documents).
    - This makes SAX much more popular for web sites.
  - Some DOM implementations have methods for changing the XML document in memory; SAX implementations do not.

Callbacks

- SAX works through callbacks: you call the parser, it calls methods that you supply.

[Diagram: in your program, main(...) calls the SAX parser's parse(...); the parser calls back into your program's startDocument(...), startElement(...), characters(...), endElement(...), and endDocument(...).]

Simple SAX program

- The following program is adapted from CodeNotes for XML by Gregory Brill, pages 158-159.
- The program consists of two classes:
  - Sample: This class contains the main method; it
    - Gets a factory to make parsers
    - Gets a parser from the factory
    - Creates a Handler object to handle callbacks from the parser
    - Tells the parser which handler to send its callbacks to
    - Reads and parses the input XML file
  - Handler: This class contains handlers for three kinds of callbacks:
    - startElement callbacks, generated when a start tag is seen
    - endElement callbacks, generated when an end tag is seen
    - characters callbacks,
      generated for the contents of an element

The Sample class (I)

    import javax.xml.parsers.*; // for both SAX and DOM
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    // For simplicity, we let the operating system handle exceptions.
    // In "real life" this is poor programming practice.
    public class Sample {
        public static void main(String args[]) throws Exception {
            // Create a parser factory
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Tell factory that the parser must understand namespaces
            factory.setNamespaceAware(true);
            // Make the parser
            SAXParser saxParser = factory.newSAXParser();
            XMLReader parser = saxParser.getXMLReader();

The Sample class (II)

            // In the previous slide we made a parser, of type XMLReader
            // Create a handler
            Handler handler = new Handler();
            // Tell the parser to use this handler
            parser.setContentHandler(handler);
            // Finally, read and parse the document
            parser.parse("hello.xml");
        }
    } // end of Sample class

- You will need to put the file hello.xml:
  - In the same directory, if you run the program from the command line
  - Or where it can be found by the particular IDE you are using

The Handler class (I)

    public class Handler extends DefaultHandler {

- DefaultHandler is an adapter class that defines these methods and others as do-nothing methods, to be overridden as desired.
- We will define three very similar methods to handle (1) start tags, (2) contents, and (3) end tags; our methods will just print a line.
- Each of these three methods could throw a SAXException.

        // SAX calls this method when it encounters a start tag
        public void startElement(String namespaceURI, String localName,
                                 String qualifiedName, Attributes attributes)
                throws SAXException {
            System.out.println("startElement: " + qualifiedName);
        }

The Handler class (II)

        // SAX calls this method to pass in character data
        public void characters(char ch[], int start, int length)
                throws SAXException {
            System.out.println("characters: " + new String(ch, start, length));
        }

        // SAX calls this method when it encounters an end tag
        public void endElement(String namespaceURI, String localName,
                               String qualifiedName)
                throws SAXException {
            System.out.println(
                "Element: " + qualifiedName);
        }
    } // End of Handler class

Results

- If the file hello.xml contains:
      <?xml version="1.0"?>
      <display>Hello World</display>
- Then the output from running java Sample will be:
      startElement: display
      characters: Hello World
      Element: display

More results

- Now suppose the file hello.xml contains:
      <?xml version="1.0"?>
      <display>
          <i>Hello</i> World
      </display>
- Notice that the root element, <display>, now contains a nested element <i> and some whitespace (including newlines).
- The result will be as follows (the arrows are annotations, not output):
      startElement: display
      characters:            <-- empty string
      characters:            <-- newline
      characters:            <-- spaces
      startElement: i
      characters: Hello
      Element: i
      characters:  World
      characters:            <-- another newline
      Element: display
- How character data is divided among characters callbacks can vary from parser to parser.

Parser factories

- A factory is an alternative to constructors.
- To create a SAX parser factory, call this method:
      SAXParserFactory.newInstance()
  - This returns an object of type SAXParserFactory.
  - It may throw a FactoryConfigurationError.
- You can then say what kind of parser you want:
  - public void setNamespaceAware(boolean awareness)
    - Call this with true if you are using namespaces.
    - The default (if you don't call this method) is false.
  - public void setValidating(boolean validating)
    - Call this with true if you want to validate against a DTD.
    - The default (if you don't call this method) is false.
    - Validation will give an error if you don't have a DTD.

Getting a parser

- Once you have a SAXParserFactory set up (say it's named factory), you can create a parser with:
      SAXParser saxParser = factory.newSAXParser();
      XMLReader parser = saxParser.getXMLReader();
- Note: older texts may use Parser in place of XMLReader.
  - Parser is SAX1, not SAX2, and is now deprecated.
  - SAX2 supports namespaces and some new parser properties.
- Note: SAXParser is not thread-safe; to use it in multiple threads, create a separate SAXParser for each thread.
  - This is unlikely to be a problem in class projects.

Declaring which handler to use

- Since the SAX parser will be calling our methods, we need to supply these methods.
- In the example these are in a separate class, Handler.
- We need to tell the parser
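  where to find the methods:
      Handler handler = new Handler();
      parser.setContentHandler(handler);
- These statements could be combined:
      parser.setContentHandler(new Handler());
- Finally, we call the parser and tell it what file to parse:
      parser.parse("hello.xml");
- Everything else will be done in the handler methods.

SAX handlers

- SAX callbacks are divided among these four interfaces (DefaultHandler implements all four):
  - interface ContentHandler
    - Handles the basic parsing callbacks, such as startElement, characters, and endElement.
  - interface DTDHandler
    - Handles only notation and unparsed entity declarations.
  - interface EntityResolver
    - Does customized handling for external entities.
  - interface ErrorHandler
    - Receives the warnings and errors reported by the parser.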
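
The pieces above (factory, parser, handler registration, callbacks) can be put together in one small, self-contained sketch. This is only an illustration, not the program from the slides: the class names SaxHandlersSketch and EventRecorder are hypothetical, the XML is read from an in-memory string instead of hello.xml, and events are collected in a list instead of being printed. Because DefaultHandler implements ErrorHandler as well as ContentHandler, the same object is registered for both roles. The sketch also buffers character data and trims whitespace, which is a simplification chosen here so that the event list stays short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; its name and the in-memory input are assumptions.
class SaxHandlersSketch {

    // Records each callback instead of printing it, so the event order can be inspected.
    static class EventRecorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String namespaceURI, String localName,
                                 String qualifiedName, Attributes attributes) {
            flushText();
            events.add("startElement: " + qualifiedName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one run of text in several chunks, so buffer it.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String namespaceURI, String localName,
                               String qualifiedName) {
            flushText();
            events.add("endElement: " + qualifiedName);
        }

        // ErrorHandler callback: rethrow so recoverable errors are not silently ignored.
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }

        // Emits the buffered text as one event, skipping whitespace-only text.
        private void flushText() {
            String s = text.toString().trim();
            if (!s.isEmpty()) {
                events.add("characters: " + s);
            }
            text.setLength(0);
        }
    }

    static List<String> parse(String xml) throws Exception {
        // Default factory settings (not namespace-aware) are used in this sketch.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        XMLReader parser = saxParser.getXMLReader();

        EventRecorder handler = new EventRecorder();
        parser.setContentHandler(handler); // ContentHandler role
        parser.setErrorHandler(handler);   // ErrorHandler role, same object
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<display><i>Hello</i> World</display>")) {
            System.out.println(event);
        }
    }
}
```

Running main should print startElement: display, startElement: i, characters: Hello, endElement: i, characters: World, and endElement: display, one per line. Without the buffering and trimming, the whitespace-only callbacks described in the results above would appear as separate events.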