Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax Free Open Book

Receiving Elements

In a very real sense, SAX reports tags, not elements. When the parser encounters a start-tag, it calls the startElement() method. When the parser encounters an end-tag, it calls the endElement() method. When the parser encounters an empty-element tag, it calls the startElement() method and then the endElement() method.
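To make the tag-by-tag reporting concrete, here is a minimal sketch that records each callback in the order it arrives. It assumes the JAXP SAXParserFactory bundled with the JDK rather than any particular parser class, and the TagEvents class and its record() method are names I made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration; not part of SAX
class TagEvents extends DefaultHandler {

  private final List<String> events = new ArrayList<>();

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    events.add("start " + localName);
  }

  @Override
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) {
    events.add("end " + localName);
  }

  // Parses the XML text and returns the callbacks in the order received
  static List<String> record(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);  // so that localName is reported
      XMLReader parser = factory.newSAXParser().getXMLReader();
      TagEvents handler = new TagEvents();
      parser.setContentHandler(handler);
      parser.parse(new InputSource(new StringReader(xml)));
      return handler.events;
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The empty-element tag <b/> and the pair <c></c> are reported identically
    System.out.println(record("<a><b/><c></c></a>"));
  }

}
```

Running main() prints [start a, start b, end b, start c, end c, end a]. Nothing in the callbacks distinguishes the empty-element tag from a start-tag immediately followed by an end-tag.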

Caution

Parsers and ContentHandlers are not thread safe or reentrant. Whereas it's straightforward to design a SAX program that operates on multiple documents in series, it is almost impossible to design one that operates on multiple documents in parallel. If you need to perform XML parsing in multiple, simultaneous threads, give each thread its own XMLReader and ContentHandler objects. Similarly, if you want to parse another document from inside one of the ContentHandler methods, create a new XMLReader and a new ContentHandler object to parse it with. Do not try to reuse the existing XMLReader and ContentHandler before they've finished with the current document.
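One way to follow this advice is to create the XMLReader and the ContentHandler inside the task that each thread runs, so that nothing is shared. The sketch below is only an illustration under that assumption. It uses the JDK's SAXParserFactory and java.util.concurrent, and the ParallelCounter and ElementCounter names, along with the pool size of four, are arbitrary choices of mine.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration
class ParallelCounter {

  // A handler with per-document state, so instances must not be shared
  static class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startElement(String namespaceURI, String localName,
     String qualifiedName, Attributes atts) {
      count++;
    }
  }

  // Creates a fresh XMLReader and ContentHandler for every document
  static int countElements(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();
      ElementCounter handler = new ElementCounter();
      parser.setContentHandler(handler);
      parser.parse(new InputSource(new StringReader(xml)));
      return handler.count;
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // Parses the documents in parallel; results come back in input order
  static List<Integer> countAll(List<String> documents) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<Integer>> pending = new ArrayList<>();
      for (String document : documents) {
        Callable<Integer> task = () -> countElements(document);
        pending.add(pool.submit(task));
      }
      List<Integer> counts = new ArrayList<>();
      for (Future<Integer> result : pending) {
        counts.add(result.get());
      }
      return counts;
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
    finally {
      pool.shutdown();
    }
  }

}
```

Because countElements() builds its own parser and handler on every call, no two threads ever touch the same XMLReader or ContentHandler.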


If an end-tag does not match its corresponding start-tag, then the parser throws a SAXParseException. Beyond that, however, you are responsible for tracking the hierarchy. For example, if you want to treat a params element inside a methodCall element differently from a params element inside a fault element, then you'll need to store some form of state between calls to the startElement() and endElement() methods. This is actually quite common. Many SAX content handlers simply build up a data structure as the document is parsed, and then operate on that data structure once the document has been read completely. Provided that the data structure is simpler than the XML document itself, this is a reasonable approach. However, in the most general case you can find yourself inventing a complete object hierarchy to represent arbitrary XML documents. In this case, you're better off using DOM or JDOM instead of SAX, because they'll do the hard work of defining and building this object hierarchy for you.
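As a rough sketch of the kind of state I mean, the handler below keeps the names of the currently open elements on a stack and uses the top of that stack to tell the two kinds of params elements apart. It checks only the immediate parent, which is one simplifying assumption. The ParamsContext class, its fields, and the scan() helper are all hypothetical names, and the parser comes from the JDK's SAXParserFactory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration
class ParamsContext extends DefaultHandler {

  // Local names of the elements that are currently open, innermost first
  private final Deque<String> open = new ArrayDeque<>();
  int paramsInMethodCall;
  int paramsInFault;

  @Override
  public void startDocument() {
    // Reset the state in case the handler is reused for another document
    open.clear();
    paramsInMethodCall = 0;
    paramsInFault = 0;
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    if (localName.equals("params")) {
      String parent = open.peek();  // null if params is the root element
      if ("methodCall".equals(parent)) paramsInMethodCall++;
      else if ("fault".equals(parent)) paramsInFault++;
    }
    open.push(localName);
  }

  @Override
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) {
    open.pop();
  }

  // Parses the XML text and returns the handler holding the counts
  static ParamsContext scan(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);  // so that localName is reported
      XMLReader parser = factory.newSAXParser().getXMLReader();
      ParamsContext handler = new ParamsContext();
      parser.setContentHandler(handler);
      parser.parse(new InputSource(new StringReader(xml)));
      return handler;
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

}
```

For a made-up document with two methodCall elements and one fault element, each containing a params child, scan() leaves paramsInMethodCall at 2 and paramsInFault at 1.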

The arguments to the startElement() and endElement() methods are similar:

public void startElement (String namespaceURI, String localName,
 String qualifiedName, Attributes atts) throws SAXException

public void endElement (String namespaceURI, String localName,
 String qualifiedName) throws SAXException

The sequence is as follows:

  1. The namespace URI is passed as a String. If the element is unqualified (that is, if it is not in a namespace), then this argument is the empty string, not null.

  2. The local name is passed as a String. This is the part of the name after the prefix and the colon, if any. For example, whether an element is named SOAP-ENV:Body or Body with no prefix, its local name is Body.

  3. The third argument contains the qualified name as a String. This is the entire element name including the prefix and the colon, if any. For example, if an element is named SOAP-ENV:Body, then its qualified name is SOAP-ENV:Body. However, if an element is named Body with no prefix, then its qualified name is just Body.

  4. Finally, in the startElement() method only, the set of attributes for that element is passed as a SAX-specific Attributes object. I'll discuss this in the next section.
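The first three arguments can be seen side by side in the following sketch, which records them for a prefixed and an unprefixed Body element. It assumes the JDK's SAXParserFactory and turns on the standard namespace-prefixes feature so that the qualified name is certain to be reported. The NameParts class and its record() method are hypothetical names used only here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration
class NameParts extends DefaultHandler {

  private final List<String> names = new ArrayList<>();

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    names.add(namespaceURI + " | " + localName + " | " + qualifiedName);
  }

  // Returns one "URI | local name | qualified name" line per start-tag
  static List<String> record(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();
      // Qualified names are required only when this feature is true
      parser.setFeature(
       "http://xml.org/sax/features/namespace-prefixes", true);
      NameParts handler = new NameParts();
      parser.setContentHandler(handler);
      parser.parse(new InputSource(new StringReader(xml)));
      return handler.names;
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml =
      "<SOAP-ENV:Envelope"
      + " xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'>"
      + "<SOAP-ENV:Body/><Body/></SOAP-ENV:Envelope>";
    for (String line : record(xml)) {
      System.out.println(line);
    }
  }

}
```

The prefixed Body is reported with the SOAP envelope namespace URI, the local name Body, and the qualified name SOAP-ENV:Body. The unprefixed Body is reported with an empty string for the URI and Body for both names.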

As an example I'm going to build a GUI representation of the tree structure of an XML document that allows you to collapse and expand the individual elements. The GUI parts will be provided by a javax.swing.JTree. The tree will be filled in startElement() and displayed in a window in endDocument(). Example 6.7 shows how.

Example 6.7 A ContentHandler Class That Builds a GUI Representation of an XML Document
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.swing.*;
import javax.swing.tree.*;
import java.util.*;


public class TreeViewer extends DefaultHandler {

  private Stack nodes;

  // Initialize the per-document data structures
  public void startDocument() throws SAXException {

    // The stack needs to be reinitialized for each document
    // because an exception might have interrupted parsing of a
    // previous document, leaving a nonempty stack.
    nodes = new Stack();

  }

  // Make sure we always have the root element
  private TreeNode root;

  // Initialize the per-element data structures
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {

    String data;
    if (namespaceURI.equals("")) data = localName;
    else {
      data = '{' + namespaceURI + "}" + qualifiedName;
    }
    MutableTreeNode node = new DefaultMutableTreeNode(data);
    try {
      MutableTreeNode parent = (MutableTreeNode) nodes.peek();
      parent.insert(node, parent.getChildCount());
    }
    catch (EmptyStackException e) {
      root = node;
    }
    nodes.push(node);

  }

  public void endElement(String namespaceURI, String localName,
   String qualifiedName) {
    nodes.pop();
  }

  // Flush and commit the per-document data structures
  public void endDocument() {

    JTree tree           = new JTree(root);
    JScrollPane treeView = new JScrollPane(tree);
    JFrame f             = new JFrame("XML Tree");

    f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    f.getContentPane().add(treeView);
    f.pack();
    f.setVisible(true);

  }


  public static void main(String[] args) {

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader(
        "org.apache.xerces.parsers.SAXParser"
      );
      ContentHandler handler = new TreeViewer();
      parser.setContentHandler(handler);
      for (int i = 0; i < args.length; i++) {
        parser.parse(args[i]);
      }
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }  // end main()

}// end TreeViewer

The JTree class provides a ready-made data structure for this program. We just have to fill it. In doing so, we need to track where we are in the XML hierarchy at all times so that the parent to which the current node will be added is accessible. For this purpose a stack is very helpful. Each element is pushed onto the stack in startElement() and popped off the stack in endElement(). Because SAX's beginning-to-end parsing of an XML document equates to a depth-first tree traversal, the top of the stack is always the most recently started element whose end-tag has not yet been seen. That element is the parent of whichever element starts next.

I find stacks like this to be very useful in many SAX programs. More complex programs may need to build more complicated tree or object structures. If your purpose is not simply to display a GUI for the tree, then you should probably roll your own tree structure rather than using JTree as I've done here.
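If you do roll your own tree, it need not be elaborate. The following is a minimal sketch of one possibility, using the same stack technique as TreeViewer but with a small node class in place of the Swing tree nodes. It assumes the JDK's SAXParserFactory, and the OutlineBuilder and Node classes are invented for this illustration rather than taken from any library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration
class OutlineBuilder extends DefaultHandler {

  // A deliberately small tree node: an element name plus child nodes
  static class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
      this.name = name;
    }

    // Renders the subtree with nested parentheses, for example a(b c)
    @Override
    public String toString() {
      if (children.isEmpty()) return name;
      StringBuilder sb = new StringBuilder(name).append('(');
      for (int i = 0; i < children.size(); i++) {
        if (i > 0) sb.append(' ');
        sb.append(children.get(i));
      }
      return sb.append(')').toString();
    }
  }

  // Elements whose end-tags have not been seen yet, innermost first
  private final Deque<Node> open = new ArrayDeque<>();
  private Node root;

  @Override
  public void startDocument() {
    open.clear();
    root = null;
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    Node node = new Node(localName);
    Node parent = open.peek();  // null only for the root element
    if (parent == null) root = node;
    else parent.children.add(node);
    open.push(node);
  }

  @Override
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) {
    open.pop();
  }

  // Parses the XML text and returns the root of the resulting tree
  static Node build(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);  // so that localName is reported
      XMLReader parser = factory.newSAXParser().getXMLReader();
      OutlineBuilder handler = new OutlineBuilder();
      parser.setContentHandler(handler);
      parser.parse(new InputSource(new StringReader(xml)));
      return handler.root;
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

}
```

For example, OutlineBuilder.build("<a><b><c/></b><d/></a>").toString() returns a(b(c) d). A real program would add whatever fields its own processing needs, such as attributes or text content, instead of just the name.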

Note

TreeViewer runs with the default distribution of Java 1.2 and later. It can run with Java 1.1, but you'll need to make sure the swingall.jar archive is somewhere in your class path. The javax.swing classes used here are not bundled with the JDK 1.1.


Figure 6.1 shows this program displaying Example 1.7 from Chapter 1. Swing allows individual parts of the tree to be collapsed or expanded, but the entire element tree is always present even if it's hidden. JTree also allows you to customize the icons used, and even enable the user to edit the tree. But that's purely Swing programming and says little to nothing about XML, so I'll leave that as an exercise for the reader.

Figure 6.1. The Swing-Based TreeViewer


Caution

This makes a nice little example, but please don't regard it as more than that. The tantalizing ease of representing XML documents with widgets like javax.swing.JTree and similar features in Windows, Motif, and other GUIs has spawned a lot of editors and browsers that use these tree models as user interfaces. However, not a lot of thought went into whether users actually think of XML documents this way or can be quickly trained to do so.

In actual practice, user interfaces of this sort have failed spectacularly. A good user interface for XML editors and viewers more closely resembles the user interfaces people are accustomed to from traditional programs, such as Microsoft Word, Netscape Navigator, and Adobe Illustrator. The whole point of a GUI is that it can decouple the user interface from the underlying data model. Just because an XML document is a tree is no excuse for making users edit trees when they don't want to.

